Constrained Maximum Likelihood MT

Constrained Maximum Likelihood MT

Constrained Maximum Likelihood MT provides for the estimation of statistical models by maximum likelihood while allowing for the imposition of general constraints on the parameters, linear or nonlinear, equality or inequality, as well as bounds.

Features

  • Parameter Estimation
  • Statistical Inference – Wald, Bootstrap, Confidence Limits by Inversion
  • Heteroskedastic-consistent Covariance Matrix of Parameters
  • Profile Likelihood
  • Weighted Maximum Likelihood
  • BFGS, DFP, Newton, BHHH descent algorithms
  • Stepbt, Brent, Half Step, Augmented Penalty, BHHH step, Wolfe’s Line Search Methods
  • Numerical Gradient, Hessian

Maximum Likelihood Estimation

Provide a GAUSS procedure for computing the log-likelihood of your statistical model, and Constrained Maximum Likelihood MT does the rest.  Using an iterative Sequential Quadratic Programming method parameter estimates along with standard errors and confidence limits are generated.

Example

Suppose we have a dependent variable that is observed in several ordered categories.  We might estimate coefficients of a regression on this variable using the ordered probit model:

where

and

Assuming we have

where  is the Normal cumulative distribution function.

The log-likelihood function for this model is

The cmlmt function that performs the estimation takes four arguments, (1) a pointer to the procedure that computes the log-likelihood, (2) a PV parameter structure containing the start values of the parameters, (3) a DS data structure, and (4) a control structure.

GAUSS structures are simply bins containing other objects such as matrices, strings, arrays, etc.  They can be defined by the programmer, but the two of the structures used by cmlmt are defined in the Run-Time Library, and the control structure is defined in the cmlmt library.

The PV Parameter Structure

The PV parameter structure is created and filled using GAUSS Run-Time Library functions.  Using these functions the structure can be filled with vectors, matrices, and arrays containing starting parameter values.  Masks can be used to specify fixed versus free parameters.  For example,

struct PV p0;

// creates a default parameter structure
p0 = pvCreate; 

p0 = pvPack(p0,.5|.5|.5,”beta”);
p0 = pvPackm(p0,-30|-1|1|30,”tau”,0|1|1|0);

The structure now contains starting values for a 3×1 vector of coefficients called beta,and another 4×1 vector of thresholds called tau. It will be convenient for the calculation of the log-likelihood for the first and last be parameters set to -30 and +30.  The fourth argument is a mask specifying the first and last elements of the vector to be fixed and the remaining elements free parameters to be estimated.

The DS Data Structure

The DS data structure is a general purpose bucket of GAUSS types.  It contains one of each of the types, matrix, array, string, string array, sparse matrix, and scalar.  It is passed to the log-likelihood procedure untouched by cmlmt.  It can be used by programmers in any way they choose to help in computing the log-likelihood.  Typically it is used to pass data to the procedure.  The DS structure can also be reshaped into a vector of structures giving the programmer great flexibility in handling data and other information.

load x[200,5] = data.csv;

struct DS d0;

// creates a 2x1 vector of
// default data structures
d0 = reshape(dsCreate,2,1);   

// dependent variable
d0[1].dataMatrix = x[.,1];    

// independent variables
d0[2].dataMatrix = x[.,2:4];

The cmlmtControl Structure

This structure handles the matrices and strings that control the estimation process such as setting the descent algorithm, the line search method, and so on.  It is also used to specify the constraints.  For example the thresholds need to be constrained in the following way where .  This implies the two free constraints

The programmer will accomplish by specifing two members of the cmlmtControl structure, C and D, to impose this inequality constraint

where x is the vector of parameters being estimated.  Then

struct cmlmtControl c0;

// creates a default structure
c0 = cmlmtControlCreate; 

c0.C = { 0  0  0 -1  1  0,
         0  0  0  0 -1  1 };

c0.D = { 0,
         0 };

The first three columns of the matrix c0.C and the vector c0.D are associated with regression coefficients that are unconstrained and the last two are associated with the thresholds.

The Log-likelihood Procedure

The programmer now writes a GAUSS procedure computing a vector of log-likelihood probabilities.  This procedure has three input arguments, the PV parameter structure, the DS data structure, and 3×1 vector the first element of which is nonzero if cmlmt is requesting the vector of log-likelihood probabilities, the second element nonzero if it is requesting the matrix of first derivatives with respect to the parameters, and the third element nonzero if it is requesting the Hessian or array of second derivatives with respect to the parameters.  It has one return argument, a modelResults structure containing the results.

proc orderedProbit(struct PV p, struct DS d, ind);
   local mu, tau, beta, emu, eml;
   struct modelResults mm;

   if ind[1] == 1;
      tau = pvUnpack(p,”tau”);
      beta = pvUnpack(p,”beta”);
      mu = d[2].dataMatrix * beta;
      eml = submat(tau,d[1].dataMatrix,0) – mu;
      emu = submat(tau,d[1].dataMatrix + 1,0) – mu;
      mm.function = ln(cdfn(emu) – cdfn(eml));
   endif;
retp(mm);
endp;

The GAUSS submat function serves to pull out the k-th element of tau for the i-th row of d[1].datamatrix set to k.

Since this procedure doesn’t return a matrix of first derivatives nor an array of second derivatives they will be computed numerically by cmlmt.

The result stored in mm.function is an Nx1 vector of log-probabilities computed by observation.  If we were to have provided analytical derivatives, the first derivatives would be an Nxm matrix of derivatives computed by observation where m is the number of parameters to be estimated, and the second derivatives would be an Nxmxm array of second derivatives computed by observation.  Computing these quantities in this way improves accuracy.  It also allows for the BHHH descent method which is more accurate than other methods permitting a larger convergence tolerance.

It is also possible to return a scalar log-likelihood which is the sum of the individual log-probabilities.  In this case the analytical first derivatives would be a 1xm gradient vector, and the second derivatives a 1xmxm array.  You would also need to set c0.numObs to the number of observations since cmlmt is no longer able to determine the number of observations from the length of the vector of log-probabilities.

The Command File

Finally we put it all together in the command file:

library cmlmt;

// contains the structure definitions
#include cmlmt.sdf;   

// simulating data here
x = rndn(200,3);
b = { .4, .5, .6 };
ystar = x*b;
tau = { -50, -1, 0, 1, 50 };

y = (ystar .> tau[1] .and ystar .<= tau[2]) +
   2 * (ystar .> tau[2] .and ystar .<= tau[3]) +
   3 * (ystar .> tau[3] .and ystar .<= tau[4]) +
   4 * (ystar .> tau[4] .and ystar .<= tau[5]);

struct DS d0;

// creates a 2x1 vector of
// default data structures
d0 = reshape(dsCreate,2,1);   

// dependent variable
d0[1].dataMatrix = y;  

// independent variables
d0[2].dataMatrix = x;  

struct PV p0;

// creates a default parameter structure
p0 = pvCreate; 

p0 = pvPack(p0,.5|.5|.5,"beta");
p0 = pvPackm(p0,-30|-1|0|1|30,"tau",0|1|1|1|0);

struct cmlmtControl c0;

// creates a default structure
c0 = cmlmtControlCreate;  

c0.C = { 0  0  0 -1  1  0,
         0  0  0  0 -1  1 };

c0.D = { 0,
         0};

struct cmlmtResults out;

out = cmlmt(&orderedProbit,p0,d0,c0);

// prints the results
call cmlmtprt(out); 

proc orderedProbit(struct PV p, struct DS d, ind);
   local mu, tau, beta, emu, eml;
   struct modelResults mm;

   if ind[1] == 1;
      tau = pvUnpack(p,"tau");
      beta = pvUnpack(p,"beta");
      mu = d[2].dataMatrix * beta;
      eml = submat(tau,d[1].dataMatrix,0) - mu;
      emu = submat(tau,d[1].dataMatrix + 1,0) - mu;
      mm.function = ln(cdfn(emu) - cdfn(eml));
   endif;

retp(mm);
endp;

This program produces the following output:

================================================================

CMLMT Version 2.0.7         3/30/2012   1:29 pm

=================================================================

return code =    0

normal convergence

Log-likelihood        -15.1549

Number of cases     200

Covariance of the parameters computed by the following method:

ML covariance matrix

Parameters    Estimates     Std. err.  Est./s.e.  Prob.    Gradient

------------------------------------------------------------------

beta[1,1]    4.2060        0.2385      17.634   0.0000      0.0000

beta[2,1]    5.3543        0.2947      18.166   0.0000      0.0000

beta[3,1]    6.2839        0.2789      22.531   0.0000      0.0000

tau[2,1]   -10.7561        0.7437     -14.462   0.0000      0.0000

tau[3,1]    -0.0913        0.2499      -0.365   0.7148      0.0000

tau[4,1]    10.6693        0.5697      18.727   0.0000      0.0000

Correlation matrix of the parameters

1     0.52064502     0.54690534    -0.46731768    0.046211496    0.57202935

0.52064502     1     0.58363048    -0.47574225   -0.061765839    0.65959766

0.54690534     0.58363048     1    -0.5169026    -0.0059238287   0.69077806

-0.46731768   -0.47574225    -0.5169026      1    0.0046253798  -0.44858539

0.046211496  -0.061765839   -0.0059238287   0.0046253798    1  -0.01457591

0.57202935    0.65959766     0.69077806    -0.44858539     -0.01457591   1

Wald Confidence Limits

0.95 confidence limits

Parameters    Estimates     Lower Limit   Upper Limit   Gradient

----------------------------------------------------------------------

beta[1,1]    4.2060        3.7355        4.6764        0.0000

beta[2,1]    5.3543        4.7730        5.9356        0.0000

beta[3,1]    6.2839        5.7338        6.8339        0.0000

tau[2,1]   -10.7561      -12.2230       -9.2893        0.0000

tau[3,1]    -0.0913       -0.5842        0.4015        0.0000

tau[4,1]    10.6693        9.5457       11.7929        0.0000

Number of iterations    135

Minutes to convergence     0.00395