Maximum Likelihood Estimation in GAUSS

by Eric · Published October 15, 2020 · Updated October 16, 2020

Introduction

Maximum likelihood is a fundamental workhorse for estimating model parameters with applications ranging from simple linear regression to advanced discrete choice models. Today we examine how to implement this technique in GAUSS using the Maximum Likelihood MT library.

Maximum Likelihood MT provides a number of useful, pre-built tools that we will demonstrate today using a simple linear model. We'll show all the fundamentals you need to get started with maximum likelihood estimation in GAUSS including:

How to create a likelihood function.
How to call the maxlikmt procedure to estimate parameters.
How to interpret the results from maxlikmt.

Note: This builds on our "Beginner's Guide to Maximum Likelihood Estimation".

Maximum Likelihood Estimation in GAUSS

The Maximum Likelihood MT library provides a full suite of tools for easily and efficiently tackling maximum likelihood estimation.

Today we will use the maxlikmt procedure to estimate two unknown parameters, $\hat{\beta}$ and $\hat{\sigma^2}$. The maxlikmt procedure requires three inputs, a pointer to a likelihood function, a vector of starting parameter values, and the response data.

It also accepts any optional inputs needed for the likelihood function and an optional control structure for fine-tuning optimization.

out = maxlikmt(&lfn, par_start, y [,..., ctl]);

&lfn: A pointer to a procedure that returns either the scalar log-likelihood, a vector of log-likelihoods, or the weighted log-likelihood.
par_start: Vector, starting parameter values.
y: Vector, the response data.
...: Optional, additional inputs required for the likelihood function. These are passed directly in the order and form provided to the likelihood function.
ctl: Structure, an instance of the maxlikmtControl structure used to control features of optimization including algorithm, bounds, etc.

You can create a pointer to a procedure by prepending the name of the procedure with an ampersand (&).

Maximum Likelihood Estimation Linear Model Example

Let's start with a simple linear regression example.

In linear regression, we assume that the model residuals are identical and independently normally distributed:

$$\epsilon = y - \hat{\beta}x \sim N(0, \sigma^2)$$

Based on this assumption, the log-likelihood function for the unknown parameter vector, $\theta = \{\beta, \sigma^2\}$, conditional on the observed data, $y$ and $x$ is given by:

$$L(\theta|y, x) = \frac{1}{2}\sum_{i=1}^n \Big[ \ln \sigma^2 + \ln (2\pi) + \frac{(y_i-\hat{\beta}x_i)^2}{\sigma^2} \Big] $$

For today's example, we will simulate 800 observations of linear data using randomly generated $x$ values and true parameter values of $\beta = 1.2$ and $\sigma^2 = 4$.

The code to generate our dataset can be found here.

The Linear Model Likelihood Procedure

Our first step is to create the procedure to compute the log-likelihood.

The log-likelihood procedure will be called by maxlikmt, so we have to set it up in the way that maxlikmt expects. The inputs to the log-likelihood procedure are:

parms: Vector, the current parameter vector.
y: Vector, the response data.
...: Optional, additional inputs required for the likelihood function.
ind: 3x1 vector created by maxlikmt, specifying whether the gradient or Hessian should be computed. Your procedure can ignore this input if you choose to have maxlikmt compute a numerical gradient and / or Hessian.

The log-likelihood procedure must return a modelResults structure with the function member containing the value of log-likelihood at the current parameter values.

The sum is omitted intentionally to compute a vector of log-likelihoods by observation.

proc (1) = lfn(theta, y, x, ind);
    local beta_est, sigma2;

    // Extract parameters
    beta_est = theta[1];
    sigma2 = theta[2];

    // Declare the modelResults structure
    struct modelResults mm;

    // Set the modelResults structure member, 'function',
    // equal to the log-likelihood function value
    mm.function = -1/2 * (ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);

    retp(mm);
endp;

After looking at the above procedure, you are probably wondering:

Why do we need to return the function value in a modelResults structure?
What is ind?

Hold those questions for now. We will answer them in our second example when we add an analytical gradient.

Calling the `maxlikmt` procedure

Once the log-likelihood procedure has been created, we are ready to call maxlikmt. We first specify our starting values:

// Starting values
theta_start = { 0.5, 0.5 };

Finally, we call maxlikmt and print the results using maxlikmtprt:

// Perform estimation and print report
call maxlikmtprt(maxlikmt(&lfn, theta_start, y, x));

Note that the data is passed in the same order to maxlikmt as to the likelihood function.

Maximum Likelihood Estimate Results

GAUSS prints the following results to the input/output window:

return code =    0
normal convergence

Log-likelihood    -1686.04
Number of cases     800

Covariance of the parameters computed by the following method:
ML covariance matrix

Parameters    Estimates     Std. err   Est./se     Prob   Gradient
------------------------------------------------------------------
x[1,1]           1.1486       0.0710    16.177   0.0000     0.0000
x[2,1]           3.9638       0.1982    20.001   0.0000     0.0028

The first thing to note is that GAUSS tells us that the optimization converges normally. If the optimization failed to converge GAUSS would report this along with the reason for failure.

The GAUSS maximum likelihood estimates are $\hat{\beta}_{MLE} = 1.1486$ and $\hat{\sigma^2} = 3.9638$.

In addition to the parameter estimates, GAUSS provides confidence intervals around the estimates:

Wald Confidence Limits

                              0.95 confidence limits
Parameters    Estimates     Lower Limit   Upper Limit   Gradient
----------------------------------------------------------------
x[1,1]           1.1486          1.0093        1.2880     0.0000
x[2,1]           3.9638          3.5747        4.3528     0.0028

Storing Estimation Results in the `maxlikmtResults` Output Structure

Our first example used the call keyword to discard the return from maxlikmt. You can return the estimation results in a maxlikmtResults structure.

Some of the most useful elements in the maxlikmtResults structure include:

m_out.par.obj.m: Vector, estimated parameter values. These are in the same order as the parameters were passed in the parameter starting values input.
m_out.returnDescription: String, describes if convergence normally or if there are errors.
m_out.covPar: Matrix, covariance of the paramter estimates. This is computed as the Moore-Penrose inverse of the Hessian at the final parameter estimates.

Here is how to modify our earlier code to save the results.

// Declare 'm_out' to be a maxlikmtResults structure
struct maxlikmtResults m_out;

// Perform estimation and store results in 'm_out'
m_out = maxlikmt(&lfn, theta_start, y, x);

// Print the parameter estimates
print m_out.par.obj.m;

// Print the return description
print m_out.returnDescription;

More information about how to view stored results can be found here.

Specifying the Analytical Gradient

For our linear model, it is quite feasible to derive the analytical first derivates:

$$\frac{\partial L(\theta||y, x)}{\partial \beta} = \frac{1}{\sigma^2}\Big[x*(y - \beta x)\Big]$$ $$\frac{\partial L(\theta||y, x)}{\partial \sigma^2} = -\frac{1}{2}\Big[\frac{1}{\sigma} - \frac{(y - \beta x)^2}{\sigma^2}\Big]$$

We can specify to GAUSS to use these analytical first derivatives by computing them in our likelihood procedure and assigning the results to the gradient member of the modelResults structure as shown below:

// Write likelihood function
// with analytical derivatives

proc (1) = lfn(theta, y, x, ind);
    local beta_est, sigma2, g1, g2;

    beta_est = theta[1];
    sigma2 = theta[2];

    struct modelResults mm;
    // Specify likelihood function
    if ind[1];
        mm.function = -1/2*(ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);
    endif;

    // Include gradients
    if ind[2];

       g1 = 1/sigma2 * ((y - x*beta_est) .*x);
       g2 = -1/2 * ((1/sigma2) - (y - x*beta_est)^2 / sigma2^2); 

       // Concatenate into a (n observations)x2 matrix. 
       mm.gradient = g1 ~ g2;
    endif;

    retp(mm);
endp;

What is `ind`?

ind is a 3x1 vector created by maxlikmt which tells your likelihood procedure whether it needs a gradient or Hessian calculation in addition to the function evaluation. As we can see in the above procedure, if the second element of ind is nonzero, maxlikmt is asking for a gradient calculation.

Note that:

You do not have to use or check for ind in your likelihood procedure if you are using numerical derivatives.
maxlikmt creates ind internally and passes it to the likelihood procedure. You do not have to declare or create it.

How `ind` can speed up your modeling

The advantage of using the ind input rather than a separate gradient or Hessian procedure is that often times many calculations needed by the function evaluation are also needed by the gradient. Combining them all into one procedure gives an opportunity to only compute the operations once.

For example, if we needed to speed up our model, we could modify our likelihood procedure to compute the residuals just once towards the top of the code and use the newly created residuals and residuals2 variables in the function and gradient calculations.

proc (1) = lfn(theta, y, x, ind);
    local beta_est, sigma2, g1, g2, residuals, residuals2;

    beta_est = theta[1];
    sigma2 = theta[2];

    // Operations common to likelihood and gradient
    residuals = y - x*beta_est;
    residuals2 = residuals^2;

    struct modelResults mm;
    // Specify likelihood function
    if ind[1];
        mm.function = -1/2*(ln(sigma2) + ln(2*pi) + residuals2 / sigma2);
    endif;

    // Include gradients
    if ind[2];

       g1 = 1/sigma2 * (residuals .* x);
       g2 = -1/2 * ((1/sigma2) - (residuals2 / sigma2^2); 

       // Concatenate into a 1x2 row vector. 
       mm.gradient = g1 ~ g2;
    endif;

    retp(mm);
endp;

Conclusions

Congratulations! After today's blog, you should have a better understanding of how to implement maximum likelihood in GAUSS.

We've covered the components of the maxlikmt procedure including:

The maxlikmt inputs.
The log-likelihood procedure and the modelResults structure.
The maxlikmtResults output structure.
Specifying analytical gradients in the log-likelihood function.

Eric( Director of Applications and Training at Aptech Systems, Inc. )

Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.

Maximum Likelihood Estimation in GAUSS

Introduction

Maximum Likelihood Estimation in GAUSS

Maximum Likelihood Estimation Linear Model Example

The Linear Model Likelihood Procedure

Calling the maxlikmt procedure

Maximum Likelihood Estimate Results

Storing Estimation Results in the maxlikmtResults Output Structure

Specifying the Analytical Gradient

What is ind?

How ind can speed up your modeling

Conclusions

Calling the `maxlikmt` procedure

Storing Estimation Results in the `maxlikmtResults` Output Structure

What is `ind`?

How `ind` can speed up your modeling