## Introduction

Maximum likelihood is a fundamental workhorse for estimating model parameters with applications ranging from simple linear regression to advanced discrete choice models. Today we examine how to implement this technique in GAUSS using the Maximum Likelihood MT library.

Maximum Likelihood MT provides a number of useful, pre-built tools that we will demonstrate today using a simple linear model. We'll show all the fundamentals you need to get started with maximum likelihood estimation in GAUSS including:

- How to create a likelihood function.
- How to call the
`maxlikmt`

procedure to estimate parameters. - How to interpret the results from
`maxlikmt`

.

## Maximum Likelihood Estimation in GAUSS

The Maximum Likelihood MT library provides a full suite of tools for easily and efficiently tackling maximum likelihood estimation.

Today we will use the `maxlikmt`

procedure to estimate two unknown parameters, $\hat{\beta}$ and $\hat{\sigma^2}$. The `maxlikmt`

procedure requires three inputs, a pointer to a likelihood function, a vector of starting parameter values, and the response data.

It also accepts any optional inputs needed for the likelihood function and an optional control structure for fine-tuning optimization.

`out = maxlikmt(&lfn, par_start, y [,..., ctl]);`

- &lfn
- A pointer to a procedure that returns either the scalar log-likelihood, a vector of log-likelihoods, or the weighted log-likelihood.
- par_start
- Vector, starting parameter values.
- y
- Vector, the response data.
- ...
- Optional, additional inputs required for the likelihood function. These are passed directly in the order and form provided to the likelihood function.
- ctl
- Structure, an instance of the
`maxlikmtControl`

structure used to control features of optimization including algorithm, bounds, etc.

## Maximum Likelihood Estimation Linear Model Example

Let's start with a simple linear regression example.

In linear regression, we assume that the model residuals are identical and independently normally distributed:

$$\epsilon = y - \hat{\beta}x \sim N(0, \sigma^2)$$

Based on this assumption, the log-likelihood function for the unknown parameter vector, $\theta = \{\beta, \sigma^2\}$, conditional on the observed data, $y$ and $x$ is given by:

$$L(\theta|y, x) = \frac{1}{2}\sum_{i=1}^n \Big[ \ln \sigma^2 + \ln (2\pi) + \frac{(y_i-\hat{\beta}x_i)^2}{\sigma^2} \Big] $$

For today's example, we will simulate 800 observations of linear data using randomly generated $x$ values and true parameter values of $\beta = 1.2$ and $\sigma^2 = 4$.

The code to generate our dataset can be found here.

### The Linear Model Likelihood Procedure

Our first step is to create the procedure to compute the log-likelihood.

The log-likelihood procedure will be called by `maxlikmt`

, so we have to set it up in the way that `maxlikmt`

expects. The inputs to the log-likelihood procedure are:

- parms
- Vector, the current parameter vector.
- y
- Vector, the response data.
- ...
- Optional, additional inputs required for the likelihood function.
- ind
- 3x1 vector created by
`maxlikmt`

, specifying whether the gradient or Hessian should be computed. Your procedure can ignore this input if you choose to have`maxlikmt`

compute a numerical gradient and / or Hessian.

The log-likelihood procedure must return a `modelResults`

structure with the `function`

member containing the value of log-likelihood at the current parameter values.

```
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2;
// Extract parameters
beta_est = theta[1];
sigma2 = theta[2];
// Declare the modelResults structure
struct modelResults mm;
// Set the modelResults structure member, 'function',
// equal to the log-likelihood function value
mm.function = -1/2 * (ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);
retp(mm);
endp;
```

After looking at the above procedure, you are probably wondering:

- Why do we need to return the function value in a
`modelResults`

structure? - What is
`ind`

?

Hold those questions for now. We will answer them in our second example when we add an analytical gradient.

### Calling the `maxlikmt`

procedure

Once the log-likelihood procedure has been created, we are ready to call `maxlikmt`

. We first specify our starting values:

```
// Starting values
theta_start = { 0.5, 0.5 };
```

Finally, we call `maxlikmt`

and print the results using `maxlikmtprt`

:

```
// Perform estimation and print report
call maxlikmtprt(maxlikmt(&lfn, theta_start, y, x));
```

`maxlikmt`

as to the likelihood function.### Maximum Likelihood Estimate Results

GAUSS prints the following results to the input/output window:

return code = 0 normal convergence Log-likelihood -1686.04 Number of cases 800 Covariance of the parameters computed by the following method: ML covariance matrix Parameters Estimates Std. err Est./se Prob Gradient ------------------------------------------------------------------ x[1,1] 1.1486 0.0710 16.177 0.0000 0.0000 x[2,1] 3.9638 0.1982 20.001 0.0000 0.0028

The first thing to note is that GAUSS tells us that the optimization converges normally. If the optimization failed to converge GAUSS would report this along with the reason for failure.

The GAUSS maximum likelihood estimates are $\hat{\beta}_{MLE} = 1.1486$ and $\hat{\sigma^2} = 3.9638$.

In addition to the parameter estimates, GAUSS provides confidence intervals around the estimates:

Wald Confidence Limits 0.95 confidence limits Parameters Estimates Lower Limit Upper Limit Gradient ---------------------------------------------------------------- x[1,1] 1.1486 1.0093 1.2880 0.0000 x[2,1] 3.9638 3.5747 4.3528 0.0028

### Storing Estimation Results in the `maxlikmtResults`

Output Structure

Our first example used the `call`

keyword to discard the return from `maxlikmt`

. You can return the estimation results in a `maxlikmtResults`

structure.

Some of the most useful elements in the `maxlikmtResults`

structure include:

- m_out.par.obj.m
- Vector, estimated parameter values. These are in the same order as the parameters were passed in the parameter starting values input.
- m_out.returnDescription
- String, describes if convergence normally or if there are errors.
- m_out.covPar
- Matrix, covariance of the paramter estimates. This is computed as the Moore-Penrose inverse of the Hessian at the final parameter estimates.

Here is how to modify our earlier code to save the results.

```
// Declare 'm_out' to be a maxlikmtResults structure
struct maxlikmtResults m_out;
// Perform estimation and store results in 'm_out'
m_out = maxlikmt(&lfn, theta_start, y, x);
// Print the parameter estimates
print m_out.par.obj.m;
// Print the return description
print m_out.returnDescription;
```

## Specifying the Analytical Gradient

For our linear model, it is quite feasible to derive the analytical first derivates:

$$\frac{\partial L(\theta||y, x)}{\partial \beta} = \frac{1}{\sigma^2}\Big[x*(y - \beta x)\Big]$$ $$\frac{\partial L(\theta||y, x)}{\partial \sigma^2} = -\frac{1}{2}\Big[\frac{1}{\sigma} - \frac{(y - \beta x)^2}{\sigma^2}\Big]$$

We can specify to GAUSS to use these analytical first derivatives by computing them in our likelihood procedure and assigning the results to the `gradient`

member of the `modelResults`

structure as shown below:

```
// Write likelihood function
// with analytical derivatives
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2, g1, g2;
beta_est = theta[1];
sigma2 = theta[2];
struct modelResults mm;
// Specify likelihood function
if ind[1];
mm.function = -1/2*(ln(sigma2) + ln(2*pi) + (y - x*beta_est)^2 / sigma2);
endif;
// Include gradients
if ind[2];
g1 = 1/sigma2 * ((y - x*beta_est) .*x);
g2 = -1/2 * ((1/sigma2) - (y - x*beta_est)^2 / sigma2^2);
// Concatenate into a (n observations)x2 matrix.
mm.gradient = g1 ~ g2;
endif;
retp(mm);
endp;
```

## What is `ind`

?

`ind`

is a 3x1 vector created by `maxlikmt`

which tells your likelihood procedure whether it needs a gradient or Hessian calculation in addition to the function evaluation. As we can see in the above procedure, if the second element of `ind`

is nonzero, `maxlikmt`

is asking for a gradient calculation.

Note that:

- You do not have to use or check for
`ind`

in your likelihood procedure if you are using numerical derivatives. `maxlikmt`

creates`ind`

internally and passes it to the likelihood procedure. You do not have to declare or create it.

### How `ind`

can speed up your modeling

The advantage of using the `ind`

input rather than a separate gradient or Hessian procedure is that often times many calculations needed by the function evaluation are also needed by the gradient. Combining them all into one procedure gives an opportunity to only compute the operations once.

For example, if we needed to speed up our model, we could modify our likelihood procedure to compute the residuals just once towards the top of the code and use the newly created `residuals`

and `residuals2`

variables in the function and gradient calculations.

```
proc (1) = lfn(theta, y, x, ind);
local beta_est, sigma2, g1, g2, residuals, residuals2;
beta_est = theta[1];
sigma2 = theta[2];
// Operations common to likelihood and gradient
residuals = y - x*beta_est;
residuals2 = residuals^2;
struct modelResults mm;
// Specify likelihood function
if ind[1];
mm.function = -1/2*(ln(sigma2) + ln(2*pi) + residuals2 / sigma2);
endif;
// Include gradients
if ind[2];
g1 = 1/sigma2 * (residuals .* x);
g2 = -1/2 * ((1/sigma2) - (residuals2 / sigma2^2);
// Concatenate into a 1x2 row vector.
mm.gradient = g1 ~ g2;
endif;
retp(mm);
endp;
```

## Conclusions

Congratulations! After today's blog, you should have a better understanding of how to implement maximum likelihood in GAUSS.

We've covered the components of the `maxlikmt`

procedure including:

- The
`maxlikmt`

inputs. - The log-likelihood procedure and the
`modelResults`

structure. - The
`maxlikmtResults`

output structure. - Specifying analytical gradients in the log-likelihood function.

Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.