Introduction
It's natural in data analysis applications for parameters to have bounds: variances can't be negative, GARCH coefficients must sum to less than one for stationarity, and mixing proportions live between zero and one.
When you estimate these models by maximum likelihood, the optimizer needs to respect those bounds, not just at the solution, but throughout the search. If the search wanders into invalid territory, the consequences are concrete: complex numbers from negative variances, explosive forecasts from non-stationary GARCH parameters, and likelihoods that make no sense.
GAUSS 26.0.1 introduces minimize, the first new GAUSS optimizer in over 10 years, to handle this cleanly.
The minimize optimizer lets you specify bounds directly, and GAUSS keeps parameters feasible at every iteration. No more log-transforms, no penalty functions, and no double-checking.
In today's blog, we'll see the new minimize function in action as we walk through two examples:
- A GARCH estimation where variance parameters must be positive
- A stochastic frontier model where both variance components must be positive
In both cases, bounded optimization makes estimation easier and aligns results with theory.
Why Bounds Matter
To see why this matters in practice, let’s look at a familiar example. Consider a GARCH(1,1) model:
$\sigma^2_t = \omega + \alpha \varepsilon^2_{t-1} + \beta \sigma^2_{t-1}$
For this model to be well-defined and economically meaningful:
- The baseline variance must be positive ($\omega \gt 0$)
- Shocks and persistence must contribute non-negatively to variance ($\alpha \geq 0$, $\beta \geq 0$)
- The model must be stationary ($\alpha + \beta \lt 1$)
The traditional workaround is to estimate transformed parameters, $\log(\omega)$ instead of $\omega$, then convert back. This works, but it distorts the optimization surface and complicates standard error calculations. You're not estimating the parameters you care about; you're estimating transforms and hoping the numerics work out.
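To make the workaround concrete, here's a toy sketch in Python (illustrative names and objective, plain gradient descent, not the GAUSS routine): the optimizer works on $\phi = \log(\omega)$, which is unbounded, and we recover $\omega = e^\phi \gt 0$ at the end.

```python
import math

# Toy illustration of the log-transform workaround:
# minimize f(omega) = (omega - 2)^2 subject to omega > 0
# by optimizing phi = log(omega) on the whole real line.

def grad_phi(phi):
    # chain rule: d/dphi f(exp(phi)) = f'(omega) * d omega / d phi
    #           = 2 * (omega - 2) * omega
    omega = math.exp(phi)
    return 2.0 * (omega - 2.0) * omega

phi = 0.0                       # start at omega = exp(0) = 1
for _ in range(200):            # plain gradient descent on the transformed problem
    phi -= 0.05 * grad_phi(phi)

omega_hat = math.exp(phi)       # back-transform; positive by construction
```

The constraint is satisfied automatically, but notice that the curvature in $\phi$ differs from the curvature in $\omega$, which is why standard errors computed on the transformed scale need a delta-method correction.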
With bounded optimization, you estimate $\omega$, $\alpha$, and $\beta$ directly, with the optimizer respecting the constraints throughout.
Example 1: GARCH(1,1) on Commodity Returns
Let's estimate a GARCH(1,1) model on a dataset of 248 observations of commodity price returns (this data is included in the GAUSS 26 examples directory).
Step One: Data and Likelihood
First, we load the data and specify our log-likelihood objective function.
// Load returns data (ships with GAUSS)
fname = getGAUSShome("examples/df_returns.gdat");
returns = loadd(fname, "rcpi");
// GARCH(1,1) negative log-likelihood
proc (1) = garch_negll(theta, y);
    local omega, alpha, beta_, sigma2, ll, t;

    omega = theta[1];
    alpha = theta[2];
    beta_ = theta[3];

    sigma2 = zeros(rows(y), 1);

    // Initialize with sample variance
    sigma2[1] = stdc(y)^2;

    // Variance recursion
    for t (2, rows(y), 1);
        sigma2[t] = omega + alpha * y[t-1]^2 + beta_ * sigma2[t-1];
    endfor;

    // Gaussian log-likelihood
    ll = -0.5 * sumc(ln(2*pi) + ln(sigma2) + (y.^2) ./ sigma2);

    retp(-ll); // Return negative for minimization
endp;
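If you'd like to sanity-check the recursion outside GAUSS, here's a line-for-line Python equivalent of the likelihood above (standard library only, purely illustrative):

```python
import math

# Python equivalent of the GAUSS garch_negll proc, for checking the
# recursion by hand.

def garch_negll(theta, y):
    omega, alpha, beta = theta
    n = len(y)

    # Initialize with the sample variance (n - 1 divisor, like stdc(y)^2)
    mean_y = sum(y) / n
    sigma2 = [0.0] * n
    sigma2[0] = sum((v - mean_y) ** 2 for v in y) / (n - 1)

    # Variance recursion
    for t in range(1, n):
        sigma2[t] = omega + alpha * y[t-1] ** 2 + beta * sigma2[t-1]

    # Gaussian log-likelihood, sign flipped for minimization
    ll = -0.5 * sum(math.log(2.0 * math.pi) + math.log(s) + v ** 2 / s
                    for v, s in zip(y, sigma2))
    return -ll
```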
Step Two: Setting Up Optimization
Now we set up the bounded optimization with:
- $\omega \gt 0$ (small positive lower bound to avoid numerical issues)
- $\alpha \geq 0$
- $\beta \geq 0$
Because minimize handles simple box constraints, we impose individual upper bounds on $\alpha$ and $\beta$ to keep the optimizer in a reasonable region. We'll verify the stationarity condition, $\alpha + \beta \lt 1$, after estimation.
// Starting values
theta0 = { 0.00001,   // omega (small, let data speak)
           0.05,      // alpha
           0.90 };    // beta
// Set up minimize
struct minimizeControl ctl;
ctl = minimizeControlCreate();
// Bounds: all parameters in positive boxes;
// verify alpha + beta < 1 after estimation
ctl.bounds = { 1e-10  1,        // omega in [1e-10, 1]
               0      1,        // alpha in [0, 1]
               0      0.9999 }; // beta in [0, 0.9999]
Step Three: Running the Model
Finally, we call minimize to run our model.
// Estimate
struct minimizeOut out;
out = minimize(&garch_negll, theta0, returns, ctl);
Results and Visualization
After estimation, we'll extract the conditional variance series and confirm the stationarity condition:
// Extract estimates
omega_hat = out.x[1];
alpha_hat = out.x[2];
beta_hat = out.x[3];
print "omega = " omega_hat;
print "alpha = " alpha_hat;
print "beta = " beta_hat;
print "alpha + beta = " alpha_hat + beta_hat;
print "Iterations: " out.iterations;
Output:
omega = 0.0000070
alpha = 0.380
beta = 0.588
alpha + beta = 0.968
Iterations: 39
There are a few noteworthy results:
- The high persistence ($\alpha + \beta \approx 0.97$) means volatility shocks decay slowly.
- The relatively high $\alpha$ (0.38) indicates that recent shocks have substantial immediate impact on variance.
- The optimization converged in 39 iterations with all parameters staying inside their bounds throughout. No invalid variance evaluations, no numerical exceptions.
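Two standard GARCH(1,1) quantities follow directly from these estimates: the implied unconditional variance, $\omega / (1 - \alpha - \beta)$, and the half-life of a volatility shock. A small Python sketch using the values printed above (the formulas are textbook GARCH results; the variable names are ours):

```python
import math

# Diagnostics implied by the reported estimates (values copied from the output above)
omega_hat, alpha_hat, beta_hat = 0.0000070, 0.380, 0.588

persistence = alpha_hat + beta_hat                 # stationarity requires < 1
uncond_var = omega_hat / (1.0 - persistence)       # implied long-run variance
half_life = math.log(0.5) / math.log(persistence)  # periods for a shock's effect to halve
```

The half-life works out to roughly 21 periods, consistent with the slow decay of volatility shocks noted above.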
Visualizing the conditional variance alongside the original series provides further insight:
// Compute conditional variance series for plotting
T = rows(returns);
sigma2_hat = zeros(T, 1);
sigma2_hat[1] = stdc(returns)^2;
for t (2, T, 1);
    sigma2_hat[t] = omega_hat + alpha_hat * returns[t-1]^2 + beta_hat * sigma2_hat[t-1];
endfor;
// Plot returns and conditional volatility
struct plotControl plt;
plt = plotGetDefaults("xy");
plotSetTitle(&plt, "GARCH(1,1): Returns and Conditional Volatility");
plotSetYLabel(&plt, "Returns / Volatility");
plotLayout(2, 1, 1);
plotXY(plt, seqa(1, 1, T), returns);
plotLayout(2, 1, 2);
plotSetTitle(&plt, "Conditional Standard Deviation");
plotXY(plt, seqa(1, 1, T), sqrt(sigma2_hat));
The plot shows volatility clustering: periods of high volatility tend to persist, consistent with what we observe in commodity markets.
Example 2: Stochastic Frontier Model
Stochastic frontier analysis separates random noise from systematic inefficiency. It's widely used in productivity analysis to measure how far firms operate below their production frontier.
The model:
$y = X\beta + v - u$
where:
- $v \sim N(0, \sigma^2_v)$ — symmetric noise (measurement error, luck)
- $u \sim N^+(0, \sigma^2_u)$ — one-sided inefficiency (always reduces output)
Both variance components must be positive. If the optimizer tries $\sigma^2_v \lt 0$ or $\sigma^2_u \lt 0$, the likelihood involves square roots of negative numbers.
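Concretely, the per-observation log-likelihood for the standard half-normal specification, written with the same symbols the code uses, is

$\ln L_i = -\frac{1}{2}\ln(2\pi) + \ln 2 - \ln\sigma - \frac{1}{2}\left(\frac{\varepsilon_i}{\sigma}\right)^2 + \ln\Phi\left(-\frac{\varepsilon_i \lambda}{\sigma}\right)$

where $\varepsilon_i = y_i - x_i'\beta$, $\sigma = \sqrt{\sigma^2_v + \sigma^2_u}$, and $\lambda = \sigma_u / \sigma_v$. Both $\sigma$ and $\lambda$ are only well-defined when the variance components are positive, which is exactly what the bounds will enforce.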
Step One: Data and Likelihood
For this example, we'll simulate data from a Cobb-Douglas production function with inefficiency. This keeps the example self-contained and lets you see exactly what's being estimated.
// Simulate production data
rndseed 8675309;
n = 500;
// Inputs (labor, capital, materials)
labor = exp(2 + 0.5*rndn(n, 1));
capital = exp(3 + 0.7*rndn(n, 1));
materials = exp(2.5 + 0.4*rndn(n, 1));
// True parameters
beta_true = { 1.5,    // constant
              0.4,    // labor elasticity
              0.3,    // capital elasticity
              0.25 }; // materials elasticity
sig2_v_true = 0.02; // noise variance
sig2_u_true = 0.08; // inefficiency variance
// Generate output with noise (v) and inefficiency (u)
v = sqrt(sig2_v_true) * rndn(n, 1);
u = sqrt(sig2_u_true) * abs(rndn(n, 1)); // half-normal
X = ones(n, 1) ~ ln(labor) ~ ln(capital) ~ ln(materials);
y = X * beta_true + v - u; // inefficiency reduces output
After simulating our data, we specify the log-likelihood function for minimization:
// Stochastic frontier log-likelihood (half-normal inefficiency)
proc (1) = sf_negll(theta, y, X);
    local k, beta_, sig2_v, sig2_u, sigma, lambda;
    local eps, z, ll;

    k = cols(X);
    beta_ = theta[1:k];
    sig2_v = theta[k+1];
    sig2_u = theta[k+2];

    sigma = sqrt(sig2_v + sig2_u);
    lambda = sqrt(sig2_u / sig2_v);
    eps = y - X * beta_;
    z = -eps * lambda / sigma;

    ll = -0.5*ln(2*pi) + ln(2) - ln(sigma)
         - 0.5*(eps./sigma).^2 + ln(cdfn(z));

    retp(-sumc(ll));
endp;
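To cross-check the likelihood arithmetic outside GAUSS, here's a per-observation Python equivalent (standard library only, illustrative):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sf_negll_obs(eps, sig2_v, sig2_u):
    # Per-observation negative log-likelihood with half-normal
    # inefficiency, mirroring the GAUSS expression above
    sigma = math.sqrt(sig2_v + sig2_u)
    lam = math.sqrt(sig2_u / sig2_v)
    z = -eps * lam / sigma
    ll = (-0.5 * math.log(2.0 * math.pi) + math.log(2.0) - math.log(sigma)
          - 0.5 * (eps / sigma) ** 2 + math.log(norm_cdf(z)))
    return -ll
```

Note the asymmetry: a residual below the frontier ($\varepsilon \lt 0$) is more likely than an equally sized residual above it, which is how the model identifies inefficiency.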
Step Two: Setting Up Optimization
As we did in our previous example, we begin with starting values. For this model, we run OLS, using the coefficients directly and splitting the residual variance between the two components:
// OLS for starting values
beta_ols = invpd(X'X) * X'y;
resid = y - X * beta_ols;
sig2_ols = meanc(resid.^2);
// Starting values: Split residual variance
// between noise and inefficiency
theta0 = beta_ols | (0.5 * sig2_ols) | (0.5 * sig2_ols);
We leave our coefficients unbounded but constrain the variances to be positive:
// Bounds: coefficients unbounded, variances positive
k = cols(X);
struct minimizeControl ctl;
ctl = minimizeControlCreate();
ctl.bounds = (-1e300 * ones(k, 1) | 0.001 | 0.001) ~ (1e300 * ones(k+2, 1));
Step Three: Running the Model
Finally, we call minimize to estimate our model:
// Estimate
struct minimizeOut out;
out = minimize(&sf_negll, theta0, y, X, ctl);
Results and Visualization
Now that we've estimated our model, let's examine our results.
// Extract estimates
k = cols(X);
beta_hat = out.x[1:k];
sig2_v_hat = out.x[k+1];
sig2_u_hat = out.x[k+2];
print "Coefficients:";
print " constant = " beta_hat[1];
print " ln(labor) = " beta_hat[2];
print " ln(capital) = " beta_hat[3];
print " ln(materials)= " beta_hat[4];
print "";
print "Variance components:";
print " sig2_v (noise) = " sig2_v_hat;
print " sig2_u (inefficiency)= " sig2_u_hat;
print " ratio sig2_u/total = " sig2_u_hat / (sig2_v_hat + sig2_u_hat);
print "";
print "Iterations: " out.iterations;
This prints out coefficients and variance components:
Coefficients:
 constant = 1.51
 ln(labor) = 0.39
 ln(capital) = 0.31
 ln(materials)= 0.24

Variance components:
 sig2_v (noise) = 0.022
 sig2_u (inefficiency)= 0.087
 ratio sig2_u/total = 0.80

Iterations: 38
The estimates recover the true parameters reasonably well. The variance ratio ($\approx 0.80$) tells us that most residual variation is systematic inefficiency, not measurement error — an important finding for policy.
We can also compute and plot firm-level efficiency scores:
// Compute efficiency estimates (Jondrow et al. 1982)
eps = y - X * beta_hat;
sigma = sqrt(sig2_v_hat + sig2_u_hat);
lambda = sqrt(sig2_u_hat / sig2_v_hat);
mu_star = -eps * sig2_u_hat / (sig2_v_hat + sig2_u_hat);
sig_star = sqrt(sig2_v_hat * sig2_u_hat / (sig2_v_hat + sig2_u_hat));
// E[u|eps] - conditional mean of inefficiency
u_hat = mu_star + sig_star * (pdfn(mu_star/sig_star) ./ cdfn(mu_star/sig_star));
// Technical efficiency: TE = exp(-u)
TE = exp(-u_hat);
// Plot efficiency distribution
struct plotControl plt;
plt = plotGetDefaults("hist");
plotSetTitle(&plt, "Distribution of Technical Efficiency");
plotSetXLabel(&plt, "Technical Efficiency (1 = frontier)");
plotSetYLabel(&plt, "Frequency");
plotHist(plt, TE, 20);
print "Mean efficiency: " meanc(TE);
print "Min efficiency: " minc(TE);
print "Max efficiency: " maxc(TE);
Mean efficiency: 0.80
Min efficiency: 0.41
Max efficiency: 0.95
The histogram shows substantial variation in efficiency: some firms operate near the frontier (TE $\approx$ 0.95), while the least efficient produce only about 40% of their potential output. This is the kind of insight that drives productivity research.
Both variance estimates stayed positive throughout optimization. No log-transforms needed, and the estimates apply directly to the parameters we care about.
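For readers who want to trace the Jondrow et al. (1982) calculation on a single residual, here is a standard-library Python sketch (the function name is ours; the variance values in the test are the estimates printed earlier):

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def technical_efficiency(eps, sig2_v, sig2_u):
    # Jondrow et al. (1982): TE = exp(-E[u | eps]) with half-normal inefficiency
    mu_star = -eps * sig2_u / (sig2_v + sig2_u)
    sig_star = math.sqrt(sig2_v * sig2_u / (sig2_v + sig2_u))
    z = mu_star / sig_star
    u_hat = mu_star + sig_star * norm_pdf(z) / norm_cdf(z)
    return math.exp(-u_hat)
```

As expected, a firm with a more negative residual (further below the frontier) receives a lower efficiency score.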
When to Use minimize
The minimize procedure is designed for one thing: optimization with bound constraints. If that's all you need, it's the right tool.
| Situation | Recommendation |
|---|---|
| Parameters with simple bounds | minimize |
| Nonlinear constraints ($g(x) \leq 0$) | sqpSolveMT |
| Equality constraints | sqpSolveMT |
| Algorithm switching, complex problems | OPTMT |
For the GARCH and stochastic frontier examples above — and most MLE problems where parameters have natural bounds — minimize handles it directly.
Conclusion
Bounded parameters show up constantly in econometric models: variances, volatilities, probabilities, shares. GAUSS 26.0.1 gives you a clean way to handle them with minimize. As we saw today, with minimize:
- Bounds are set directly in the control structure
- The optimizer respects them throughout the search, not just at the solution
- No log-transforms or penalty functions are needed
- It's included in base GAUSS
If you've been working around parameter bounds with transforms or checking for invalid values inside your likelihood function, this is the cleaner path.