Application Modules

Q. What does the suffix MT mean that we see in many procedures and in the newer Applications?

A. MT stands for "multithreaded", but that doesn't describe the essential advantage, which is the elimination of global variables. All of the "MT" procedures and Applications use a "Control" structure containing the information needed for program settings.

Global variables can cause serious programming errors. If you do not explicitly declare a variable in a procedure to be "local", GAUSS will assume that it is global. You may have used that same variable name somewhere else in your program as a global. Your procedure compiles and runs fine, but it is using the global variable in its calculations while you are expecting something else to happen. Everyone re-uses variable names, especially for similar purposes, for example, x for a matrix of data and b for a vector of parameters. If they are global, you may get unexpected results. Sometimes programmers explicitly use globals in procedures, but this is dangerous because you might not know for sure how that global is defined.

The use of global variables for program settings can be a problem when attempting to run two versions of a program, or a nested version. Sometimes it is useful for the procedure computing the log-likelihood function in Constrained Maximum Likelihood to call another copy of Constrained Maximum Likelihood. If the program settings were global variables, the settings for one copy would overwrite those of the other. With Control structures there is no problem at all, because each copy has its own Control structure.
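The nested case can be sketched in a few lines. This is a Python illustration, not GAUSS code: the `Control` class, the toy `minimize` routine, and the objective are all hypothetical stand-ins, assuming only that each call receives its settings through an explicit structure rather than through globals.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """Hypothetical stand-in for an "MT"-style control structure."""
    tol: float = 1e-5
    max_iters: int = 100


def minimize(f, x0, ctl):
    """Toy 1-D minimizer (shrinking-step search); reads settings only from ctl."""
    x, step = x0, 1.0
    for _ in range(ctl.max_iters):
        if step < ctl.tol:
            break
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2.0
    return x


def outer_objective(a):
    # The objective itself runs a second, nested optimization with its own
    # settings. Nothing here can disturb the outer call's settings.
    inner_ctl = Control(tol=1e-8, max_iters=500)
    b = minimize(lambda t: (t - a) ** 2, 0.0, inner_ctl)
    return (b - 3.0) ** 2 + (a - 3.0) ** 2


outer_ctl = Control(tol=1e-3, max_iters=200)
a_hat = minimize(outer_objective, 0.0, outer_ctl)
print(a_hat, outer_ctl.tol)
```

Had `tol` and `max_iters` been module-level globals, the inner call would have had to overwrite them and restore them afterwards, which is exactly the kind of bookkeeping that goes wrong.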

The optimization procedures and applications also use two other basic structures included with the Run-Time Library (which means you can use them for your own programs as well): the PV parameter structures, and the DS data structures. These structures produce greatly increased flexibility in your use of the optimization procedures. More on their use can be found at http://www.aptech.com/papers/structures.pdf.

Q. How different is CMLMT from CML?

A. The primary difference is that CMLMT uses structures to handle the data, the parameters, and the optimization settings. Instead of global variables there is now a "control" structure. The data and the parameter starting values are also handled using structures in CMLMT. This use of structures significantly changes the command files, and makes it somewhat difficult to port CML command files to CMLMT.

CMLMT uses the DS data structure and the PV parameter structure which are both defined in the Run-Time Library. They considerably increase the flexibility of the optimization. For a discussion of these structures in the context of optimization using Sqpsolvemt, see the paper "Structures and Sqpsolvemt" at http://www.aptech.com/papers.

The PV parameter structure allows you to store and retrieve parameters without having to know where they are in the parameter vector. This adds flexibility to likelihood procedures that are frequently modified. Modifying such procedures was often tedious in CML, where you had to know the position of each parameter in the vector.
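The idea of retrieving parameters by name can be illustrated with a small Python sketch. It is not the GAUSS PV structure or its API: the `ParamStore` class and its method names are hypothetical, and the sketch only assumes that the optimizer sees one flat vector while the likelihood code asks for named blocks.

```python
class ParamStore:
    """Hypothetical, simplified analogue of a named-parameter structure."""

    def __init__(self):
        self._slices = {}   # name -> position of that block in the flat vector
        self._values = []   # the flat parameter vector seen by an optimizer

    def pack(self, name, values):
        start = len(self._values)
        self._values.extend(float(v) for v in values)
        self._slices[name] = slice(start, len(self._values))

    def unpack(self, name):
        return self._values[self._slices[name]]

    def get_vector(self):
        return list(self._values)

    def put_vector(self, vec):
        if len(vec) != len(self._values):
            raise ValueError("vector length does not match stored parameters")
        self._values = [float(v) for v in vec]


p = ParamStore()
p.pack("beta", [0.5, -1.2])
p.pack("sigma", [2.0])

# An optimizer would work on the flat vector without knowing the layout.
p.put_vector([v + 0.1 for v in p.get_vector()])

# The likelihood code asks for "sigma" by name, never by position.
print(p.unpack("beta"), p.unpack("sigma"))
```

Adding a new parameter block later changes the layout of the flat vector, but code that unpacks by name keeps working unchanged.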

CMLMT also provides some efficiencies in the calculation of derivatives. The function and the derivatives are now all computed in the same procedure. This avoids having to repeat calculations that they have in common. Also, you can compute just a subset of the derivatives, the easy ones for example. The ones you don't calculate are returned as missing values, and CMLMT computes them numerically.
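A rough Python sketch of the "subset of derivatives" idea follows. It is not how CMLMT is implemented: the objective, the use of `None` as the missing value, and the central-difference fill-in are all assumptions made for illustration.

```python
import math


def objective_with_grad(theta):
    """Return the function value and a partially filled gradient."""
    a, b = theta
    value = (a - 1.0) ** 2 + math.exp(b) - b
    # Only the easy derivative is supplied; the other is left missing.
    grad = [2.0 * (a - 1.0), None]
    return value, grad


def fill_missing(fun, theta, h=1e-6):
    """Replace missing gradient entries with central differences."""
    value, grad = fun(theta)
    filled = list(grad)
    for i, g in enumerate(grad):
        if g is None:
            up = list(theta)
            down = list(theta)
            up[i] += h
            down[i] -= h
            filled[i] = (fun(up)[0] - fun(down)[0]) / (2.0 * h)
    return value, filled


value, grad = fill_missing(objective_with_grad, [0.0, 0.5])
print(value, grad)
```

The analytic entry is used as supplied, and only the missing entry costs extra function evaluations.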


Q. I have just installed an updated version of an application that has always worked in the past, but now I am getting syntax errors in the updated library files. What do I do?

A. You are having problems because the GAUSS application you are trying to use with your older version of GAUSS was built using the current shipping version of GAUSS.

The GAUSS application library files (the *.lcg files) now include line numbers that your version of GAUSS does not understand.

The resolution is fairly straightforward: using the library tool, rebuild the GAUSS applications library files.

For example, if you have just installed Maximum Likelihood 5.0 you would need to rebuild the "maxlik.lcg" and the "count.lcg" files.


Q. The covariance matrix returned from Maxlik (or CML or CMLMT) is a missing value, or I'm getting the error message, "Hessian calculation failed", or the standard errors are not being printed with the estimates.

A. The covariance matrix is a missing value because the Hessian matrix from which it is calculated failed to invert. This means it contains linear dependencies. These dependencies can be due to poor scaling of the data, catastrophic failure of precision, or unidentified parameters in your model.

You will need to look at the Hessian, which will be stored in _max_FinalHess in Maxlik, _cml_FinalHess in CML, or the CovPar member of the cmlmtResults output structure. Compute its eigenvalues. A negative eigenvalue indicates a catastrophic failure of precision. Your data may need scaling. You may also need to look at the statements in your log-likelihood procedure for calculations causing loss of precision. Any time calculations in your procedure mix very large numbers with very small numbers, there is a loss of precision, for example, in a statement that takes the log of a calculation that includes exponentials.

If there are one or more zero eigenvalues, the Hessian contains linear dependencies. This indicates that there is not enough information in the data to identify the parameters associated with the linear dependencies. In this case either your model is placing demands that your data cannot meet, or your data fails to contain important information for those parameters. If it is the former, you need to modify your model. If it is the latter, you need more data.
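The eigenvalue check can be sketched as follows. This is a Python illustration rather than GAUSS code: the small Jacobi-rotation routine, the relative tolerance used to call an eigenvalue "zero", and the 3x3 example matrix (whose third row is the sum of the first two) are all assumptions chosen for the example.

```python
import math


def symmetric_eigenvalues(a, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]          # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):     # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):     # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def classify(eigs, rel_tol=1e-10):
    """Split eigenvalues into numerically zero and clearly negative ones."""
    scale = max(abs(e) for e in eigs)
    zero = [e for e in eigs if abs(e) <= rel_tol * scale]
    negative = [e for e in eigs if e < -rel_tol * scale]
    return zero, negative


# Example Hessian with an exact linear dependency: row 3 = row 1 + row 2.
hessian = [[2.0, 1.0, 3.0],
           [1.0, 2.0, 3.0],
           [3.0, 3.0, 6.0]]
eigs = symmetric_eigenvalues(hessian)
zero, negative = classify(eigs)
print(eigs, len(zero), len(negative))
```

In floating point a "zero" eigenvalue rarely comes out as exactly zero, so the comparison is made relative to the largest eigenvalue.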

To determine which parameters are involved with the linear dependencies, use the technique described in the final section of the paper at http://www.aptech.com/papers/qnewton.pdf. Once you've found these parameters you can develop a plan to bring the Hessian to positive definiteness. For example, if a pair of parameters is implicated in the linear dependency, i.e., one parameter is linearly dependent on another, you can remove one of them from the model or you can apply a constraint, e.g., constrain them to be equal.


Q. How accurate are CML and Maxlik estimates, and will AD improve their accuracy?

A. For its calculations GAUSS uses only double precision floating point numbers, and in modern computers that translates to about 16 decimal places of accuracy. The floating point number looks like this: x.xxxxxxxxxxxxxxx + yyy where yyy is the exponent. The exponent determines the location of the decimal point which is therefore irrelevant with respect to accuracy. The accuracy of this number is determined by the number of x's that are correct. When I say that a numerical derivative loses 4 places of accuracy I mean that the result will be x.xxxxxxxxxxxdddd + yyy where the d's represent inaccurate or incorrect numbers. A numerical Hessian compounds the inaccuracy: x.xxxxxxxdddddddd + yyy.
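The loss of accuracy in numerical derivatives is easy to observe. The Python sketch below is an illustration under stated assumptions: it uses central differences with step sizes of 1e-5 and 1e-4 on exp(x) at x = 1, where the exact derivatives are known. The number of places lost depends on the function, the differencing scheme, and the step size, so the digit counts it prints are indicative only.

```python
import math


def central_first(f, x, h):
    """Central-difference approximation to the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def central_second(f, x, h):
    """Central-difference approximation to the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def correct_digits(approx, exact):
    """Approximate number of correct significant digits, capped at 16."""
    rel = abs(approx - exact) / abs(exact)
    return 16.0 if rel == 0.0 else min(16.0, -math.log10(rel))


x = 1.0
exact = math.exp(x)   # every derivative of exp(x) equals exp(x)
grad_digits = correct_digits(central_first(math.exp, x, 1e-5), exact)
hess_digits = correct_digits(central_second(math.exp, x, 1e-4), exact)
print(round(grad_digits, 1), round(hess_digits, 1))
```

The second derivative divides rounding error by the square of the step, which is why it typically retains fewer correct digits than the first derivative.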

While AD restores the full 16 places of accuracy to the calculation of the gradient, that doesn't necessarily translate to 16 places of accuracy in the estimates or standard errors. The accuracy of the estimates is largely determined by several factors: the condition number of the Hessian, the accuracy of the derivatives and the Hessian, and the convergence tolerance. The log to the base 10 of the condition number is a rough measure of the number of places of accuracy lost in computing the inverse. It is also a measure of the degree of "flatness" of the log-likelihood around the maximum. If the function is sufficiently flat, a large region of the function cannot be distinguished from the actual maximum and thus the estimates and the standard errors will be inaccurate. If log10(cond(H)) is greater than 16, the estimates and standard errors will be completely inaccurate. With a numerical gradient and Hessian computed using the objective function, resulting in losses of 8 places of accuracy, a log10(cond(H)) of 8 is all that is needed to render the solution completely inaccurate.
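As a rough illustration of the log10(cond(H)) rule, the Python sketch below computes the condition number of a symmetric 2x2 matrix from its eigenvalues. The function name and the example Hessian, which mimics two badly scaled parameters, are hypothetical; for larger matrices you would use a general eigenvalue or singular value routine instead.

```python
import math


def log10_condition_2x2(h):
    """log10 of the condition number of a symmetric 2x2 matrix."""
    a, b = h[0]
    d = h[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    # Compute the larger-magnitude eigenvalue directly; the other follows
    # from the determinant, which avoids subtracting nearly equal numbers.
    lam_big = mean + radius if mean >= 0.0 else mean - radius
    if lam_big == 0.0:
        return math.inf
    lam_small = (a * d - b * b) / lam_big
    if lam_small == 0.0:
        return math.inf
    return math.log10(abs(lam_big) / abs(lam_small))


# Hypothetical Hessian for two parameters on very different scales.
hessian = [[4.0e6, 2.0],
           [2.0, 1.0e-2]]
places_lost = log10_condition_2x2(hessian)
places_left = 16.0 - places_lost
print(round(places_lost, 2), round(places_left, 2))
```

Rescaling the data so that the diagonal elements of the Hessian are of similar magnitude is the usual way to bring this number down.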

Roughly speaking the estimates will have m places of accuracy with a tolerance set to 1e-m provided that the gradient is calculated with at least m places of accuracy and log10(cond(H)) is less than 4. With a tolerance of 1e-5, the estimates will have an accuracy of x.xxxxddddddddddd + yyy. The standard errors will have about half the accuracy of the estimates, i.e., x.xxddddddddddddd + yyy.

AD restores full accuracy to the calculation of the gradient. This results in convergence in fewer iterations and more accuracy for better determination of convergence. The AD gradient procedure can be used in calculating a numerical Hessian. This improves the accuracy of its calculation, lowering the loss from 8 places to 4 places. This will generally improve the condition number of the Hessian adding accuracy to the estimates and the standard errors.

Q. How do Nonlinear Equations (NLE) and Maximum Likelihood (MAXLIK) differ?

A. Nonlinear Equations can be used to solve maximum likelihood problems by solving the normal equations rather than maximizing the log-likelihood. However, Nonlinear Equations does not have procedures enabling data set handling or statistical inference. You would have to provide these yourself. Maximum Likelihood has data set handling and statistical inference procedures built into it. Neither Nonlinear Equations nor Maximum Likelihood can handle constrained problems.
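To make "solving the normal equations" concrete, here is a small Python sketch, not GAUSS or Nonlinear Equations code. It assumes an exponential model with rate lam, for which the log-likelihood is n*log(lam) - lam*sum(x), and applies Newton's method to the equation obtained by setting its derivative to zero. The data and the starting value are made up for the example.

```python
def solve_score_equation(data, lam0, tol=1e-12, max_iters=50):
    """Newton's method on the score equation n/lam - sum(x) = 0."""
    n, total = len(data), sum(data)
    lam = lam0
    for _ in range(max_iters):
        score = n / lam - total          # first derivative of the log-likelihood
        if abs(score) < tol:
            break
        d_score = -n / lam ** 2          # derivative of the score
        lam -= score / d_score           # Newton step
    return lam


data = [0.5, 1.5, 2.0, 1.0, 3.0]
lam_hat = solve_score_equation(data, lam0=0.3)
print(lam_hat, len(data) / sum(data))
```

The root of the score equation coincides with the maximizer of the log-likelihood, here n / sum(x), but nothing in the root-finding step produces standard errors or handles a data set for you.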


Q. What optimization programs are available in GAUSS?

A. Four optimization programs come with GAUSS in the Run-Time Library: Qnewton, Qnewtonmt, Sqpsolve, and Sqpsolvemt. Qnewton and Qnewtonmt solve the unconstrained optimization problem, and Sqpsolve and Sqpsolvemt solve the more general Nonlinear Programming (NLP) problem, i.e., one with general constraints on parameters including linear and nonlinear equality constraints, linear and nonlinear inequality constraints, and bounds on parameters.

Qnewtonmt and Sqpsolvemt are more advanced versions of Qnewton and Sqpsolve which use the new PV and DS structures. The PV structure handles parameters, and the DS structure handles data. They enormously simplify passing parameters and data to the objective functions. For more details see the article http://www.aptech.com/AS_resLibMF.html

For optimization with more features, Aptech Systems also sells the application modules Maximum Likelihood (Maxlik), Constrained Maximum Likelihood (CML), Constrained Optimization (CO), and Optimization (Optmum).

CO and CML solve the NLP optimization problem while Optmum and Maxlik solve the unconstrained optimization problem.

Maxlik and CML solve the maximum likelihood estimation problem, and include a variety of methods for statistical inference, including Wald, inversion of the Wald statistic, profile likelihood, bootstrap, and Bayesian likelihood bootstrap.

All of the application modules, CO, CML, Optmum, and Maxlik, include a variety of descent methods and line search methods, as well as methods for switching between them automatically or modifying them directly from the keyboard while the iterations are underway.

Also, be advised that the NLP optimization programs, i.e., CO, CML, and Sqpsolvemt, are very effective for unconstrained problems as well as constrained ones, perhaps more effective. The primary reason is that the NLP programs are able to impose a "trust radius" on the directions computed at each iteration. There are theorems in the optimization literature that establish superior performance for trust region methods and my own experience bears this out.

Moreover, it is important to note that all statistical models contain restricted parameter spaces (even simple regression -- the error variance is restricted to be nonnegative), and the application of unconstrained optimization methods often results in degraded descents toward convergence when estimates confront undefined regions of the parameter space. Explicitly introducing constraints on the parameter space using the NLP methods generally produces improved results and can often make a critical difference when the unconstrained problem is failing.

Thus the use of CO, CML, and Sqpsolvemt for unconstrained as well as constrained optimization problems might generally be recommended.


Q. My log-likelihood is complex. What do I do?

A. A complex log-likelihood indicates a negative probability, which suggests an error in your log-likelihood. It could occur as the result of a catastrophic failure of precision. For example, you might be taking the log of an exponential. Here you are taking a small number, making it very large by taking its exponential, and then taking the log of that, in the process losing many places of accuracy. This could affect other calculations, ending with the log of a negative number, which is complex. To solve this problem you will need to find out which calculation is causing the negative probability. Insert the following in your log-likelihood procedure:

    if not(ll > 0);
    print "here";
    endif;

Run your problem in the debugger and place a break point on the above print statement. Hit the run button and let it run to that print statement. Then use the matrix editor to look at the various matrices involved in the calculation to see where things may have gone wrong.
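The loss of precision from taking the log of an exponential can be reproduced in a few lines. The sketch below is a Python illustration, not GAUSS code, and the function log(1 + exp(x)) is just one common example of the pattern. The rewritten form is a standard algebraic rearrangement, not something specific to any GAUSS module.

```python
import math


def log1pexp_naive(x):
    """log(1 + exp(x)) written exactly as the formula reads."""
    return math.log(1.0 + math.exp(x))


def log1pexp_stable(x):
    """Same quantity, rearranged so that exp() never sees a large argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Large argument: the naive form overflows before the log is taken.
try:
    log1pexp_naive(800.0)
    naive_survived = True
except OverflowError:
    naive_survived = False

# Very negative argument: 1 + exp(x) rounds to exactly 1, so the naive
# form returns 0 and every digit of the true (tiny) value is lost.
naive_small = log1pexp_naive(-40.0)
stable_small = log1pexp_stable(-40.0)
print(naive_survived, log1pexp_stable(800.0), naive_small, stable_small)
```

When you find a statement like this in your log-likelihood procedure, rearranging it algebraically is usually a better fix than trapping the bad values after the fact.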

Another possibility is that your parameter space may include undefined regions. For example, you might be taking the log of the variance in your model and the estimated variance has become negative. If this is the problem you will need to constrain the parameters in the model so that all regions of the parameter space produce a well-defined log-likelihood. For Maxlik this would require transformations of the parameters. For example, you could estimate the standard deviation rather than the variance: square the parameter estimate in the procedure, and add a small number because a zero variance might be a problem as well.
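The squaring transformation can be sketched as follows. This is a Python illustration, not a Maxlik procedure: the normal model, the function name, and the size of the added constant (1e-8) are assumptions made for the example.

```python
import math


def normal_negloglik(params, data, floor=1e-8):
    """Negative log-likelihood of a normal model, parameterized by (mu, s).

    The optimizer is free to move s anywhere on the real line; the variance
    used in the calculation is s*s + floor, which is always positive.
    """
    mu, s = params
    variance = s * s + floor
    return 0.5 * sum(
        math.log(2.0 * math.pi * variance) + (x - mu) ** 2 / variance
        for x in data
    )


data = [1.2, 0.7, 2.3, 1.9, 1.4]
at_positive = normal_negloglik([1.5, 0.6], data)
at_negative = normal_negloglik([1.5, -0.6], data)   # same variance, still defined
at_zero = normal_negloglik([1.5, 0.0], data)        # huge but finite, not complex
print(at_positive, at_negative, math.isfinite(at_zero))
```

One consequence of this transformation is that s and -s give the same likelihood, so the sign of the estimated parameter carries no meaning, and standard errors refer to s rather than to the variance.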

The Constrained Maximum Likelihood (CML) module was developed for just this type of problem. It allows for general constraints, linear or nonlinear, on the parameters, and it performs generally better on a constrained problem than does Maxlik with parameter transformations.

Q. I just got MAXLIK 5.0. When I run it, I get a message that scalInfNanMiss is undefined. What's wrong?

A. MAXLIK 5.0 requires GAUSS 3.6.18 or greater.


Q. What is the difference between Maximum Likelihood and Optimization or Constrained Maximum Likelihood and Constrained Optimization?

A. Optimization and Maximum Likelihood are nearly identical except that Maximum Likelihood has been designed to handle data sets. This allows it to provide four types of statistical inference - Wald, Profile Likelihood, Bootstrap, and Bayesian. It also allows for an additional descent method, BHHH.

Optimization could be used for maximum likelihood problems. However, since it doesn't know about data sets, you would have to write your own procedures for statistical inference. Moreover, if your data sets were large and didn't fit into memory, you would have to include code in your function for reading in the data. Maximum Likelihood does this for you.

If your optimization problems did not involve maximum likelihood, you would be better off with Optimization because the additional features in Maximum Likelihood for handling data would just get in your way. But if you are solving maximum likelihood problems, Maximum Likelihood would save you the considerable effort required for handling data and providing for statistical inference.


Q. What is the difference between Constrained Maximum Likelihood and Maximum Likelihood or Constrained Optimization and Optimization?

A. The constrained versions of these programs can, of course, solve unconstrained problems as well as constrained ones. However, the methods the constrained programs must use are more complicated and time consuming - constrained problems are more difficult to solve and require additional, complicated methods not required for unconstrained problems. For that reason, you can expect that it will take the constrained programs more time and resources to solve an unconstrained problem than it would take the unconstrained programs to solve it.

The essential calculation in Maximum Likelihood and Optimization is the solution of a linear equation, Hd = g, where H is the Hessian, or an approximation to the Hessian, g is the gradient vector, and d is the direction vector that we are solving for. Constrained Maximum Likelihood and Constrained Optimization, on the other hand, must minimize the quadratic function d'Hd + d'g subject to equality and inequality constraints, which is a much more difficult problem. Thus, you should expect Constrained Maximum Likelihood to take a little more time to solve an unconstrained problem than it would Maximum Likelihood.
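For the unconstrained case, the linear solve can be illustrated with a two-parameter Python sketch. It is not code from any of the modules: the quadratic objective, the Cramer's-rule solver, and the sign convention (the step is taken as x - d) are assumptions made for the example.

```python
def solve_2x2(h, g):
    """Solve H d = g for a 2x2 matrix H by Cramer's rule."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    if det == 0.0:
        raise ValueError("H is singular; no unique direction exists")
    d0 = (g[0] * h[1][1] - h[0][1] * g[1]) / det
    d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return [d0, d1]


# Objective f(x) = 0.5 * x'Hx - b'x, whose gradient at x is Hx - b.
H = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]

x = [0.0, 0.0]
g = [H[0][0] * x[0] + H[0][1] * x[1] - b[0],
     H[1][0] * x[0] + H[1][1] * x[1] - b[1]]
d = solve_2x2(H, g)
x_new = [x[0] - d[0], x[1] - d[1]]
print(x_new)
```

For an exactly quadratic function with the exact Hessian, one such step lands on the minimum. With constraints, no single linear solve like this is enough, which is the source of the extra work described above.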


Q. What is the difference between Nonlinear Equations and Maximum Likelihood?

A. Nonlinear Equations could be used to solve maximum likelihood problems by solving the normal equations rather than maximizing the log-likelihood. As with Optimization, however, you would be required to provide code for handling data and statistical inference.

Nonlinear Equations is a small, fast program. It would be quite useful if you wanted to solve many small likelihood problems for which you didn't need statistical inference. This advantage might be lost, however, if you computed the derivatives numerically.


Q. What is the difference between Optimization and Nonlinear Equations or Constrained Optimization and Nonlinear Equations?

A. Optimization and Constrained Optimization can solve systems of equations by minimizing the sum of squared deviations. Nonlinear Equations has been specially designed to handle this type of problem and for that reason can be expected to solve it more efficiently. However, Nonlinear Equations cannot handle constrained problems, and thus for that type of problem Constrained Optimization would be the better choice.


Q. The application modules OPTMUM and MAXLIK seem to share a lot of features. Can someone explain the differences in functionality between the two modules?

A. OPTMUM minimizes a general, user-provided function, whereas MAXLIK specializes that function to the log-likelihood function. The important difference is that MAXLIK is designed to handle a data set. Among GAUSS users the most common type of optimization is maximum likelihood estimation, and MAXLIK significantly reduces the amount of programming in this case.

In addition, MAXLIK adds the BHHH descent method which only makes sense when there is a data set. I would like to add, though, that the BHHH descent method is inferior to BFGS and DFP, and is included only because it appears to be quite popular among econometricians. The BHHH method, a type of "scoring" method, relies on the equivalence of the cross-product of the first derivatives with the Hessian; i.e., the second derivatives. As H. White has shown, this equivalence breaks down under misspecification. All researchers know that many more misspecified models are estimated than correctly specified ones, and thus one should expect the BHHH method to have difficulty most of the time.

The "variable metric" methods, such as BFGS and DFP, do not rely on the correct specification of a model for their performance. However, allow me to point out that they do require a positive definite Hessian with reasonably scaled parameters, i.e., the diagonal elements of the Hessian being all about the same order of magnitude. If these requirements are not met, neither OPTMUM nor MAXLIK will be likely to converge to a solution.
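The equivalence that BHHH relies on, and its breakdown under misspecification, can be seen in a one-parameter Python sketch. This is an illustration, not MAXLIK code. It assumes an exponential model with rate lam, and it uses deterministic, evenly spaced quantiles in place of random samples so that the result is reproducible; both data sets are made up for the example.

```python
import math


def opg_and_hessian(data):
    """Cross-product of scores and Hessian for an exponential model at its MLE."""
    n = len(data)
    lam = n / sum(data)                       # MLE of the rate
    scores = [1.0 / lam - x for x in data]    # per-observation first derivatives
    opg = sum(s * s for s in scores)          # cross-product of first derivatives
    hessian = n / lam ** 2                    # minus the second derivative
    return opg, hessian


n = 1000
# Data that follow the assumed exponential model (evenly spaced quantiles).
exponential_like = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
# Data with the same mean that do not follow the model (uniform on 0 to 2).
uniform_like = [2.0 * (i + 0.5) / n for i in range(n)]

opg_ok, hess_ok = opg_and_hessian(exponential_like)
opg_bad, hess_bad = opg_and_hessian(uniform_like)
ratio_ok = opg_ok / hess_ok
ratio_bad = opg_bad / hess_bad
print(round(ratio_ok, 3), round(ratio_bad, 3))
```

When the model fits, the cross-product is close to the Hessian and the ratio is near one. When it does not, the cross-product is a poor stand-in for the Hessian, and a descent method built on it is working from the wrong curvature.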



© Copyright 2004-2008.   Aptech Systems, Inc.
Black Diamond, WA.  All Rights Reserved Worldwide.