I am using this combination with a relatively large data set (21,000 rows) fully held in memory. The OS is Win 7 64-bit version, 16-MB RAM.
I have run smaller versions of the estimation problem with Maxlik using the cross-product of the gradient (gg')^-1 as the covariance matrix estimator. At convergence in smaller problems gg' is invertible. With the bigger problem it is not, though the model specification has not changed.
Is it possible that the size of the data set is causing this behaviour?
Thanks for any help/insight.
In order to compute the cross-product, the log-probabilities must be computed by observation. In your procedure you have
where loglikeli is a scalar log-probability being expanded to a vector of the length of the number of observations. This is not the same as a vector of log-probabilities. This is not a vector of log-probabilities by observation and thus cannot be used to compute the cross-product matrix. It appears you are estimating an MNL model in which the log-likelihood is being computed from cells of a cross-tabulation and thus log-probabilities by observation aren't generally possible.
I looked at the Hessian for this model (stored in _max_FinalHess) and it was negative definite. This is the result of a catastrophic failure of precision in the calculation of the Hessian. You could try working with the calculations in the log-likelihood. Precision is lost wherever large numbers are mixed with small numbers and your procedure contains the use of exp() along with ln() which are classical precision issues. Another solution would be to write a procedure for computing the analytical gradient. The Hessian would then be computed as a gradient of a gradient, increasing precision.
Large MNL models are frequently difficult to estimate because there is the tendency for an increase of zero or nearly zero cell frequencies. This results in a loss of information for standard errors. Another possible solution would be to fix some parameters to zero, or to each other thus reducing the parameter space. If you reduce the parameter space sufficiently you might have enough information to estimate the remaining parameters.