Goals
This tutorial builds on the first four econometrics tutorials. It is suggested that you complete those tutorials prior to starting this one.
This tutorial demonstrates how to test for omitted variables and other model specification errors after OLS regression. After completing this tutorial, you should be able to:
- Test model specification using the link test.
- Test for missing variables using the Ramsey regression specification error test (RESET).
Introduction
A common source of model specification error in OLS regressions is the omission of relevant variables. When variables are omitted, variation in the dependent variable may be falsely attributed to the included variables. This can inflate the standard errors of the regressors and distort the estimated coefficients. In this tutorial, we will test for omitted variables using the link test and the Ramsey RESET test. Following previous tutorials, we have estimated an OLS model and stored the results using data simulated from the data generating process $ y_{i} = 1.3 + 5.7 x_{i} + \epsilon_{i} $, where $ \epsilon_{i} $ is the random disturbance term.
The Link Test
The motivation behind the link test is the idea that if a regression is specified appropriately, you should not be able to find additional independent variables that are significant, except by chance. To test this, the link test regresses the dependent variable of the original regression on the original regression's prediction and the squared prediction. If the squared prediction is significant in the test regression, there is evidence that the model is misspecified.
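To run the link test, we construct the $\hat{y}$ and $\hat{y}^2$ variables from the results of the original regression and run the regression below. The constant $b_0$ appears because `ols` includes a constant in the regression by default, as the CONSTANT row of the report shows.
$$y = b_0 + \hat{y}b_1 + \hat{y}^2b_2 + \epsilon$$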
//Add column of ones to x for constant
x_full = ones(num_obs, 1) ~ x;
//Predicted y values
y_hat = x_full * b;
//Concatenate and form regressor
link_regressors = y_hat ~ y_hat.^2;
//Test regression
call ols("", y, link_regressors);
The above code will print the following report:
Valid cases:                   100      Dependent variable:                   Y
Missing cases:                   0      Deletion method:                   None
Total SS:                 3481.056      Degrees of freedom:                  97
R-squared:                   0.969      Rbar-squared:                     0.969
Residual SS:               106.782      Std error of est:                 1.049
F(2,97):                  1532.578      Probability of F:                 0.000
Durbin-Watson:               2.023

                         Standard                 Prob   Standardized  Cor with
Variable     Estimate      Error      t-value     >|t|     Estimate    Dep Var
----------------------------------------------------------------------------
CONSTANT      0.03422     0.12382    0.276351    0.783       ---         ---
X1            1.00687     0.02119   47.510909    0.000     0.99125     0.98448
X2           -0.00127     0.00205   -0.619914    0.537    -0.01293     0.50545
The OLS results show a p-value of 0.537 for the coefficient on $\hat{y}^2$ (X2 in the report), so we cannot reject the null hypothesis that this coefficient is equal to zero. Because $\hat{y}^2$ is insignificant in our test regression, the link test gives no evidence that our model suffers from omitted variables.
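The p-value can also be computed in code rather than read off the report. The sketch below is self-contained: it regenerates the simulated data with the same seed and data generating process used in this tutorial, stores the link test results, and computes a two-sided p-value for the squared prediction. The `lnk_` variable names are illustrative, and the degrees of freedom (`num_obs - 3`) assume the test regression has a constant plus two regressors.

```gauss
//Clear the workspace and regenerate the tutorial data
new;
rndseed 23423;
num_obs = 100;
x = rndn(num_obs, 1);
error_term = rndn(num_obs, 1);
y = 1.3 + 5.7*x + error_term;

//Original regression (ols adds the constant by default)
{ nam, m, b, stb, vc, std, sig, cx, rsq, resid, dbw } = ols("", y, x);

//Predicted values and link test regressors
y_hat = (ones(num_obs, 1) ~ x) * b;
link_regressors = y_hat ~ y_hat.^2;

//Store the link test results instead of only printing them
{ lnk_nam, lnk_m, lnk_b, lnk_stb, lnk_vc, lnk_std,
  lnk_sig, lnk_cx, lnk_rsq, lnk_resid, lnk_dbw } = ols("", y, link_regressors);

//t-statistic on y_hat^2: third coefficient (constant, y_hat, y_hat^2)
t_sq = lnk_b[3] / lnk_std[3];

//Two-sided p-value from the t distribution with n - 3 degrees of freedom
p_sq = 2 * cdftc(abs(t_sq), num_obs - 3);
print "Link test t-stat on y_hat^2: " t_sq;
print "Two-sided p-value:           " p_sq;
```

The computed p-value should line up with the Prob >|t| entry for X2 in the report. Storing it in a variable makes it easy to compare against a chosen significance level in later code.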
The Ramsey RESET Test
The Ramsey RESET test is based on the same concept but runs the regression
$$ y_i = x_ib + z_it + u_i $$
where $z_i = (\hat{y}^2, \hat{y}^3, \hat{y}^4)$. The predicted $y$ value is normalized between 0 and 1 before the powers are calculated. If the regression is properly specified, the coefficients on all powers of the predicted $y$ should be jointly insignificant.
Normalize $\hat{y}$
To run the RESET test, we need to normalize the predicted $y$ values, then construct the additional variables $\hat{y}^2$, $\hat{y}^3$, and $\hat{y}^4$. To normalize the predicted $y$ to the range 0 to 1, we use min-max normalization such that
$$y_{norm} = \frac{\hat{y}-\hat{y}_{min}}{\hat{y}_{max} - \hat{y}_{min}}$$
//Normalize y_hat
y_hat_norm = (y_hat - minc(y_hat))/(maxc(y_hat) - minc(y_hat));
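As a quick worked example with made-up values (not taken from the simulated data), suppose the predictions range from $\hat{y}_{min} = -10$ to $\hat{y}_{max} = 15$. A prediction of $\hat{y} = 5$ is then mapped to:

$$y_{norm} = \frac{5-(-10)}{15-(-10)} = \frac{15}{25} = 0.6$$

The smallest prediction maps to 0 and the largest to 1, so every power of the normalized prediction also stays between 0 and 1.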
RESET regression
Unlike the link test, the Ramsey RESET test regression includes the regressors from the original regression:
$$y = xb_1 + \hat{y}^2b_2 + \hat{y}^3b_3 + \hat{y}^4b_4 + \epsilon$$
This time we will store the results because we need to conduct the hypothesis test that $b_2$, $b_3$, and $b_4$ are jointly insignificant.
//Concatenate and form regressor
ram_regressors = x ~ y_hat_norm.^2 ~ y_hat_norm.^3 ~ y_hat_norm.^4;
//Test regression
{ ram_nam, ram_m, ram_b, ram_stb, ram_vc, ram_std,
ram_sig, ram_cx, ram_rsq, ram_resid, ram_dbw } = ols("",y, ram_regressors);
The code above will print the following report:
Valid cases:                   100      Dependent variable:                   Y
Missing cases:                   0      Deletion method:                   None
Total SS:                 3481.056      Degrees of freedom:                  95
R-squared:                   0.971      Rbar-squared:                     0.970
Residual SS:               100.511      Std error of est:                 1.029
F(4,95):                   798.798      Probability of F:                 0.000
Durbin-Watson:               1.918

                         Standard                 Prob   Standardized  Cor with
Variable     Estimate      Error      t-value     >|t|     Estimate    Dep Var
----------------------------------------------------------------------------
CONSTANT      2.68599     3.23368     0.83063    0.408       ---         ---
X1            6.75663     1.73379     3.89703    0.000    1.162524    0.984481
X2           -1.12135    34.93156    -0.03210    0.974   -0.035223    0.938781
X3          -21.13182    51.19730    -0.41275    0.681   -0.605543    0.856568
X4           18.50884    25.23111     0.73357    0.465    0.490670    0.771187
RESET hypothesis test
To complete our RESET test for omitted variables, we need to test the hypothesis that the coefficients on all powers of y_hat_norm are jointly equal to zero. Therefore, the Ramsey RESET test null hypothesis is:
$$ H_0 : b_2 = b_3 = b_4 = 0 $$
We test it using the F-statistic
$$F_0 = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-(k+1))}$$
where
- $SSR_r$ = sum of squared residuals from the restricted model
- $SSR_{ur}$ = sum of squared residuals from the unrestricted model
- $q$ = number of restrictions
- $n$ = number of observations
- $k$ = number of regressors in the unrestricted model, excluding the constant
In this case, the restricted model is $y = \alpha + \beta x$, which is conveniently the original model we have already estimated.
//Find SSR from original model
SSR_r = resid'*resid;
//Find SSR from unrestricted model
SSR_ur = ram_resid'*ram_resid;
//Number of restrictions
q = 3;
//Number of regressors in unrestricted model
k = cols(ram_regressors);
//Construct F stat
F_ram = ((SSR_r - SSR_ur)/q)/(SSR_ur/(num_obs-(k+1)));
print "F-stat for restriction b_2 = b_3 = b_4 :" F_ram;
//Probability of F (denominator df matches the F-stat: n-(k+1))
p_value = cdffc(F_ram, q, (num_obs-(k+1)));
print "Probability of F: " p_value;
The p-value for our F-stat is 10.4%. Therefore, at the 5% significance level, we fail to reject the Ramsey RESET test null hypothesis of correct specification. The test gives no evidence that the functional form is wrong or that our model suffers from omitted variables.
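To see how the pieces of the F-statistic line up with the RESET report, the values printed there can be plugged into the formula. The report gives $SSR_{ur} = 100.511$ with 95 degrees of freedom, and we have $n = 100$, $k = 4$, and $q = 3$. The restricted $SSR_r$ comes from the original regression and is not shown in the reports in this tutorial, so it is left symbolic here:

$$F_0 = \frac{(SSR_r - 100.511)/3}{100.511/(100-(4+1))} = \frac{(SSR_r - 100.511)/3}{100.511/95} \approx \frac{(SSR_r - 100.511)/3}{1.058}$$

The denominator is the estimated error variance of the unrestricted model. Its square root, about 1.029, matches the "Std error of est" entry in the RESET report, which is a useful check that $n-(k+1) = 95$ is the right denominator degrees of freedom for the F distribution.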
Conclusion
Congratulations! You have:
- Run the link test for model misspecification.
- Run the Ramsey RESET test for model misspecification.
For convenience, the full program text is below.
//Clear the workspace
new;
//Set seed to replicate results
rndseed 23423;
//Number of observations
num_obs = 100;
//Generate independent variables
x = rndn(num_obs,1);
//Generate error terms
error_term = rndn(num_obs, 1);
//Generate y from x and error_term
y = 1.3 + 5.7*x + error_term;
//Turn on residuals computation
_olsres = 1;
//Estimate model and store results in variables
{ nam, m, b, stb, vc, std, sig, cx, rsq, resid, dbw } = ols("", y, x);
/**************************************************************************/
//Add column of ones to x for constant
x_full = ones(num_obs, 1) ~ x;
//Predicted y values
y_hat = x_full * b;
//Concatenate and form regressor
link_regressors = y_hat ~ y_hat.^2;
//Test regression
call ols("", y, link_regressors);
/**************************************************************************/
//Normalize y_hat
y_hat_norm = (y_hat - minc(y_hat))/(maxc(y_hat) - minc(y_hat));
//Concatenate and form regressor
ram_regressors = x ~ y_hat_norm.^2 ~ y_hat_norm.^3 ~ y_hat_norm.^4;
//Test regression
{ ram_nam, ram_m, ram_b, ram_stb, ram_vc, ram_std,
ram_sig, ram_cx, ram_rsq, ram_resid, ram_dbw } = ols("",y, ram_regressors);
//Find SSR from original model
SSR_r = resid'*resid;
//Find SSR from unrestricted model
SSR_ur = ram_resid'*ram_resid;
//Number of restrictions
q = 3;
//Number of regressors in unrestricted model
k = cols(ram_regressors);
//Construct F stat
F_ram = ((SSR_r - SSR_ur)/q)/(SSR_ur/(num_obs-(k+1)));
print "F-stat for restriction b_2 = b_3 = b_4 :" F_ram;
//Probability of F (denominator df matches the F-stat: n-(k+1))
p_value = cdffc(F_ram, q, (num_obs-(k+1)));
print "Probability of F: " p_value;