OLS Estimation with Endogenous Regressors

Goals

This tutorial demonstrates GMM estimation of a linear model with an endogenous regressor using the gmmFit and gmmFitIV procedures. After completing this tutorial you should be able to estimate an instrumental variables model using:

  • gmmFitIV.
  • gmmFit.

Introduction

In this example, we will expand on the OLS model to estimate an instrumental variables model. We will again demonstrate how to estimate the model using both gmmFit and gmmFitIV. The linear model relates the dependent variable, rent, to housing values, hsngval, and to the percentage of the population living in urban areas, pcturban.

$$rent = \alpha + \beta_1 \, hsngval + \beta_2 \, pcturban + u$$

The data for this model is stored in the GAUSS dataset "hsng.dat".

The new element in this model is that hsngval is endogenous, so OLS estimates of its coefficient would be inconsistent. To address this, we instrument for hsngval using pcturban, family income (faminc), and three regional dummies (reg2, reg3, reg4). Because pcturban is treated as exogenous, it serves as its own instrument.
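Counting moment conditions against unknown parameters shows that this model is overidentified. Stacking a constant with the five instruments gives the instrument vector $z_t$. With three parameters $(\alpha, \beta_1, \beta_2)$ to estimate, there are three more moment conditions than unknowns. That surplus is what allows the Hansen test of the moment restrictions reported in the gmmFitIV output. Here $L$ is the number of moment conditions and $K$ is the number of parameters:

$$z_t = (1,\ pcturban_t,\ faminc_t,\ reg2_t,\ reg3_t,\ reg4_t)', \qquad L = 6, \quad K = 3, \quad L - K = 3$$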

Estimation with gmmFitIV

The gmmFitIV procedure uses the GAUSS formula string syntax to set up estimation. For an instrumental variables model, you must supply three pieces of information:

  1. The dataset name.
  2. A formula string representing the model.
  3. An instrumental variable string.

//Dataset
dataset = getGAUSShome $+ "examples/hsng.dat";

//Model formula
formula = "rent ~ hsngval + pcturban";

//String of instrumental variables
inst_var = "pcturban + faminc + reg2 + reg3 + reg4";

call gmmFitIV(dataset, formula, inst_var);

The output from our gmmFitIV estimation reads

Dependent Variable:                      rent
Number of Observations:                    50
Number of Moments:                          0
Number of Parameters:                       3
Degrees of freedom:                        47

                         Standard                Prob
Variable     Estimate      Error     t-value     >|t|
-----------------------------------------------------------

CONSTANT   112.122713   10.545763    10.632     0.000
hsngval      0.001464    0.000404     3.627     0.001
pcturban     0.761548    0.264387     2.880     0.006

Instruments: pcturban, faminc, reg2, reg3, reg4, Constant

Hansen Test Statistic of the Moment Restrictions
Chi-Sq(3) =        6.9753314
P-value of J-stat:     0.072688216
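The degrees of freedom of the Hansen statistic equal the number of overidentifying restrictions. That is six moment conditions (five instruments plus the constant) minus three estimated parameters. In the standard construction, the statistic is $N$ times the GMM criterion evaluated at the estimates, using the efficient weight matrix $\hat W$. The sample moment vector is $\bar g_N(\theta) = \frac{1}{N}\sum_t z_t u_t(\theta)$:

$$J = N\,\bar g_N(\hat\theta)'\,\hat W\,\bar g_N(\hat\theta) \;\overset{a}{\sim}\; \chi^2(L - K) = \chi^2(6 - 3) = \chi^2(3)$$

The p-value is about 0.073. The moment restrictions are therefore not rejected at the 5% level, although they would be rejected at the 10% level.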

Estimation with gmmFit

Load data

When using the gmmFit procedure, we must first load the data and split it into three matrices: the dependent variable y, the regressors x, and the instruments z.

//Load data file
data = loadd(getGAUSShome $+ "examples/hsng.dat", "rent + hsngval + " $+
                                                  "pcturban + faminc + " $+
                                                  "reg2 + reg3 + reg4");

//Extract y (rent) and x (hsngval, pcturban)
y = data[., 1];
x = data[., 2:3];

//Extract instrumental variables matrix (pcturban, faminc, reg2, reg3, reg4)
z = data[., 3:7];

//Add constant to z
z = ones(rows(z), 1)~z;
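As a quick check on the data layout, the 50 observations reported in the estimation output imply the dimensions below. The column order of x matters because the moment procedure indexes it directly: xt[., 1] is hsngval and xt[., 2] is pcturban. After the constant is added, the first column of z is the constant:

$$y:\ 50 \times 1 \ (rent), \qquad x:\ 50 \times 2 \ (hsngval,\ pcturban), \qquad z:\ 50 \times 6 \ (1,\ pcturban,\ faminc,\ reg2,\ reg3,\ reg4)$$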

Write moment equation

The next step for our gmmFit estimation is to define the moment procedure. The instrumental variables model uses the moment conditions $E[z_t u_t(\theta_0)] = 0$, where $u_t(\theta) = y_t - \alpha - \beta_1 hsngval_t - \beta_2 pcturban_t$ and $\theta = (\alpha, \beta_1, \beta_2)'$. The moment procedure now takes four inputs, because the instrument matrix z is passed in addition to the parameters, y, and x.

proc meqn(b, yt, xt, zt);

    local ut, dt;

    /**  Model residuals u_t  **/
    ut = yt - b[1] - b[2]*xt[., 1] - b[3]*xt[., 2];  

    /**  Moment conditions  **/
    dt = ut .* zt;                     

    retp(dt);

endp;
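The procedure returns an $N \times 6$ matrix whose $t$-th row is $u_t(\theta)\,z_t'$, with one column per moment condition. In the standard GMM setup, these rows are averaged into a sample moment vector, and the estimator minimizes a weighted quadratic form in that vector. The expressions below state that textbook criterion only. They do not describe how gmmFit implements it internally, for example how it updates $W$ between steps:

$$\bar g_N(\theta) = \frac{1}{N}\sum_{t=1}^{N} z_t\,u_t(\theta), \qquad \hat\theta = \arg\min_{\theta}\ \bar g_N(\theta)'\,W\,\bar g_N(\theta)$$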

Set model parameters

For this example, rather than setting specific starting values for the parameters, we specify the number of parameters to be estimated using gctl.numParams. GAUSS then generates the starting values itself.

//Declare gctl to be a gmmControl struct
//and fill with default settings
struct gmmControl gctl;
gctl = gmmControlCreate();

//Set number of parameters (GAUSS chooses starting values)
gctl.numParams = 3;

We will also set the initial weight matrix for the gmmFit estimation so that it replicates the default used by the gmmFitIV procedure.

In this model, the instruments, including the constant, are contained in the data matrix z. The default initial weight matrix used by gmmFitIV is $\left(\frac{1}{N}Z'Z\right)^{-1}$. We can tell gmmFit to use the same matrix through the gmmControl member gctl.wInitMat.

//Set initial weight matrix
gctl.wInitMat = invpd((1/rows(z))*(z'z));
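With this initial weight matrix, the first-step GMM estimator has a closed form: it is the two-stage least squares (2SLS) estimator. Let $X$ be the $N \times 3$ matrix $[1,\ hsngval,\ pcturban]$, $Z$ the instrument matrix z, and $y$ the rent vector. The sample moments are then $\bar g_N(\theta) = \frac{1}{N}Z'(y - X\theta)$. Minimizing $\bar g_N(\theta)'\,W\,\bar g_N(\theta)$ with $W = \left(\frac{1}{N}Z'Z\right)^{-1}$ gives:

$$\hat\theta_{2SLS} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'y$$

The factor $\frac{1}{N}$ only rescales the criterion, so it does not change this minimizer. The Hansen statistic in the gmmFitIV output suggests a second step with an estimated efficient weight matrix. If gmmFit performs such a step, its reported estimates will generally differ from this first-step value.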

Finally, we add variable names. This time we add both the model variable names, using gctl.varNames, and the instrument names, using gctl.instNames.

//Variable names
gctl.varNames = { "hsngval", "pcturban", "rent" };

//Instrument names
gctl.instNames = { "pcturban", "faminc", "reg2", "reg3", "reg4" };

Call gmmFit

We are now ready to call gmmFit. Notice that z is now passed to gmmFit as an additional data input.

call gmmFit(&meqn, y, x, z, gctl);

The output from our gmmFit estimation reads

Dependent Variable:                      rent
Number of Observations:                    50
Number of Moments:                          6
Number of Parameters:                       3
Degrees of freedom:                        47

                         Standard                Prob
Variable     Estimate      Error     t-value     >|t|
-----------------------------------------------------------

CONSTANT   112.122790   10.545745    10.632     0.000
hsngval      0.001464    0.000404     3.627     0.001
pcturban     0.761552    0.264387     2.880     0.006

Conclusion

Congratulations! You have:

  • Estimated an instrumental variables model using gmmFitIV.
  • Estimated an instrumental variables model using gmmFit.

For convenience, the full program text is reproduced below.

//Dataset
dataset = getGAUSShome $+ "examples/hsng.dat";

//Model formula
formula = "rent ~ hsngval + pcturban";

//String of instrumental variables
inst_var = "pcturban + faminc + reg2 + reg3 + reg4";

call gmmFitIV(dataset, formula, inst_var);

//Load data file
data = loadd(getGAUSShome $+ "examples/hsng.dat", "rent + hsngval + " $+
                                                  "pcturban + faminc + " $+
                                                  "reg2 + reg3 + reg4");

//Extract y (rent) and x (hsngval, pcturban)
y = data[., 1];
x = data[., 2:3];

//Extract instrumental variables matrix (pcturban, faminc, reg2, reg3, reg4)
z = data[., 3:7];
//Add constant to z
z = ones(rows(z),1)~z;

//Declare gctl to be a gmmControl struct
//and fill with default settings
struct gmmControl gctl;
gctl = gmmControlCreate();

//Set number of parameters (GAUSS chooses starting values)
gctl.numParams = 3;

//Set initial weight matrix
gctl.wInitMat = invpd((1/rows(z))*(z'z));

//Variable names
gctl.varNames = { "hsngval", "pcturban", "rent" };

//Instrument names
gctl.instNames = { "pcturban", "faminc", "reg2", "reg3", "reg4" };

call gmmFit(&meqn, y, x, z, gctl);

proc meqn(b, yt, xt, zt);

    local ut,dt;

    /**  Model residuals u_t  **/
    ut = yt - b[1] - b[2]*xt[.,1] - b[3]*xt[.,2];  

    /**  Moment conditions  **/
    dt = ut .* zt;                     

    retp(dt);

endp;
