Introduction
In this blog, we examine one of the fundamentals of panel data analysis, the one-way error component model. Today we will:
- Explain the theoretical one-way error component model.
- Consider fixed effects vs. random effects.
- Estimate models using an empirical example.
The theoretical one-way error component model
The one-way error component model is a panel data model that allows for individual-specific or time-specific error components:
$$ \begin{equation}y_{it} = \alpha + X_{it} \beta + u_{it} \label{OWEM}\end{equation}$$ $$ u_{it} = \mu_{i} + \nu_{it} $$
where the subscript i indicates cross-sections of households, individuals, firms, countries, etc. and the subscript t indicates time periods.
In this model, the individual-specific error component, $\mu_{i}$, captures any unobserved effects that are different across individuals but fixed across time.
Term | Description |
---|---|
$\alpha$ | Parameter of interest: an intercept that is constant across all individuals and time periods. |
$\beta$ | Parameter of interest: the effect of x on y, constant across all individuals and time periods. |
$\mu_i$ | Individual-specific variation in y which stays constant across time for each individual. In the fixed effects model this is an individual-specific effect to be estimated. In the random effects model this follows a random distribution with parameters that must be estimated. |
$\nu_{it}$ | Usual stochastic regression disturbance which varies across time and individuals. |
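To make the roles of the two error components concrete, here is a minimal Python sketch that simulates a small balanced panel from the model. It is independent of GAUSS and pdlib. The function name `simulate_panel`, the parameter values, and the choice of normal draws for $x_{it}$, $\mu_i$, and $\nu_{it}$ are illustrative assumptions, not part of the model definition.

```python
import random


def simulate_panel(n_individuals, n_periods, alpha, beta, sigma_mu, sigma_nu, seed=0):
    """Draw y_it = alpha + beta * x_it + mu_i + nu_it for a balanced panel."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_individuals + 1):
        mu_i = rng.gauss(0.0, sigma_mu)        # drawn once per individual
        for t in range(1, n_periods + 1):
            x_it = rng.gauss(0.0, 1.0)
            nu_it = rng.gauss(0.0, sigma_nu)   # drawn for every (i, t) pair
            y_it = alpha + beta * x_it + mu_i + nu_it
            rows.append((i, t, y_it, x_it, mu_i, nu_it))
    return rows


# Hypothetical parameter values chosen only for illustration
panel = simulate_panel(n_individuals=3, n_periods=3, alpha=1.0, beta=0.5,
                       sigma_mu=1.0, sigma_nu=0.5)
for i, t, y, x, mu, nu in panel:
    print(f"i={i} t={t} y={y:7.3f} x={x:7.3f} mu_i={mu:7.3f} nu_it={nu:7.3f}")
```

Within each individual the printed `mu_i` value repeats across periods while `nu_it` changes every period, which is the distinction the table draws.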
Fixed effects vs. random effects
The two most common approaches to modeling individual-specific error components are the fixed effects model and the random effects model.
The key difference between these two approaches is how we believe the individual error component behaves.
The fixed effects model
In the fixed effects model the individual error component:
- Can be thought of as an individual-specific intercept term.
- Captures the effect of any time-invariant omitted variables that are not included in the regression.
- May be correlated with the other variables included in the model.
Given these assumptions, the fixed effects model can be thought of as a pooled OLS model with individual-specific intercepts:
$$\begin{equation}y_{it} = \delta_{i} + X_{it} \beta + \nu_{it}\label{FEM}\end{equation}$$
The intercept term, $\delta_i$, varies across individuals but is constant across time for each individual. It combines the constant intercept term, $\alpha$, and the individual-specific error term, $\mu_i$, so that $\delta_i = \alpha + \mu_i$.
The distinguishing feature of the fixed effects model is that $\delta_i$ has a true, but unobservable, effect which we must estimate.
The random effects model
In the random effects model the individual-specific error component, $\mu_i$:
- Is distributed randomly and is independent of $\nu_{it}$.
- Is appropriate when individuals are drawn randomly from a large population, as in household studies (Baltagi, 2008).
- Is assumed to be uncorrelated with all other variables in the model.
- Affects the model through the covariance structure of the error term.
For example, consider the total error disturbance in the model, $ u_{it} = \mu_{i} + \nu_{it} $. For a given individual, the covariance of the error at time t and time s depends on the variance of both $\mu_{i}$ and $\nu_{it}$:
$$\begin{equation}cov(u_{it}, u_{is}) = \left\{ \begin{array}{ll} \sigma_{\mu}^2 & \text{for } t \neq s \\ \sigma_{\mu}^2 + \sigma_{\nu}^2 & \text{for } t = s \\ \end{array} \right. \label{REM}\end{equation} $$
The distinguishing feature of the random effects model is that $\mu_i$ does not have a true value but rather follows a random distribution with parameters that we must estimate.
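The covariance structure above can be written out as a matrix for a single individual. The following Python sketch builds that matrix. The function name `error_covariance` and the variance values are hypothetical, chosen only to illustrate the pattern.

```python
def error_covariance(n_periods, sigma_mu2, sigma_nu2):
    """T x T covariance matrix of (u_i1, ..., u_iT) for a single individual."""
    return [
        [sigma_mu2 + sigma_nu2 if t == s else sigma_mu2 for s in range(n_periods)]
        for t in range(n_periods)
    ]


# Hypothetical variances, for illustration only
sigma_mu2, sigma_nu2 = 2.0, 1.0
omega = error_covariance(n_periods=3, sigma_mu2=sigma_mu2, sigma_nu2=sigma_nu2)
for row in omega:
    print(row)

# Correlation between the errors of one individual in two different periods
rho = sigma_mu2 / (sigma_mu2 + sigma_nu2)
print(f"Within-individual error correlation: {rho:.4f}")
```

The off-diagonal entries are all equal to $\sigma_{\mu}^2$, so the errors of one individual are equally correlated across any two periods. This is why pooled OLS, which treats all errors as uncorrelated, is not the efficient choice here.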
Estimation
The fixed effects model
In the fixed effects model, the individual effects introduce an endogeneity that will result in biased estimates if not properly accounted for.
Fortunately, we can make consistent estimates using one of three estimation techniques:
- Within-group estimation
- First differences estimation
- Least squares dummy variable (LSDV) estimation
The first two of these techniques focus on eliminating the individual effects before estimation. The LSDV method directly incorporates these effects using dummy variables.
Feature | Within-group estimator | LSDV estimator | First differences estimator |
---|---|---|---|
Data transformation | Demean the data. | Use dummy variables. | Difference the data. |
Regression equation | $$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i} $$ | $$Y_{it} = X_{it} \beta_{fe} +\\ \alpha D_{i} + \nu_{it}$$ | $$\Delta{Y}_{it} = \Delta{X}_{it} \beta_{fe} + \Delta{\nu}_{it} $$ |
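A short derivation shows why the within-group and first-difference transformations remove the individual effect. Start from the fixed effects equation $y_{it} = \delta_i + X_{it}\beta + \nu_{it}$ and write $\bar{y}_i = \frac{1}{T}\sum_{t} y_{it}$ for the time average of individual i, with $\bar{X}_i$ and $\bar{\nu}_i$ defined the same way (this bar notation is introduced here for illustration):

$$\begin{aligned} y_{it} - \bar{y}_i &= (\delta_i - \delta_i) + (X_{it} - \bar{X}_i)\beta + (\nu_{it} - \bar{\nu}_i) = \widetilde{X}_{it}\beta + \widetilde{\nu}_{it}, \\ y_{it} - y_{i,t-1} &= (\delta_i - \delta_i) + (X_{it} - X_{i,t-1})\beta + (\nu_{it} - \nu_{i,t-1}) = \Delta X_{it}\beta + \Delta \nu_{it}. \end{aligned}$$

Because $\delta_i$ does not vary over time, it cancels in both cases. LSDV instead keeps $\delta_i$ in the regression and estimates it as the coefficient on the dummy variable for individual i.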
Let's consider an example panel dataset with three individuals and three time periods shown in the table below.
Individual | Time Period | $Y_{it}$ | Within-Group Mean $\bar{Y}_{i}$ | $X_{it}$ | Within-Group Mean $\bar{X}_{i}$ |
---|---|---|---|---|---|
1 | 1 | 3.901 | 2.744 | 0.978 | 1.174 |
1 | 2 | 2.345 | 2.744 | 1.798 | 1.174 |
1 | 3 | 1.987 | 2.744 | 0.745 | 1.174 |
2 | 1 | 1.250 | 1.715 | 1.652 | 1.425 |
2 | 2 | 0.654 | 1.715 | 0.438 | 1.425 |
2 | 3 | 3.240 | 1.715 | 2.185 | 1.425 |
3 | 1 | 0.901 | 2.077 | 2.119 | 1.653 |
3 | 2 | 1.341 | 2.077 | 1.516 | 1.653 |
3 | 3 | 3.989 | 2.077 | 1.324 | 1.653 |
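As a quick illustration of the first differences column in the comparison table, the Python sketch below differences this sample panel within each individual and runs OLS without an intercept on the differences. It is a plain Python sketch, not a pdlib routine, and the helper name `first_differences` is hypothetical.

```python
# Sample panel from the table above: (individual, period, y, x)
panel = [
    (1, 1, 3.901, 0.978), (1, 2, 2.345, 1.798), (1, 3, 1.987, 0.745),
    (2, 1, 1.250, 1.652), (2, 2, 0.654, 0.438), (2, 3, 3.240, 2.185),
    (3, 1, 0.901, 2.119), (3, 2, 1.341, 1.516), (3, 3, 3.989, 1.324),
]


def first_differences(rows):
    """Return (dy, dx) pairs, differencing consecutive periods within each individual."""
    rows = sorted(rows)  # sort by individual, then period
    diffs = []
    for prev, curr in zip(rows, rows[1:]):
        if prev[0] == curr[0]:  # never difference across individuals
            diffs.append((curr[2] - prev[2], curr[3] - prev[3]))
    return diffs


diffs = first_differences(panel)

# OLS without intercept on the differenced data (single regressor)
beta_fd = sum(dy * dx for dy, dx in diffs) / sum(dx * dx for _, dx in diffs)
print(f"Number of differenced observations: {len(diffs)}")
print(f"First differences coefficient: {beta_fd:.4f}")
```

Differencing drops one observation per individual, leaving six rows here. With more than two time periods, the first differences estimate generally does not coincide with the within-group estimate in a finite sample, even though both are consistent under the fixed effects assumptions.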
Example within-group estimation
We will estimate the fixed effects model using the within-group method. This can be done in three steps:
- Find the within-subject means.
- Demean the dependent and independent variables using the within-subject means.
- Run a linear regression using the demeaned variables.
Finding the within-subject means
To find the within-subject mean of Y for individual one we compute:
$$ \bar{Y_{1}} = \frac{(3.901 + 2.345 + 1.987)}{3} = 2.7443 .$$
We can find the within-subject means using the withinMeans procedure from the pdlib library. The withinMeans procedure requires two inputs:
- grps: (T*N) x 1 matrix, group identifier.
- data: (T*N) x k matrix, panel data.
Using our sample data stored in the GAUSS data file simple_data.dat:
// Load the pdlib library
library pdlib;
// Load data
data = loadd("simple_data.dat");
// Assign groups variable
grps = data[., 1];
// Assign y~x matrix
reg_data = data[.,3:4];
// Find group means
grp_means = withinMeans(grps, reg_data);
print "Group means for Y and X:";
grp_means;
Our output reads:
Group means for Y and X:
2.7443 1.1737
1.7147 1.4250
2.0770 1.6530
Demeaning the data
The next step is to demean the data. This removes any time-invariant effects. After finding the within-subject means, the data is demeaned:
$$\begin{aligned} \widetilde{Y}_{1t} = Y_{1t} - \overline{Y}_1: \quad & 3.901 - 2.744 = 1.157, \\ & 2.345 - 2.744 = -0.399, \\ & 1.987 - 2.744 = -0.757 . \end{aligned}$$
In GAUSS we can demean data using the demeanData procedure from the pdlib library. The demeanData procedure requires two inputs:
- grps: (T*N) x 1 matrix, group identifier.
- data: (T*N) x k matrix, panel data.
The demeanData procedure internally computes the within-subject means and requires just the reg_data and grps variables that we created in the first step:
// Remove time-invariant group means
data_tilde = demeanData(grps, reg_data);
print "Demeaned data:";
data_tilde;
print;
Our demeaned data is printed in the output:
Demeaned data:
1.1567 -0.1957
-0.3993 0.6243
-0.7573 -0.4287
-0.4647 0.2270
-1.0607 -0.9870
1.5253 0.7600
-1.1760 0.4660
-0.7360 -0.1370
1.9120 -0.3290
Performing the regression
Once we have transformed our x and y data we are ready to estimate the parameters of the fixed effects regression model:
$$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i} $$
where
$$\widehat{\beta}_{fe} = (\widetilde{X_i}'\widetilde{X_i})^{-1}(\widetilde{X_i}'\widetilde{Y_i}) .$$
Using the data we previously demeaned:
// Extract variables
y_tilde = data_tilde[., 1];
x_tilde = data_tilde[., 2];
// Regress the dependent variable on the independent variable
coeff = inv(x_tilde'x_tilde)*(x_tilde'y_tilde);
// Print the fixed effects coefficient
print "Fixed effects coefficient:";
coeff;
The result reads:
Fixed effects coefficient:
0.3413
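For readers without pdlib at hand, the three steps can be cross-checked in plain Python. This is a sketch that re-enters the sample data by hand and assumes a single regressor. It does not call or imitate the internals of withinMeans or demeanData.

```python
from collections import defaultdict

# Sample panel: (individual, period, y, x)
panel = [
    (1, 1, 3.901, 0.978), (1, 2, 2.345, 1.798), (1, 3, 1.987, 0.745),
    (2, 1, 1.250, 1.652), (2, 2, 0.654, 0.438), (2, 3, 3.240, 2.185),
    (3, 1, 0.901, 2.119), (3, 2, 1.341, 1.516), (3, 3, 3.989, 1.324),
]

# Step 1: within-subject means of y and x
sums = defaultdict(lambda: [0.0, 0.0, 0])
for i, _, y, x in panel:
    sums[i][0] += y
    sums[i][1] += x
    sums[i][2] += 1
means = {i: (sy / n, sx / n) for i, (sy, sx, n) in sums.items()}

# Step 2: demean y and x using each individual's own means
y_tilde = [y - means[i][0] for i, _, y, _ in panel]
x_tilde = [x - means[i][1] for i, _, _, x in panel]

# Step 3: OLS of demeaned y on demeaned x (no intercept, one regressor)
beta_fe = (sum(xt * yt for xt, yt in zip(x_tilde, y_tilde))
           / sum(xt * xt for xt in x_tilde))

for i in sorted(means):
    print(f"group {i}: mean y = {means[i][0]:.4f}, mean x = {means[i][1]:.4f}")
print(f"Fixed effects coefficient: {beta_fe:.4f}")
```

With one regressor the matrix formula collapses to a ratio of two sums, which is why no matrix inverse is needed in this sketch.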
Using the fixedEffects procedure
As an alternative to computing these three steps separately, we can use the fixedEffects procedure from the GAUSS panel data library, pdlib. This procedure runs all three steps in a single call. The fixedEffects procedure takes four inputs:
- y: (T*N) x 1 matrix, the panel of stacked dependent variables.
- x: (T*N) x k matrix, the panel of stacked independent variables.
- grps: (T*N) x 1 matrix, group identifier.
- robust: Scalar, an indicator variable of whether to use robust standard errors.
// Use fixedEffects procedure
call fixedEffects(reg_data[.,1], reg_data[.,2], grps, 1);
This prints:
------------------- FIXED EFFECTS (WITHIN) RESULTS -------------------
Observations          : 9
Number of Groups      : 3
Degrees of freedom    : 2
R-squared             : 0.026
Adj. R-squared        : -0.558
Residual SS           : 11.021
Std error of est      : 1.485
Total SS (corrected)  : 11.319
F = 0.054 with 1,2 degrees of freedom
P-value = 0.838
Variable        Coef.       Std. Error   t-Stat      P-Value
----------------------------------------------------------------------
X1              0.341276    1.011041     0.337549    0.768
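The LSDV estimator offers an independent check on this output. The Python sketch below regresses y on x plus one dummy per individual by solving the normal equations with a small hand-written solver (the helper `solve` is hypothetical, not a pdlib function). The residual degrees of freedom NT - N - K = 5 used for the standard error of the estimate is an assumption about how the printed value is scaled.

```python
# Sample panel: (individual, period, y, x)
panel = [
    (1, 1, 3.901, 0.978), (1, 2, 2.345, 1.798), (1, 3, 1.987, 0.745),
    (2, 1, 1.250, 1.652), (2, 2, 0.654, 0.438), (2, 3, 3.240, 2.185),
    (3, 1, 0.901, 2.119), (3, 2, 1.341, 1.516), (3, 3, 3.989, 1.324),
]


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


groups = sorted({i for i, _, _, _ in panel})
# Design matrix: x followed by one dummy per individual (no common intercept)
X = [[x] + [1.0 if i == g else 0.0 for g in groups] for i, _, _, x in panel]
y = [row[2] for row in panel]
k = len(X[0])

# Normal equations (X'X) coef = X'y
xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
coef = solve(xtx, xty)
beta_lsdv, intercepts = coef[0], coef[1:]

# Residual sum of squares and standard error of the estimate
resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(X, y)]
rss = sum(e * e for e in resid)
resid_dof = len(y) - len(groups) - 1  # assumed: NT - N - K with K = 1
std_err_est = (rss / resid_dof) ** 0.5

print(f"LSDV slope: {beta_lsdv:.6f}")
print("Individual intercepts:", [round(d, 4) for d in intercepts])
print(f"Residual SS: {rss:.3f}, Std error of est: {std_err_est:.3f}")
```

The LSDV slope agrees with the within-group coefficient, as the theory predicts, and the dummy coefficients are the estimated individual intercepts $\delta_i$ that the within transformation removes.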
The random effects model
The covariance structure of the random effects model means that pooled OLS will result in inefficient estimates. Instead, the random effects model is estimated using pooled feasible generalized least squares (FGLS).
The pooled FGLS method estimates the model
$$\widetilde{Y_i} = \widetilde{W_i} \delta_{re} + \widetilde{\epsilon_i}$$
where the data is transformed using $\Omega = E[\epsilon_i \epsilon_i']$:
$$\widetilde{Y_i} = \Omega^{-\frac{1}{2}}Y_{i},$$ $$\widetilde{W_i} = \Omega^{-\frac{1}{2}}W_{i},$$ $$\widetilde{\epsilon_i} = \Omega^{-\frac{1}{2}}\epsilon_{i},$$
and
$$W_i = [1, X_i],$$ $$\delta_{re} = [\alpha, \beta']',$$ $$\epsilon_i = \mu_i i_T + \nu_i ,$$
where $i_T$ is a $T \times 1$ vector of ones.
The most difficult part of estimating this model is estimating $\Omega$ and there are a number of different proposed methods.
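For the one-way model with a balanced panel, the transformation has a well-known closed form (see, e.g., Baltagi, 2008), so $\Omega^{-\frac{1}{2}}$ never has to be computed numerically. A sketch of that result, where $\theta$ and $I_T$ (the $T \times T$ identity matrix) are notation introduced here and $\sigma_{\mu}^2$, $\sigma_{\nu}^2$ are the variances of $\mu_i$ and $\nu_{it}$:

$$\Omega = \sigma_{\mu}^2 i_T i_T' + \sigma_{\nu}^2 I_T, \qquad \sigma_{\nu}\,\Omega^{-\frac{1}{2}} = I_T - \frac{\theta}{T} i_T i_T', \qquad \theta = 1 - \sqrt{\frac{\sigma_{\nu}^2}{T\sigma_{\mu}^2 + \sigma_{\nu}^2}} .$$

Applied to the data, this is a quasi-demeaning: each observation becomes $y_{it} - \theta \bar{y}_i$, and likewise for the regressors and the constant, where $\bar{y}_i$ is the time average for individual i. When $\sigma_{\mu}^2 = 0$ we get $\theta = 0$ and the estimator reduces to pooled OLS. As $T\sigma_{\mu}^2$ grows relative to $\sigma_{\nu}^2$, $\theta$ approaches 1 and the estimator approaches the within-group estimator.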
Example random effects estimation
One of the most common approaches for estimating the random effects model:
- Estimates the between-group regression to obtain an estimate of $\sigma_{\mu}^2$.
- Estimates the within-group regression to obtain an estimate of $\sigma_{\nu}^2$.
- Transforms the data using the estimates of $\sigma_{\mu}^2$ and $\sigma_{\nu}^2$.
- Finds the pooled OLS estimator using the transformed data.
We can perform these steps in one procedure call using the randomEffects procedure in the pdlib GAUSS library.
Using the randomEffects procedure
The randomEffects procedure takes four inputs:
- y: (T*N) x 1 matrix, the panel of stacked dependent variables.
- x: (T*N) x k matrix, the panel of stacked independent variables.
- grps: (T*N) x 1 matrix, group identifier.
- robust: Scalar, an indicator variable of whether to use robust standard errors.
Continuing with our fixed effects example, we will use our sample data stored in the GAUSS data file simple_data.dat.
// Use randomEffects procedure
call randomEffects(reg_data[., 1], reg_data[., 2], grps, 1);
---------------------- GLS RANDOM EFFECTS RESULTS ----------------------
Observations          : 9
Number of Groups      : 3
Degrees of freedom    : 2
R-squared             : 0.004
Adj. R-squared        : -2.985
Residual SS           : 12.907
Std error of est      : 1.358
Total SS (corrected)  : 12.956
F = 3.314 with 2,2 degrees of freedom
P-value = 0.232
Variable        Coef.       Std. Error   t-Stat      P-Value
----------------------------------------------------------------------
CONSTANT        1.994513    1.720996     1.158930    0.366
X1              0.129940    1.053423     0.123350    0.913
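The four estimation steps listed earlier can be traced by hand on the sample data. The Python sketch below is one possible implementation, not the pdlib source. It assumes a between-regression variance with N - 2 degrees of freedom, a within-regression variance with N(T - 1) - 1 degrees of freedom, and that a negative estimate of $\sigma_{\mu}^2$ is truncated at zero. The helper `ols_with_intercept` and the quasi-demeaning weight `theta` are names introduced here.

```python
# Sample panel: (individual, period, y, x)
panel = [
    (1, 1, 3.901, 0.978), (1, 2, 2.345, 1.798), (1, 3, 1.987, 0.745),
    (2, 1, 1.250, 1.652), (2, 2, 0.654, 0.438), (2, 3, 3.240, 2.185),
    (3, 1, 0.901, 2.119), (3, 2, 1.341, 1.516), (3, 3, 3.989, 1.324),
]
T = 3  # periods per individual (balanced panel)


def ols_with_intercept(x, y):
    """Simple regression of y on a constant and x; returns (intercept, slope, rss)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


groups = sorted({i for i, _, _, _ in panel})
N = len(groups)
ybar = {g: sum(y for i, _, y, _ in panel if i == g) / T for g in groups}
xbar = {g: sum(x for i, _, _, x in panel if i == g) / T for g in groups}

# Step 1: between-group regression on the N individual means
_, _, rss_between = ols_with_intercept([xbar[g] for g in groups],
                                       [ybar[g] for g in groups])
sigma_between2 = rss_between / (N - 2)  # estimates sigma_mu^2 + sigma_nu^2 / T

# Step 2: within-group regression on demeaned data
y_w = [y - ybar[i] for i, _, y, _ in panel]
x_w = [x - xbar[i] for i, _, _, x in panel]
beta_w = sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)
rss_within = sum((b - beta_w * a) ** 2 for a, b in zip(x_w, y_w))
sigma_nu2 = rss_within / (N * (T - 1) - 1)

# Assumption: a negative variance estimate is truncated at zero
sigma_mu2 = max(0.0, sigma_between2 - sigma_nu2 / T)

# Step 3: quasi-demean the data with weight theta
theta = 1.0 - (sigma_nu2 / (T * sigma_mu2 + sigma_nu2)) ** 0.5
y_star = [y - theta * ybar[i] for i, _, y, _ in panel]
x_star = [x - theta * xbar[i] for i, _, _, x in panel]

# Step 4: pooled OLS on the transformed data
const_star, beta_re, _ = ols_with_intercept(x_star, y_star)
alpha_re = const_star / (1.0 - theta)  # the constant column is scaled by (1 - theta)

print(f"sigma_mu^2 = {sigma_mu2:.4f}, sigma_nu^2 = {sigma_nu2:.4f}, theta = {theta:.4f}")
print(f"CONSTANT = {alpha_re:.6f}, X1 = {beta_re:.6f}")
```

On this tiny sample the unconstrained estimate of $\sigma_{\mu}^2$ is negative, so under the truncation assumption the weight `theta` is zero and the random effects estimator coincides with pooled OLS. The resulting coefficients agree with the CONSTANT and X1 values in the printed output, although whether pdlib reaches them by the same route is not confirmed here.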
Conclusion
In today's blog we have covered the fundamentals of the individual error component models:
- The theoretical one-way error component model.
- Fixed effects vs. random effects.
- Estimating fixed effects and random effects.
The code and data for this blog can be found at our Aptech Blog GitHub code repository.
References
Baltagi, B. (2008). Econometric analysis of panel data. John Wiley & Sons.
Erica has been working to build, distribute, and strengthen the GAUSS universe since 2012. She is an economist skilled in data analysis and software development. She has earned a B.A. and MSc in economics and engineering and has over 15 years combined industry and academic experience in data analysis and research.