Introduction
In time series modeling we often encounter trending or nonstationary time series data. Understanding the characteristics of such data is crucial for developing proper time series models. For this reason, unit root testing is an essential step when dealing with time series data.
In this blog post, we cover everything you need to conduct time series data unit root tests using GAUSS. This includes:
- An introduction to the concept of unit roots.
- A discussion of why unit roots matter.
- How to prepare your time series data for unit root testing.
- How to run and interpret the fundamental tests in GAUSS including the Augmented Dickey-Fuller (ADF) and Phillips-Perron unit root tests and the KPSS stationarity test.
- Where to look for more advanced unit root testing procedures.
What Are Unit Roots?
Unit roots are often used interchangeably with the idea of nonstationarity. This isn’t completely off base, because the two are related. However, it is important to remember that while all unit root processes are nonstationary, not all nonstationary time series are unit root processes.
What is a Stationary Time Series?
A time series is stationary when its statistical characteristics do not change with shifts in time. In practice, time series models are generally valid only under the assumption of weak stationarity.
A weakly stationary time series has:
- The same finite unconditional mean and finite unconditional variance at all periods.
- An autocovariance that is independent of time.
Nonstationarity can be caused by many factors including structural breaks, time trends, or unit roots.
What is a Unit Root?
A unit root process:
- Contains a stochastic, or random walk, component;
- Is sometimes referred to as integrated of order one, I(1);
- Has a root of the characteristic equation which lies on the unit circle, i.e., a root equal to one. (The mathematics of this is beyond the scope of this blog.)
There are some important implications of unit roots:
- Shocks to unit root processes have permanent effects.
- Detrending unit root time series does NOT lead to stationarity but first-differencing does.
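A quick check with the simplest unit root process, the random walk $Y_t = Y_{t-1} + \epsilon_t$, shows why differencing works while detrending does not:

$$\Delta Y_t = Y_t - Y_{t-1} = \epsilon_t$$

The first difference is white noise and therefore stationary, while subtracting any deterministic trend from $Y_t$ leaves the accumulated stochastic component $\sum_{i=1}^t \epsilon_i$ in place.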
Why is a Unit Root Process Nonstationary?
Let’s consider the simplest example, the AR(1) unit root process
$$Y_t = \phi_0 + Y_{t-1} + \epsilon_t$$
Since the mean of the error term, $\epsilon_t$, is zero and the constant $\phi_0$ is added at each time period, the expected value of this process (assuming $Y_0 = 0$) is
$$E[Y_t] = \phi_0 t$$
which changes with time. Additionally, the variance given by
$$var(Y_t) = var(\epsilon_1 + \epsilon_2 + \epsilon_3 ...+ \epsilon_t)$$
is also dependent on time.
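Both results follow directly from writing the process as a cumulative sum (assuming $Y_0 = 0$ and independent errors with variance $\sigma^2_\epsilon$):

$$Y_t = \phi_0 t + \sum_{i=1}^{t} \epsilon_i, \quad E[Y_t] = \phi_0 t, \quad var(Y_t) = t\sigma^2_\epsilon$$

Both moments depend on $t$, so no single unconditional mean or variance describes the series at all periods, which violates weak stationarity.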
Why Are Unit Roots Important in Time Series Modeling?
Unit root processes have important implications for time series modeling including:
- Permanence of shocks;
- Spurious regressions;
- Invalid inferences.
Permanence of Shocks
If time series data contains a unit root, shocks will have a permanent impact on the path of the data.
The top panel in the graph above shows the impact of a random shock on an AR(1) process with a unit root. After the shock hits, the process transitions to a new path and there is no mean-reversion.
Conversely, the bottom panel shows the impact of the same shock on a stationary AR(1) series. In this case, the impact of the shock is transitory: the series reverts to its original mean, shown where the blue shock path once again overlaps the orange line of the original series.
Spurious Regressions
Many time series models which estimate the relationship between two variables assume that both are stationary series. When neither series is stationary, regression models can find relationships between the two series that do not exist.
Let’s look at an example using GAUSS.
First, we simulate two unit root series:
// Number of observations
nobs = 150;
// Generate two vectors of random disturbances
e1 = rndn(nobs, 1);
e2 = rndn(nobs, 1);
// Find cumulative sum of disturbances
y1 = cumsumc(e1);
x1 = cumsumc(e2);
Next, we use the ols procedure to regress y1 on x1:
call ols("", y1, x1);
The ols procedure prints the following results to the input/output window:
Valid cases:                   150      Dependent variable:                  Y
Missing cases:                   0      Deletion method:                  None
Total SS:                 5161.244      Degrees of freedom:                148
R-squared:                   0.450      Rbar-squared:                    0.446
Residual SS:              2838.019      Std error of est:                4.379
F(1,148):                  121.154      Probability of F:                0.000

                         Standard                 Prob   Standardized  Cor with
Variable     Estimate      Error      t-value     >|t|     Estimate    Dep Var
-------------------------------------------------------------------------------
CONSTANT    -3.645808    0.419546    -8.689894    0.000       ---         ---
X1           0.615552    0.055924    11.006998    0.000     0.670916    0.670916
Despite the fact that these two series are unrelated, the model suggests a statistically significant relationship between y1 and x1. The estimated coefficient on x1 is 0.616 with a t-statistic of 11.00 and a p-value of 0.000.
A regression of one I(1) series on another I(1) series can:
- Lead to OLS coefficients that do not converge to the true value of zero and do not follow the standard normal distribution.
- Cause OLS t-statistics to diverge to infinity, falsely suggesting statistically significant relationships.
- Produce a spuriously high $R^2$, incorrectly suggesting strong model fit.
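One standard remedy is to estimate the regression in first differences, which are stationary for I(1) series. As a quick sketch continuing the simulation above, regressing the differenced series on each other should no longer show a significant relationship, because differencing a random walk recovers the underlying white noise disturbances:

```
// First-difference the simulated unit root series.
// Differencing a random walk recovers the original
// white noise disturbances, so the spurious
// relationship should disappear.
dy1 = y1[2:nobs] - y1[1:nobs-1];
dx1 = x1[2:nobs] - x1[1:nobs-1];

// Regress the differenced series
call ols("", dy1, dx1);
```

With the differenced data, the t-statistic on dx1 should be insignificant at conventional levels.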
Invalid Inferences
Unit root time series have three characteristics that can impact inferences in standard time series models:
- The mean is not constant over time;
- The variance of the series is non-constant;
- The autocorrelation between adjacent observations decays very slowly.
Combined, these imply that standard asymptotic results, such as the Law of Large Numbers, do not hold for a nonstationary series. This, in turn, means that inferences based on standard test statistics and distributions are no longer valid.
How to Prepare Data for Unit Root Testing
Before running any unit root tests, we must first determine if our data has any deterministic components such as a constant or time trend.
How do we determine if our data has a time trend or constant?
Time series plots are useful for identifying constants and time trends, which is why the first step in any time series modeling should be data visualization.
The graph above plots three different AR(1) time series. The time series plot in the first panel has no constant and no trend with the data generating process
$$y_t = 0.7y_{t-1}$$
We can tell visually that the series has no constant or trend, because it fluctuates around the zero line.
The time series plot in the second panel shows an AR(1) plot with a constant term. The plotted data follows the data generating process
$$y_t = 1.5 + 0.7y_{t-1}$$
The series has the same shape as the first series but is shifted upward and fluctuates around 1.5 instead of 0.
The time series plot in the final panel shows an AR(1) plot with a constant term and time trend. The plotted data follows the data generating process
$$y_t = 1.5 + 0.2t + 0.7y_{t-1}$$
This series fluctuates around an increasing, time-dependent, line.
Unit Root Testing Versus Stationarity Tests
When testing whether a series is I(1), there are two broad categories of tests: those that test for unit roots and those that test for stationarity.
Unit root tests consider the null hypothesis that a series contains a unit root against the alternative that the series is trend stationary.
Time series stationarity tests consider the null hypothesis that a series is trend stationary against the alternative that it contains a unit root.
It is important to distinguish which test we are running to avoid making incorrect conclusions about our test results.
The Augmented Dickey-Fuller Test
The first unit root test we will consider is the Augmented Dickey-Fuller (ADF) test. The ADF test is based on the test regression
$$\Delta y_t = \beta D_t + (\rho-1) y_{t-1} + \sum_{j=1}^p \psi_j \Delta y_{t-j} + \epsilon_t$$
where $D_t$ is a vector of deterministic components which can include a constant and/or a trend.
The ADF test considers the null hypothesis that the series is I(1), or has a unit root, against the alternative hypothesis that the series is I(0).
The ADF test eliminates serial correlation in the residuals by including the lagged differences of the dependent variable, $\Delta y_{t-j}$, in the specification.
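In terms of the test regression above, the hypotheses concern the coefficient on $y_{t-1}$:

$$H_0: \rho - 1 = 0 \quad \text{(unit root)} \qquad H_1: \rho - 1 < 0 \quad \text{(stationary)}$$

The test statistic is the t-ratio on $(\rho - 1)$, which is compared against Dickey-Fuller critical values rather than standard normal ones.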
The Phillips-Perron Test
The Phillips-Perron test also considers the null hypothesis that the series contains a unit root against the alternative that there is no unit root. However, it addresses serial correlation nonparametrically, correcting the OLS estimate of the AR(1) coefficient and its t-statistic rather than adding lagged differences to the regression.
Three specifications are considered, an AR(1) model without a drift, an AR(1) with a drift, and an AR(1) model with a drift and linear trend:
$$ Y_t = \rho Y_{t-1} + \epsilon_t\\ Y_t = \alpha + \rho Y_{t-1} + \epsilon_t\\ Y_t = \alpha + \delta t + \rho Y_{t-1} + \epsilon_t$$
The KPSS test
Unlike the previous tests, the KPSS test uses a Lagrange multiplier (LM) type test of the null hypothesis of stationarity against the alternative hypothesis that the data contains a unit root.
The residuals from a regression of $y_t$ on the deterministic components, $D_t$, are used in combination with a heteroskedasticity and autocorrelation consistent (HAC) estimate of the long-run variance to construct the KPSS test statistic.
How to Run Unit Root Tests in GAUSS
The Time Series MT Application includes tools for conducting standard unit root testing. Today we will consider the three fundamental unit root tests discussed above:
- Augmented Dickey-Fuller test;
- Phillips-Perron test;
- KPSS test.
Simulated Data
For this example, we will simulate data using the simarmamt procedure. The three time series we will simulate encompass three different cases of deterministic components:
new;
cls;
library tsmt;
/*
** Step One: Generate Data
*/
// Coefficient on AR(1) term
phi = 0.80;
// AR order
p = 1;
// MA order
q = 0;
// Constant
const1 = 0;
const2 = 2.5;
const3 = 2.5;
// Trend
trend1 = 0;
trend2 = 0;
trend3 = 0.20;
// Number of observations
n = 150;
// Number of series
k = 1;
// Standard deviation
std = 1;
// Set seed for reproducing data
seed = 10191;
// Case One: No deterministic components
y1 = simarmamt(phi, p, q, const1, trend1, n, k, std, seed);
// Case Two: With Constant
y2 = simarmamt(phi, p, q, const2, trend2, n, k, std, seed);
// Case Three: With Constant and Trend
y3 = simarmamt(phi, p, q, const3, trend3, n, k, std, seed);
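As discussed above, it is good practice to visualize the series before testing. A minimal sketch, assuming the simulation code above has been run, plots all three series against a time index with the GAUSS plotXY procedure:

```
// Create a time index for plotting
t = seqa(1, 1, n);

// Plot the three simulated series against time
plotXY(t, y1~y2~y3);
```

The three panels of deterministic behavior, no constant, constant, and constant plus trend, should be visible in the resulting plot.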
The Augmented Dickey-Fuller Test
We can run the Augmented Dickey-Fuller test in GAUSS using the vmadfmt procedure included in the Time Series MT library.
The vmadfmt procedure requires three inputs:
- yt: Vector, the time series data.
- p: Scalar, the order of the deterministic component. Valid options include: -1 = no deterministic component, 0 = constant, 1 = constant and trend.
- l: Scalar, the number of lagged dependent variables to include in the ADF regression.
In this case, we know that our data is AR(1), so we set l = 1. We will call vmadfmt three times, once for each of our datasets:
/*
** ADF Testing
*/
/* Order of deterministic trend to include
** -1 No deterministic trends
** 0 Constant
** 1 Constant and Trend
*/
// No deterministic trends
{ rho1, tstat1, adf_t_crit1 } = vmadfmt(y1, -1, 1);
// Constant
{ rho2, tstat2, adf_t_crit2 } = vmadfmt(y2, 0, 1);
// Constant and trend
{ rho3, tstat3, adf_t_crit3 } = vmadfmt(y3, 1, 1);
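Because vmadfmt returns its results rather than printing a report, a few print statements make it easy to compare each test statistic with, say, its 5% critical value (here assuming the 5% value is the second element of the returned critical value vector, per the 1, 5, 10, 90, 95, 99% ordering):

```
// Compare each ADF t-statistic with its 5% critical value
print "Case 1 (none):             " tstat1 " vs " adf_t_crit1[2];
print "Case 2 (constant):         " tstat2 " vs " adf_t_crit2[2];
print "Case 3 (constant + trend): " tstat3 " vs " adf_t_crit3[2];
```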
Interpreting the ADF Results
The vmadfmt procedure returns three values:
- rho: The estimated autoregressive coefficient.
- tstat: The t-statistic for the estimated autoregressive coefficient, rho.
- tcrit: 6 x 1 vector of critical values for the ADF t-statistic at the 1%, 5%, 10%, 90%, 95%, and 99% levels.
We've summarized the tstat and tcrit results in the table below:
Case | Test Statistic | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
---|---|---|---|---|---|
Y_{1} | -3.6897 | -2.6025 | -1.9423 | -1.5950 | Reject the null |
Y_{2} | -4.0473 | -3.4391 | -2.9152 | -2.5841 | Reject the null |
Y_{3} | -4.4899 | -4.0052 | -3.4611 | -3.1552 | Reject the null |
Augmented Dickey-Fuller Interpretation
There are several things to note about these results:
- The critical values are specific to the deterministic trends and follow a non-standard distribution known as the Dickey-Fuller distribution.
- As the t-stat (tstat1, tstat2, tstat3) decreases, the likelihood of rejecting the null hypothesis increases.
- For y1, y2, and y3 we can reject the null hypothesis of a unit root at the 1% level, because the t-statistics of -3.690, -4.047, and -4.490 are less than the respective 1% critical values of -2.602, -3.439, and -4.005.
These results should not be surprising to us since we used simulated data with an autoregressive coefficient whose absolute value was less than one. It should be noted that we also knew the correct lag specification and deterministic trends to use with our test because we knew the true data generating process.
In reality, the data generating processes will be unknown and you may need to conduct additional testing to confirm the proper lags and deterministic trends.
The Phillips-Perron Test
We can run the Phillips-Perron test in GAUSS using the vmppmt procedure included in the Time Series MT library.
The vmppmt procedure requires three inputs:
- yt: Vector, the time series data.
- p: Scalar, the order of the deterministic component. Valid options include: -1 = no deterministic component, 0 = constant, 1 = constant and trend.
- nwtrunc: Scalar, the number of autocorrelations to use in calculating the Newey-West correction. GAUSS computes the data-driven, optimal truncation length when this is set to 0.
For the Phillips-Perron case, we will let GAUSS pick the optimal Newey-West truncation length by setting nwtrunc = 0. Again, we will call vmppmt three times, once for each of our datasets:
/*
** Phillips-Perron
*/
/* The second input reflects the deterministic
** components to include
** -1 No deterministic trends
** 0 Constant
** 1 Constant and Trend
*/
// No deterministic components
{ ppb1, ppt1, pptcrit1 } = vmppmt(y1, -1, 0);
// Constant
{ ppb2, ppt2, pptcrit2 } = vmppmt(y2, 0, 0);
// Constant and trend
{ ppb3, ppt3, pptcrit3 } = vmppmt(y3, 1, 0);
Interpreting the Phillips-Perron Results
The vmppmt procedure returns three values:
- ppb: The estimated autoregressive coefficient.
- ppt: The t-statistic for the estimated autoregressive coefficient, ppb.
- pptcrit: 6 x 1 vector of critical values for the Phillips-Perron t-statistic at the 1%, 5%, 10%, 90%, 95%, and 99% levels.
Again, we summarize the two most relevant outputs, ppt and pptcrit:
Case | Test Statistic | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
---|---|---|---|---|---|
Y_{1} | -3.7332 | -2.5856 | -1.9448 | -1.6246 | Reject the null |
Y_{2} | -4.1367 | -3.4642 | -2.9124 | -2.5884 | Reject the null |
Y_{3} | -4.5911 | -4.0009 | -3.4542 | -3.1625 | Reject the null |
Like our ADF results, there are several notable points to draw from these results:
- The critical values are specific to the deterministic trends and follow the same Dickey-Fuller distribution as the ADF test.
- More observations are used for running the Phillips-Perron test regression than in the ADF case. The ADF case loses observations due to lagging. As a result, the critical values for the Phillips-Perron test differ from those for the ADF test of the same data.
- As the t-stat (ppt1, ppt2, ppt3) decreases, the likelihood of rejecting the null hypothesis increases.
- For y1, y2, and y3 we can reject the null hypothesis of a unit root at the 1% level, because the t-statistics of -3.733, -4.137, and -4.591 are less than the respective 1% critical values of -2.586, -3.464, and -4.001.
The KPSS Test
The final test we will run is the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. The KPSS test can be conducted in GAUSS using the kpss procedure included in the Time Series MT library.
The kpss procedure has one required input and five optional inputs:
- yt: Vector, the time series data.
- max_lags: Optional, scalar, the maximum number of lags included in the KPSS test. A positive integer directly specifies the maximum lag autocovariance used. If max_lags is zero, the maximum number of lags is determined using the Schwert criterion. Default = 0.
- trend: Optional, scalar, 0 if no trend is present, 1 if a trend is present. Default = 0.
- qsk: Optional, scalar, if nonzero, the quadratic spectral kernel is used. Default = 0.
- auto: Optional, scalar, if nonzero, max_lags is computed automatically. Default = 1.
- print_out: Optional, scalar, if nonzero, intermediate quantities are printed to the screen. Default = 1.
Today we will use only the first two optional inputs, leaving the rest at their default values. Again, we call kpss three times, once for each of our datasets:
/*
** KPSS
*/
/*
** Note that we use the default maxlags
** and trend settings for the two cases without a trend.
*/
// No deterministic trends
{ lm1, crit1 } = kpss(y1);
// Constant and no trend
{ lm2, crit2 } = kpss(y2);
// Constant and trend
{ lm3, crit3 } = kpss(y3, 0, 1);
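As with the earlier tests, the returned LM statistics can be compared directly with the critical values. A small sketch for the trend case (here assuming the 5% critical value is the third element of the returned vector, per the 1%, 2.5%, 5%, 10% ordering):

```
// Display the KPSS LM statistics for y3 at each lag
print "KPSS LM statistics (y3):";
print lm3;

// Compare against the 5% critical value
print "5% critical value: " crit3[3];
```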
Interpreting the KPSS Results
The kpss procedure returns two values:
- kpss_lm: max_lags x 1 vector of KPSS Lagrange multiplier statistics for stationarity.
- crit: 4 x 1 vector of critical values for the KPSS statistic at the 1%, 2.5%, 5%, and 10% levels.
Lag | Y_{1} | Y_{2} | Y_{3} |
---|---|---|---|
1 | 1.4556 | 1.4556 | 0.3124 |
2 | 1.0363 | 1.0363 | 0.2301 |
3 | 0.8603 | 0.8603 | 0.1964 |
4 | 0.7637 | 0.7637 | 0.1785 |
5 | 0.7041 | 0.7041 | 0.1678 |
6 | 0.6658 | 0.6658 | 0.1613 |
7 | 0.6444 | 0.6444 | 0.1585 |
8 | 0.6349 | 0.6349 | 0.1582 |
9 | 0.6318 | 0.6318 | 0.1586 |
Critical values (1%, 5%, 10%) | 0.739, 0.463, 0.347 | 0.739, 0.463, 0.347 | 0.216, 0.146, 0.119 |
Conclusion | Reject the null (1% level) | Reject the null (1% level) | Reject the null (5% level) |
This test yields some interesting results worth noting:
- The test statistic is computed for each lag up to the maximum number of lags. The Schwert criterion selected 4 lags as the optimal number for all cases (the lag 4 row in the table above).
- The presence of a non-zero constant does not affect the test statistic or the critical values. However, the presence of a trend affects both.
- Because this is a one-sided LM test of stationarity, we reject the null hypothesis of stationarity at a specific p-level if the test statistic exceeds the critical value.
- Note that the KPSS tests for y1 and y2 suggest that we should reject the null hypothesis of stationarity at the 5% level or better for all lags.
- The KPSS test for trend stationarity of y3 shows similar results, though they are more sensitive to lag selection.
These results highlight one of the known issues with the KPSS test. We know our data is stationary because we know the true data generating process. However, the KPSS test concludes that we should reject the null hypothesis of stationarity.
The KPSS test is known to reject the null hypothesis of stationarity more frequently than other tests, that is, it has a relatively high rate of Type I errors.
Where to Find More Advanced Unit Root Tests
In this blog, we have covered very fundamental, simple unit root tests. However, much work has been done to develop unit root tests for more complex cases. For example:
- Unit root testing on panel data requires a separate set of tools. The GAUSS TSMT library provides the following panel data unit root tests:
- Im, Pesaran, and Shin;
- Levin-Lin-Chu;
- Breitung and Das.
- Data with structural breaks requires specialized unit root tests which accommodate nonlinearities. A full suite of these tests, for both time series and panel data, are provided in the free and open-source GAUSS tspdlib library.
Conclusion
This week we covered everything you need to know to be able to test your data for unit roots using GAUSS. This includes:
- An introduction to the concept of unit roots.
- A discussion of why unit roots matter.
- How to prepare your time series data for unit root testing.
- How to run and interpret the fundamental tests in GAUSS including the Augmented Dickey-Fuller (ADF) and Phillips-Perron unit root tests and the KPSS stationarity test.
- Where to look for more advanced unit root testing procedures.
After this week you should have a better understanding of how to determine if a time series is stationary using GAUSS.
Erica has been working to build, distribute, and strengthen the GAUSS universe since 2012. She is an economist skilled in data analysis and software development. She has earned a B.A. and MSc in economics and engineering and has over 15 years combined industry and academic experience in data analysis and research.