Introduction
State space models are a powerful tool for analyzing time series data, especially when you want to estimate unobserved components like trends or cycles. But traditionally, setting up these models—even for something as common as ARIMA—can be tedious.
The GAUSS arimaSS function, available in the Time Series MT 4.0 library, lets you estimate state space ARIMA models without manually building the full state space structure. It’s a cleaner, faster, and more reliable way to work with ARIMA models.
In this post, we’ll revisit our inflation modeling example using updated data from the Federal Reserve Economic Data (FRED) database. Along the way, we’ll demonstrate how arimaSS works, how it simplifies the modeling process, and how easy it is to generate forecasts from your results.
Why use arimaSS in TSMT?
In our earlier state-space inflation example, we manually set up the state space model. This process required a solid understanding of state space modeling, specifically:
- Setting up the system matrices.
- Initializing state vectors.
- Managing model dynamics.
- Specifying parameter starting values.
In comparison, the arimaSS function handles all of this setup automatically. It internally constructs the appropriate model structure and runs the Kalman filter using standard ARIMA specifications.
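To make "the appropriate model structure" concrete, here is a sketch of one common companion-form state space representation of an AR(2) model with mean $\mu$. The symbols $\alpha_t$, $\phi_1$, $\phi_2$, $\mu$, and $\sigma^2$ are notation introduced here for illustration, and the exact parameterization that arimaSS uses internally may differ.

```latex
% Measurement equation: the observed series is the mean
% plus the first element of the state vector
y_t = \mu + \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t

% Transition equation: the state stacks the current and
% lagged deviations from the mean
\alpha_t =
\begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \end{bmatrix}
=
\begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix}
\alpha_{t-1}
+
\begin{bmatrix} 1 \\ 0 \end{bmatrix} \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2)
```

The first row of the transition equation reproduces the AR(2) recursion, and the second row simply carries the previous deviation forward. These are the matrices you would otherwise have to build by hand.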
Overall, the arimaSS function provides:
- Simplified syntax: No need to manually define matrices or system dynamics. This not only saves time but also reduces the chance of errors or model misspecification.
- More robust estimates: Behind-the-scenes improvements, such as enhanced covariance computations and stationarity enforcement, lead to more accurate and stable parameter estimates.
- Compatibility with forecasting tools: The arimaSS output structure integrates directly with TSMT tools for computing and plotting forecasts.
The arimaSS Procedure
The arimaSS procedure has two required inputs:
- A time series dataset.
- The AR order.
It also allows four optional inputs for model customization:
- The order of differencing.
- The moving average order.
- An indicator controlling whether a constant is included in the model.
- An indicator controlling whether a trend is included in the model.
General Usage
aOut = arimaSS(y, p [, d, q, trend, const]);
- y: Tx1 or Tx2 time series data. May include a date variable, which will be removed from the data matrix and is not included in the model as a regressor.
- p: Scalar, the number of autoregressive lags included in the model.
- d: Optional, scalar, the order of differencing. Default = 0.
- q: Optional, scalar, the moving average order. Default = 0.
- trend: Optional, scalar, an indicator variable to include a trend in the model. Set to 1 to include trend, 0 otherwise. Default = 0.
- const: Optional, scalar, an indicator variable to include a constant in the model. Set to 1 to include constant, 0 otherwise. Default = 1.
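As a quick illustration of how the optional inputs are passed, the sketch below calls arimaSS with and without the deterministic terms. The simulated random walk is a placeholder invented for this example, and it assumes the optional inputs are positional, following the order in the usage line above, so the earlier ones must be supplied to reach the later ones.

```gauss
library tsmt;

// Placeholder data (assumption, for illustration only):
// a simulated random walk with 200 observations
rndseed 4521;
y = cumsumc(rndn(200, 1));

// Declare the output structure
struct arimamtOut aOut;

// ARIMA(1,1,0): p = 1, d = 1,
// with the default q (0), trend (0), and const (1)
aOut = arimaSS(y, 1, 1);

// Same orders, but with no trend and no constant.
// d and q are passed explicitly so that
// trend and const can be set
aOut = arimaSS(y, 1, 1, 0, 0, 0);
```

Each call prints its own results report, so the two reports can be compared directly.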
All returns are stored in an arimamtOut structure, including:
- Estimated parameters.
- Model diagnostics and summary statistics.
- Model description.
The complete contents of the arimamtOut structure include:
Member | Description |
---|---|
amo.aic | Akaike Information Criterion value. |
amo.b | Estimated model coefficients (Kx1 vector). |
amo.e | Residuals from the fitted model (Nx1 vector). |
amo.ll | Log-likelihood value of the model. |
amo.sbc | Schwarz Bayesian Criterion value. |
amo.lrs | Likelihood Ratio Statistic vector (Lx1). |
amo.vcb | Covariance matrix of estimated coefficients (KxK). |
amo.mse | Mean squared error of the residuals. |
amo.sse | Sum of squared errors. |
amo.ssy | Total sum of squares of the dependent variable. |
amo.rstl | Instance of kalmanResult structure containing Kalman filter results. |
amo.tsmtDesc | Instance of tsmtModelDesc structure with model description details. |
amo.sumStats | Instance of tsmtSummaryStats structure containing summary statistics. |
Example: Modeling Inflation
Today, we’ll use a simple, albeit naive, model of inflation. This model is based on a CPI inflation index created from the FRED CPIAUCNS monthly dataset.
To begin, we’ll load and prepare our data directly from the FRED database.
Loading data from FRED
Using the fred_load and fred_set procedures, we will:
- Pull the continuously compounded annual rate of change from FRED.
- Include data starting from January 1971 (1971m1).
// Set observation start date
fred_params = fred_set("observation_start", "1971-01-01");
// Specify units to be
// continuously compounded annual
// rate of change
fred_params = fred_set("units", "cca");
// Specify series to pull
series = "CPIAUCNS";
// Pull data from FRED
cpi_data = fred_load(series, fred_params);
// Preview data
head(cpi_data);
This prints the first five observations:
            date         CPIAUCNS
      1971-01-01        0.0000000
      1971-02-01        3.0112900
      1971-03-01        3.0037600
      1971-04-01        2.9962600
      1971-05-01        5.9701600
To further preview our data, let's create a quick plot of the inflation series using the plotXY procedure and a formula string:
plotXY(cpi_data, "CPIAUCNS~date");
For fun, let’s add a reference line to visualize the Fed’s long-run average inflation target of 2%:
// Add inflation target line at 2%
plotAddHLine(2);
As one final visualization, let's look at the 5-year (60-month) moving average:
// Compute moving average
ma_5yr = movingAve(cpi_data[., "CPIAUCNS"], 60);
// Add to time series plot
plotXY(cpi_data[., "date"], ma_5yr);
// Add inflation targeting line at 2%
plotAddHLine(2);
The moving average plot highlights long-term trends, filtering out short-term fluctuations and noise:
- The Disinflation Era (approx. 1980-1993): This period is marked by the steep decline in inflation from the double-digit highs of the early 1980s to around 3% by the early 1990s, an outcome of aggressive monetary policy by the Federal Reserve.
- The ‘Great Moderation’ (mid-1990s to mid-2000s): Inflation remained relatively stable and low, hovering near the Fed's 2% target, marked here with a horizontal line for reference.
- Post-GFC stagnation (2008-2020): After the 2008 Global Financial Crisis, inflation trended even lower, with the 5-year average dipping below 2% for an extended period, reflecting sluggish demand and persistent slack.
- Recent surge: The sharp rise beginning around 2021 reflects the post-pandemic spike in inflation, pushing the 5-year average above 3% for the first time in over a decade.
We’ll make one final transformation before estimation by converting the "CPIAUCNS" values from percentages to decimals.
cpi_data[., "CPIAUCNS"] = cpi_data[., "CPIAUCNS"]/100;
The fred_load procedure requires a valid API key. To download data directly from FRED into GAUSS, you must obtain an API key from FRED and set it in GAUSS. For more details on importing data from FRED, see our earlier blog post, Importing FRED Data to GAUSS.
ARIMA Estimation
Now that we’ve loaded our data, we’re ready to estimate our model using arimaSS. We’ll start with a simple AR(2) model. Based on the earlier visualization, it’s reasonable to include a constant but exclude a trend, so we’ll use the default settings for those options.
call arimaSS(cpi_data, 2);
There are a few helpful things to note about this:
- We did not need to remove the date vector from cpi_data before passing it to arimaSS. Most TSMT functions allow you to include a date vector with your time series. In fact, this is recommended: GAUSS will automatically detect and use the date vector to generate more informative results reports.
- In this example, we are not storing the output. Instead, we use the call keyword, which discards the return value, and rely on the results report that is printed directly to the screen.
- Because this is strictly an AR model and we’re using the default deterministic components, we only need two inputs: the data and the AR order.
A detailed results report is printed to screen:
================================================================================
Model:                 ARIMA(2,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01
SSE:                          0.839          Degrees of freedom:             648
Log Likelihood:           -1244.565          RMSE:                         0.036
AIC:                      -2497.130          SEE:                          0.210
SBC:                      -2463.210          Durbin-Watson:                1.999
R-squared:                    0.358          Rbar-squared:                 0.839
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------
Constant                    0.03832        0.00349       10.97118        0.00000
CPIAUCNS L(1)               0.59599        0.03715       16.04180        0.00000
CPIAUCNS L(2)               0.00287        0.03291        0.08726        0.93046
Sigma2 CPIAUCNS             0.00129        0.00007       18.05493        0.00000
================================================================================
There are some interesting observations from our results:
- The estimated constant is statistically significant and equal to 0.038 (3.8%). This is higher than the Fed’s long-run inflation target of 2%, but not by much. It’s also important to note that our dataset begins well before the era of formal Fed inflation targeting.
- All coefficients are statistically significant except for the CPIAUCNS L(2) coefficient.
- The table header includes the timespan of our data. This was automatically detected because we included a date vector with our input. If no date vector is included, the timespan will be reported as unknown.
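If you want to work with the estimates rather than only read the printed report, the output can be stored in a structure instead of discarded. The sketch below assumes cpi_data from the steps above is still in memory and that the TSMT library is loaded. The variable names amo_ar2, se, t_ratio, and results are illustrative, and the assumption that the rows of b follow the same order as the printed coefficient table is not confirmed here.

```gauss
// Assumes cpi_data from the steps above is in memory
// and the tsmt library has been loaded

// Declare the output structure
struct arimamtOut amo_ar2;

// Estimate the AR(2) model and store the results
amo_ar2 = arimaSS(cpi_data, 2);

// Standard errors from the diagonal of the
// coefficient covariance matrix
se = sqrt(diag(amo_ar2.vcb));

// T-ratios computed from the stored estimates
t_ratio = amo_ar2.b ./ se;

// Coefficients, standard errors, and
// t-ratios side by side
results = amo_ar2.b~se~t_ratio;
print results;

// Information criteria
print "AIC:" amo_ar2.aic;
print "SBC:" amo_ar2.sbc;
```

Storing the structure this way is also what makes the lag-selection loop in the next section possible, since each model's AIC has to be read back after estimation.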
Extra credit: Looping For Model Selection
The arimaSS procedure doesn’t currently provide built-in optimal lag selection. However, we can write a simple for loop and use an array of structures to identify the best lag length.
Our goal is to select the model with the lowest AIC, allowing for a maximum of 8 lags.
Two tools will help us with this task:
- An array of structures to store the results from each model.
- A vector to store the AIC values from each model.
// Set maximum lags
maxlags = 8;
// Declare a single structure
struct arimamtOut amo;
// Reshape to create structure array
amo = reshape(amo, maxlags, 1);
// AIC storage vector
aic_vector = zeros(maxlags, 1);
Next, we’ll loop through our models. In each iteration, we will:
- Store the results in a separate arimamtOut structure.
- Extract the AIC and store it in our AIC vector.
- Adjust the sample size so that each lag selection iteration uses the same number of observations.
// Loop through lag possibilities
for i(1, maxlags, 1);
// Trim data to enforce sample
// size consistency
y_i = trimr(cpi_data, maxlags-i, 0);
// Estimate the current
// AR(i) model
amo[i] = arimaSS(y_i, i);
// Store AIC for easy comparison
aic_vector[i] = amo[i].aic;
endfor;
Finally, we will use the minindc procedure to find the index of the minimum AIC:
// Optimal lag is equal to location
// of minimum AIC
opt_lag = minindc(aic_vector);
// Print optimal lags
print "Optimal lags:"; opt_lag;
// Select the final output structure
struct arimamtOut amo_final;
amo_final = amo[opt_lag];
The optimal lag length based on the minimum AIC is 8, yielding the following results:
================================================================================
Model:                 ARIMA(8,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01
SSE:                          0.803          Degrees of freedom:             642
Log Likelihood:           -1258.991          RMSE:                         0.035
AIC:                      -2537.982          SEE:                          0.080
SBC:                      -2453.182          Durbin-Watson:                1.998
R-squared:                    0.385          Rbar-squared:                 0.939
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------
Constant                    0.03824        0.00512        7.46526        0.00000
CPIAUCNS L(1)               0.58055        0.03917       14.82047        0.00000
CPIAUCNS L(2)              -0.03968        0.04730       -0.83883        0.40156
CPIAUCNS L(3)              -0.01156        0.05062       -0.22833        0.81939
CPIAUCNS L(4)               0.09288        0.04151        2.23749        0.02525
CPIAUCNS L(5)               0.02322        0.04773        0.48639        0.62669
CPIAUCNS L(6)              -0.06863        0.04505       -1.52333        0.12767
CPIAUCNS L(7)               0.16048        0.04038        3.97391        0.00007
CPIAUCNS L(8)              -0.00313        0.02778       -0.11281        0.91018
Sigma2 CPIAUCNS             0.00123        0.00007       18.05512        0.00000
================================================================================
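AIC tends to favor longer lag lengths, so it can be useful to cross-check the choice against the SBC, which is also stored in each output structure. The sketch below assumes amo, aic_vector, and maxlags from the loop above are still in memory. The names sbc_vector, opt_lag_sbc, and criteria are illustrative.

```gauss
// Assumes amo, aic_vector, and maxlags from
// the lag-selection loop above are in memory

// SBC storage vector
sbc_vector = zeros(maxlags, 1);

// Extract the SBC from each stored model
for i(1, maxlags, 1);
    sbc_vector[i] = amo[i].sbc;
endfor;

// Lag length with the minimum SBC
opt_lag_sbc = minindc(sbc_vector);

// Lag, AIC, and SBC side by side
criteria = seqa(1, 1, maxlags)~aic_vector~sbc_vector;
print criteria;

print "Optimal lags (SBC):" opt_lag_sbc;
```

If the two criteria disagree, the side-by-side table shows how much each criterion changes across lag lengths, which helps when choosing between a more parsimonious model and a better in-sample fit.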
Conclusion
The arimaSS function offers a streamlined approach to estimating ARIMA models in state space form, eliminating the need for manual specification of system matrices and initial values. This makes it easier to explore models, experiment with lag structures, and generate forecasts, especially for users who may not be deeply familiar with state space modeling.
Further Reading
- Introduction to the Fundamentals of Time Series Data and Analysis
- Importing FRED Data to GAUSS
- Understanding State-Space Models (An Inflation Example)
- Getting Started with Time Series in GAUSS
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.