Easier ARIMA Modeling with State Space: Revisiting Inflation Modeling Using TSMT 4.0

Introduction

State space models are a powerful tool for analyzing time series data, especially when you want to estimate unobserved components like trends or cycles. But traditionally, setting up these models—even for something as common as ARIMA—can be tedious.

The GAUSS arimaSS function, available in the Time Series MT 4.0 library, lets you estimate state space ARIMA models without manually building the full state space structure. It’s a cleaner, faster, and more reliable way to work with ARIMA models.

In this post, we’ll revisit our inflation modeling example using updated data from the Federal Reserve Economic Data (FRED) database. Along the way, we’ll demonstrate how arimaSS works, how it simplifies the modeling process, and how easy it is to generate forecasts from your results.

Why use arimaSS in TSMT?

In our earlier state-space inflation example, we manually set up the state space model. This process required a solid understanding of state space modeling, specifically:

  • Setting up the system matrices.
  • Initializing state vectors.
  • Managing model dynamics.
  • Specifying parameter starting values.

In comparison, the arimaSS function handles all of this setup automatically. It internally constructs the appropriate model structure and runs the Kalman filter using standard ARIMA specifications.
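To make "the appropriate model structure" concrete, here is one common state space (companion) form of an AR(2) model with mean \mu. This is a textbook representation offered as a sketch; the exact form that arimaSS builds internally is not documented here and may differ. The symbols \alpha_t, \phi_1, \phi_2, and \sigma^2 are notation introduced for this illustration.

```latex
\begin{aligned}
y_t &= \mu + \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t, \\
\alpha_t &= \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \alpha_{t-1}
          + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2), \\
\alpha_t &= \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \end{bmatrix}.
\end{aligned}
```

The first row of the state equation reproduces the AR(2) recursion in deviations from the mean, and the second row simply carries the previous value forward. The Kalman filter evaluates the likelihood from this system, which is the setup work a manual state space approach requires.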

Overall, the arimaSS function provides:

  • Simplified syntax: No need to manually define matrices or system dynamics. This not only saves time but also reduces the chance of errors or model misspecification.
  • More robust estimates: Behind-the-scenes improvements, such as enhanced covariance computations and stationarity enforcement, lead to more accurate and stable parameter estimates.
  • Compatibility with forecasting tools: The arimaSS output structure integrates directly with TSMT tools for computing and plotting forecasts.

The arimaSS Procedure

The arimaSS procedure has two required inputs:

  1. A time series dataset.
  2. The AR order.

It also allows four optional inputs for model customization:

  1. The order of differencing.
  2. The moving average order.
  3. An indicator controlling whether a trend is included in the model.
  4. An indicator controlling whether a constant is included in the model.

General Usage

aOut = arimaSS(y, p [, d, q, trend, const]);

y
Tx1 or Tx2 time series data. May include a date variable, which is removed from the data matrix and is not included in the model as a regressor.
p
Scalar, the number of autoregressive lags included in the model.
d
Optional, scalar, the order of differencing. Default = 0.
q
Optional, scalar, the moving average order. Default = 0.
trend
Optional, scalar, an indicator variable to include a trend in the model. Set to 1 to include trend, 0 otherwise. Default = 0.
const
Optional, scalar, an indicator variable to include a constant in the model. Set to 1 to include constant, 0 otherwise. Default = 1.
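As a minimal sketch of how the optional inputs map onto the call, the example below fits two specifications to a simulated series. The simulated data and the variable names y and aOut are placeholders for illustration only; the argument order follows the General Usage line above.

```gauss
library tsmt;

// Simulated random walk, used only to keep
// this sketch self-contained
rndseed 2025;
y = cumsumc(rndn(200, 1));

// Declare the output structure
struct arimamtOut aOut;

// ARIMA(1,1,1): p = 1, d = 1, q = 1,
// with the default trend (0) and constant (1)
aOut = arimaSS(y, 1, 1, 1);

// ARIMA(2,0,0) with a trend and no constant:
// trend = 1, const = 0
aOut = arimaSS(y, 2, 0, 0, 1, 0);
```

Because the optional inputs are positional, d and q must be supplied whenever the trend or constant indicators are changed from their defaults.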

All returns are stored in an arimamtOut structure, including:

  • Estimated parameters.
  • Model diagnostics and summary statistics.
  • Model description.

The complete contents of the arimamtOut structure include:

Member          Description
amo.aic         Akaike Information Criterion value.
amo.b           Estimated model coefficients (Kx1 vector).
amo.e           Residuals from the fitted model (Nx1 vector).
amo.ll          Log-likelihood value of the model.
amo.sbc         Schwarz Bayesian Criterion value.
amo.lrs         Likelihood Ratio Statistic vector (Lx1).
amo.vcb         Covariance matrix of estimated coefficients (KxK).
amo.mse         Mean squared error of the residuals.
amo.sse         Sum of squared errors.
amo.ssy         Total sum of squares of the dependent variable.
amo.rstl        Instance of kalmanResult structure containing Kalman filter results.
amo.tsmtDesc    Instance of tsmtModelDesc structure with model description details.
amo.sumStats    Instance of tsmtSummaryStats structure containing summary statistics.
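The sketch below shows one way these members might be used after estimation. It assumes that the rows of amo.b line up with the rows and columns of amo.vcb, as the dimensions in the table suggest; the white-noise series and the names se and t_ratio are placeholders for illustration.

```gauss
library tsmt;

// Placeholder white-noise series, used only
// to keep this sketch self-contained
rndseed 4321;
y = rndn(250, 1);

// Estimate an AR(1) model and store the results
struct arimamtOut amo;
amo = arimaSS(y, 1);

// Information criteria
print "AIC:" amo.aic;
print "SBC:" amo.sbc;

// Standard errors from the diagonal of the
// coefficient covariance matrix
se = sqrt(diag(amo.vcb));

// t-ratios, assuming amo.b and amo.vcb
// share the same parameter ordering
t_ratio = amo.b ./ se;

// Coefficients, standard errors, and t-ratios side by side
print amo.b~se~t_ratio;
```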

Example: Modeling Inflation

Today, we’ll use a simple, albeit naive, model of inflation. The model is based on an inflation rate computed from FRED's monthly CPIAUCNS series (the Consumer Price Index for All Urban Consumers, not seasonally adjusted).

To begin, we’ll load and prepare our data directly from the FRED database.

Loading data from FRED

Using the fred_load and fred_set procedures, we will:

  • Pull the continuously compounded annual rate of change from FRED.
  • Include data starting from January 1971 (1971m1).

// Set the observation start date and request
// the continuously compounded annual rate of
// change ("cca") in a single fred_set call, so
// the second option does not replace the first
fred_params = fred_set("observation_start", "1971-01-01", "units", "cca");

// Specify series to pull
series = "CPIAUCNS";

// Pull data from FRED
cpi_data = fred_load(series, fred_params);

// Preview data
head(cpi_data);

This prints the first five observations:

            date         CPIAUCNS
      1971-01-01        0.0000000
      1971-02-01        3.0112900
      1971-03-01        3.0037600
      1971-04-01        2.9962600
      1971-05-01        5.9701600 
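For reference, the "cca" units correspond to FRED's continuously compounded annual rate of change, which for monthly data is 1200 times the change in the log index. The worked line below uses index levels of 39.8 and 39.9 for January and February 1971. These levels are an assumption for illustration, since the raw index is not shown above, but they reproduce the second printed value.

```latex
\begin{aligned}
\pi_t &= 100 \times 12 \times \left( \ln P_t - \ln P_{t-1} \right), \\
\pi_{1971\text{m}2} &\approx 1200 \times \ln\!\left( \frac{39.9}{39.8} \right) \approx 3.0113 .
\end{aligned}
```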

To further preview our data, let's create a quick plot of the inflation series using the plotXY procedure and a formula string:

plotXY(cpi_data, "CPIAUCNS~date");

For fun, let’s add a reference line to visualize the Fed’s long-run average inflation target of 2%:

// Add inflation target line at 2%
plotAddHLine(2);

US CPI based inflation with inflation targeting line.

As one final visualization, let's look at the 5-year (60-month) moving average:

// Compute 60-month moving average
ma_5yr = movingAve(cpi_data[., "CPIAUCNS"], 60);

// Plot the moving average against the date
plotXY(cpi_data[., "date"], ma_5yr);

// Add inflation targeting line at 2%
plotAddHLine(2);

5 year moving average US CPI based inflation with inflation targeting line.

The moving average plot filters out short-term fluctuations and noise, highlighting four long-term phases:

  1. The Disinflation Era (approx. 1980-1993): This period is marked by the steep decline in inflation from the double-digit highs of the early 1980s to around 3% by the early 1990s, an outcome of aggressive monetary policy by the Federal Reserve.
  2. The ‘Great Moderation’ (mid-1990s to mid-2000s): Inflation remained relatively stable and low, hovering near the Fed's 2% target, marked here with a horizontal line for reference.
  3. Post-GFC stagnation (2008-2020): After the 2008 Global Financial Crisis, inflation trended even lower, with the 5-year average dipping below 2% for an extended period, reflecting sluggish demand and persistent slack.
  4. Recent surge: The sharp rise beginning around 2021 reflects the post-pandemic spike in inflation, pushing the 5-year average above 3% for the first time in over a decade.

We’ll make one final transformation before estimation by converting the "CPIAUCNS" values from percentages to decimals.

cpi_data[., "CPIAUCNS"] = cpi_data[., "CPIAUCNS"]/100;

ARIMA Estimation

Now that we’ve loaded our data, we’re ready to estimate our model using arimaSS. We’ll start with a simple AR(2) model. Based on the earlier visualization, it’s reasonable to include a constant but exclude a trend, so we’ll use the default settings for those options.

call arimaSS(cpi_data, 2);

There are a few helpful things to note about this:

  1. We did not need to remove the date vector from cpi_data before passing it to arimaSS. Most TSMT functions allow you to include a date vector with your time series. In fact, this is recommended: GAUSS will automatically detect and use the date vector to generate more informative results reports.
  2. In this example, we are not storing the output. The call keyword discards the returned structure, while arimaSS still prints its results report to the screen.
  3. Because this is strictly an AR model and we’re using the default deterministic components, we only need two inputs: the data and the AR order.

A detailed results report is printed to screen:

================================================================================
Model:                 ARIMA(2,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01
SSE:                          0.839          Degrees of freedom:             648
Log Likelihood:           -1244.565          RMSE:                         0.036
AIC:                      -2497.130          SEE:                          0.210
SBC:                      -2463.210          Durbin-Watson:                1.999
R-squared:                    0.358          Rbar-squared:                 0.839
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------
Constant                    0.03832        0.00349       10.97118        0.00000
CPIAUCNS L(1)               0.59599        0.03715       16.04180        0.00000
CPIAUCNS L(2)               0.00287        0.03291        0.08726        0.93046
Sigma2 CPIAUCNS             0.00129        0.00007       18.05493        0.00000
================================================================================

There are some interesting observations from our results:

  1. The estimated constant is statistically significant and equal to 0.038 (3.8%). This is higher than the Fed’s long-run inflation target of 2%, but not by much. It’s also important to note that our dataset begins well before the era of formal Fed inflation targeting.
  2. All coefficients are statistically significant except for the CPIAUCNS L(2) coefficient.
  3. The table header includes the timespan of our data. This was automatically detected because we included a date vector with our input. If no date vector is included, the timespan will be reported as unknown.

Extra credit: Looping For Model Selection

The arimaSS procedure doesn’t currently provide built-in optimal lag selection. However, we can write a simple for loop and use an array of structures to identify the best lag length.

Our goal is to select the model with the lowest AIC, allowing for a maximum of 8 lags.

Two tools will help us with this task:

  1. An array of structures to store the results from each model.
  2. A vector to store the AIC values from each model.

// Set maximum lags
maxlags = 8;

// Declare a single arimamtOut structure
struct arimamtOut amo;

// Reshape to create a maxlags x 1 structure array
amo = reshape(amo, maxlags, 1);

// AIC storage vector
aic_vector = zeros(maxlags, 1);

Next, we’ll loop through our models. In each iteration, we will:

  1. Trim the first maxlags - i observations so that, once i lags are accounted for, each model is fit to a comparable sample.
  2. Store the results in a separate element of the arimamtOut structure array.
  3. Extract the AIC and store it in our AIC vector.

// Loop through lag possibilities
for i(1, maxlags, 1);
    // Trim data to enforce sample
    // size consistency 
    y_i = trimr(cpi_data, maxlags-i, 0);

    // Estimate the current 
    // AR(i) model
    amo[i] = arimaSS(y_i, i);

    // Store AIC for easy comparison
    aic_vector[i] = amo[i].aic;
endfor;

Finally, we will use the minindc procedure to find the index of the minimum AIC:

// Optimal lag is equal to location
// of minimum AIC
opt_lag = minindc(aic_vector);

// Print optimal lags
print "Optimal lags:"; opt_lag;

// Select the final output structure
struct arimamtOut amo_final;
amo_final = amo[opt_lag];

The lag length that minimizes the AIC is 8, yielding the following results:

================================================================================
Model:                 ARIMA(8,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01
SSE:                          0.803          Degrees of freedom:             642
Log Likelihood:           -1258.991          RMSE:                         0.035
AIC:                      -2537.982          SEE:                          0.080
SBC:                      -2453.182          Durbin-Watson:                1.998
R-squared:                    0.385          Rbar-squared:                 0.939
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------
Constant                    0.03824        0.00512        7.46526        0.00000
CPIAUCNS L(1)               0.58055        0.03917       14.82047        0.00000
CPIAUCNS L(2)              -0.03968        0.04730       -0.83883        0.40156
CPIAUCNS L(3)              -0.01156        0.05062       -0.22833        0.81939
CPIAUCNS L(4)               0.09288        0.04151        2.23749        0.02525
CPIAUCNS L(5)               0.02322        0.04773        0.48639        0.62669
CPIAUCNS L(6)              -0.06863        0.04505       -1.52333        0.12767
CPIAUCNS L(7)               0.16048        0.04038        3.97391        0.00007
CPIAUCNS L(8)              -0.00313        0.02778       -0.11281        0.91018
Sigma2 CPIAUCNS             0.00123        0.00007       18.05512        0.00000
================================================================================
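The loop above ranks models by AIC alone. As a minimal sketch of a variation, the same loop can also record the SBC (the amo.sbc member of the output structure), which makes it easy to see whether the two criteria agree. The simulated AR(2) series and the names y, e, and ic are placeholders for illustration, not part of the inflation example.

```gauss
library tsmt;

// Simulate an AR(2) series, used only to
// keep this sketch self-contained
rndseed 8675;
nobs = 300;
e = rndn(nobs, 1);
y = zeros(nobs, 1);

for t(3, nobs, 1);
    y[t] = 0.5*y[t-1] + 0.2*y[t-2] + e[t];
endfor;

// Set maximum lags
maxlags = 4;

// Structure array to hold each model's results
struct arimamtOut amo;
amo = reshape(amo, maxlags, 1);

// Column 1 holds the AIC, column 2 holds the SBC
ic = zeros(maxlags, 2);

for i(1, maxlags, 1);
    // Estimate the current AR(i) model
    amo[i] = arimaSS(y, i);

    // Store both criteria for comparison
    ic[i, 1] = amo[i].aic;
    ic[i, 2] = amo[i].sbc;
endfor;

// Row index of the minimum in each column
print "Lag chosen by AIC and by SBC:";
print minindc(ic)';
```

If the two criteria select different lag lengths, the SBC's choice is typically the more parsimonious one, and comparing both can be a useful check before settling on a final model.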

Conclusion

The arimaSS function offers a streamlined approach to estimating ARIMA models in state space form, eliminating the need for manual specification of system matrices and initial values. This makes it easier to explore models, experiment with lag structures, and generate forecasts, especially for users who may not be deeply familiar with state space modeling.

Further Reading

  1. Introduction to the Fundamentals of Time Series Data and Analysis
  2. Importing FRED Data to GAUSS
  3. Understanding State-Space Models (An Inflation Example)
  4. Getting Started with Time Series in GAUSS