Easier ARIMA Modeling with State Space: Revisiting Inflation Modeling Using TSMT 4.0

by Eric · Published June 2, 2025 · Updated June 18, 2025

Introduction

State space models are a powerful tool for analyzing time series data, especially when you want to estimate unobserved components like trends or cycles. But traditionally, setting up these models—even for something as common as ARIMA—can be tedious.

The GAUSS arimaSS function, available in the Time Series MT 4.0 library, lets you estimate state space ARIMA models without manually building the full state space structure. It’s a cleaner, faster, and more reliable way to work with ARIMA models.

In this post, we’ll revisit our inflation modeling example using updated data from the Federal Reserve Economic Data (FRED) database. Along the way, we’ll demonstrate how arimaSS works, how it simplifies the modeling process, and how easy it is to generate forecasts from your results.

Why use `arimaSS` in TSMT?

In our earlier state-space inflation example, we manually set up the state space model. This process required a solid understanding of state space modeling, specifically:

Setting up the system matrices.
Initializing state vectors.
Managing model dynamics.
Specifying parameter starting values.

In comparison, the arimaSS function handles all of this setup automatically. It internally constructs the appropriate model structure and runs the Kalman filter using standard ARIMA specifications.
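
For intuition about the structure being built, here is one common textbook state space (companion form) representation of an AR(2) model with mean $\mu$. This is only a sketch: the symbols $\alpha_t$, $\mu$, $\phi_1$, $\phi_2$ and $\sigma^2$ are notation introduced here, and the parameterization that arimaSS builds internally may be organized differently.

```latex
\begin{aligned}
y_t      &= \mu + \begin{bmatrix} 1 & 0 \end{bmatrix} \alpha_t, \\
\alpha_t &= \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \alpha_{t-1}
          + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2),
\qquad \alpha_t = \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \end{bmatrix}.
\end{aligned}
```

The first row of the state equation reproduces the AR(2) recursion in deviations from the mean, and the second row simply carries the previous deviation forward. In a manual setup, these are the system matrices and state vector you would have to define yourself.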

Overall, the arimaSS function provides:

Simplified syntax: No need to manually define matrices or system dynamics. This not only saves time but also reduces the chance of errors or model misspecification.
More robust estimates: Behind-the-scenes improvements, such as enhanced covariance computations and stationarity enforcement, lead to more accurate and stable parameter estimates.
Compatibility with forecasting tools: The arimaSS output structure integrates directly with TSMT tools for computing and plotting forecasts.

The `arimaSS` Procedure

The arimaSS procedure has two required inputs:

A time series dataset.
The AR order.

It also allows four optional inputs for model customization:

The order of differencing.
The moving average order.
An indicator controlling whether a trend is included in the model.
An indicator controlling whether a constant is included in the model.

General Usage

aOut = arimaSS(y, p [, d, q, trend, const]);

y: Tx1 or Tx2 time series data. May include a date variable, which is removed from the data matrix and is not included in the model as a regressor.
p: Scalar, the number of autoregressive lags included in the model.
d: Optional, scalar, the order of differencing. Default = 0.
q: Optional, scalar, the moving average order. Default = 0.
trend: Optional, scalar, an indicator variable to include a trend in the model. Set to 1 to include trend, 0 otherwise. Default = 0.
const: Optional, scalar, an indicator variable to include a constant in the model. Set to 1 to include constant, 0 otherwise. Default = 1.
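
As a sketch of how the optional inputs combine, the example below fits an ARIMA(1,1,1) to a simulated series. The series y, the seed, and the output name aOut are hypothetical and used only for illustration. The argument order follows the usage line above, and the `library tsmt;` statement assumes TSMT is installed and loaded in the usual way.

```gauss
// Load the TSMT library (assumed to be installed)
library tsmt;

// Simulate a hypothetical nonstationary series:
// the cumulative sum of standard normal draws
rndseed 2345;
y = cumsumc(rndn(250, 1));

// Declare a structure to hold the results
struct arimamtOut aOut;

// ARIMA(1,1,1): p = 1, d = 1, q = 1.
// trend and const are omitted, so the defaults
// listed above (trend = 0, const = 1) apply
aOut = arimaSS(y, 1, 1, 1);

// Print the information criterion stored in the output
print "AIC: " aOut.aic;
```

Because the optional inputs are positional, setting a later one (for example trend) means supplying the earlier ones (d and q) as well.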

All returns are stored in an arimamtOut structure, including:

Estimated parameters.
Model diagnostics and summary statistics.
Model description.

The complete contents of the arimamtOut structure, shown here for an instance named amo, include:

Member	Description
`amo.aic`	Akaike Information Criterion value.
`amo.b`	Estimated model coefficients (Kx1 vector).
`amo.e`	Residuals from the fitted model (Nx1 vector).
`amo.ll`	Log-likelihood value of the model.
`amo.sbc`	Schwarz Bayesian Criterion value.
`amo.lrs`	Likelihood Ratio Statistic vector (Lx1).
`amo.vcb`	Covariance matrix of estimated coefficients (KxK).
`amo.mse`	Mean squared error of the residuals.
`amo.sse`	Sum of squared errors.
`amo.ssy`	Total sum of squares of the dependent variable.
`amo.rstl`	Instance of `kalmanResult` structure containing Kalman filter results.
`amo.tsmtDesc`	Instance of `tsmtModelDesc` structure with model description details.
`amo.sumStats`	Instance of `tsmtSummaryStats` structure containing summary statistics.
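
The members listed above can be combined directly. As a minimal sketch, the code below recomputes standard errors and t-ratios from `amo.b` and `amo.vcb`. The simulated AR(1) series and the seed are hypothetical, and the order of the coefficients inside `amo.b` is not assumed here.

```gauss
// Load the TSMT library (assumed to be installed)
library tsmt;

// Simulate a hypothetical AR(1) series:
// y[t] = 0.6 * y[t-1] + e[t]
rndseed 8675;
y = recserar(rndn(300, 1), 0, 0.6);

// Estimate an AR(1) model and store the output
struct arimamtOut amo;
amo = arimaSS(y, 1);

// Standard errors are the square roots of the
// diagonal of the coefficient covariance matrix
se = sqrt(diag(amo.vcb));

// t-ratios: estimates divided by standard errors
t_ratio = amo.b ./ se;

// Print estimates, standard errors, and t-ratios
print "Estimate   Std. Err.   T-ratio";
print amo.b~se~t_ratio;

// Information criteria for model comparison
print "AIC: " amo.aic;
print "SBC: " amo.sbc;
```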

Example: Modeling Inflation

Today, we’ll use a simple, albeit naive, model of inflation. This model is based on a CPI inflation index created from the FRED CPIAUCNS monthly dataset.

To begin, we’ll load and prepare our data directly from the FRED database.

Loading data from FRED

Using the fred_load and fred_set procedures, we will:

Pull the continuously compounded annual rate of change from FRED.
Include data starting from January 1971 (1971m1).

// Set observation start date
fred_params = fred_set("observation_start", "1971-01-01");

// Specify units to be the
// continuously compounded annual
// rate of change
fred_params = fred_set("units", "cca");

// Specify series to pull
series = "CPIAUCNS";

// Pull data from FRED
cpi_data = fred_load(series, fred_params);

// Preview data
head(cpi_data);

This prints the first five observations:

            date         CPIAUCNS
      1971-01-01        0.0000000
      1971-02-01        3.0112900
      1971-03-01        3.0037600
      1971-04-01        2.9962600
      1971-05-01        5.9701600

To further preview our data, let's create a quick plot of the inflation series using the plotXY procedure and a formula string:

plotXY(cpi_data, "CPIAUCNS~date");

For fun, let’s add a reference line to visualize the Fed’s long-run average inflation target of 2%:

// Add inflation target line at 2%
plotAddHLine(2);

As one final visualization, let's look at the 5-year (60-month) moving average:

// Compute moving average
ma_5yr = movingAve(cpi_data[., "CPIAUCNS"], 60);

// Plot the moving average against the date
plotXY(cpi_data[., "date"], ma_5yr);

// Add inflation target line at 2%
plotAddHLine(2);

The moving average plot highlights long-term trends, filtering out short-term fluctuations and noise:

The Disinflation Era (approx. 1980-1993): This period is marked by a steep decline in inflation from the double-digit highs of the early 1980s to around 3% by the early 1990s, an outcome of aggressive monetary policy by the Federal Reserve.
The ‘Great Moderation’ (mid-1990s to mid-2000s): Inflation remained relatively stable and low, hovering near the Fed's 2% target, marked here with a horizontal line for reference.
Post-GFC stagnation (2008-2020): After the 2008 Global Financial Crisis, inflation trended even lower, with the 5-year average dipping below 2% for an extended period, reflecting sluggish demand and persistent slack.
Recent surge: The sharp rise beginning around 2021 reflects the post-pandemic spike in inflation, pushing the 5-year average above 3% for the first time in over a decade.

We’ll make one final transformation before estimation by converting the "CPIAUCNS" values from percentages to decimals.

cpi_data[., "CPIAUCNS"] = cpi_data[., "CPIAUCNS"]/100;

Note: The fred_load procedure requires a valid API key. To download data directly from FRED into GAUSS, you must obtain an API key from FRED and set it in GAUSS. For more details on importing data from FRED, see our earlier blog post, Importing FRED Data to GAUSS.

ARIMA Estimation

Now that we’ve loaded our data, we’re ready to estimate our model using arimaSS. We’ll start with a simple AR(2) model. Based on the earlier visualization, it’s reasonable to include a constant but exclude a trend, so we’ll use the default settings for those options.

call arimaSS(cpi_data, 2);

There are a few helpful things to note about this:

We did not need to remove the date vector from cpi_data before passing it to arimaSS. Most TSMT functions allow you to include a date vector with your time series. In fact, this is recommended: GAUSS will automatically detect and use the date vector to generate more informative results reports.
In this example, we are not storing the output. Instead, we use the call keyword, which runs the procedure and discards its return value; the results report is still printed to the screen.
Because this is strictly an AR model and we’re using the default deterministic components, we only need two inputs: the data and the AR order.

A detailed results report is printed to screen:

================================================================================
Model:                 ARIMA(2,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01

SSE:                          0.839          Degrees of freedom:             648
Log Likelihood:           -1244.565          RMSE:                         0.036
AIC:                      -2497.130          SEE:                          0.210
SBC:                      -2463.210          Durbin-Watson:                1.999
R-squared:                    0.358          Rbar-squared:                 0.839
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------

Constant                    0.03832        0.00349       10.97118        0.00000
CPIAUCNS L(1)               0.59599        0.03715       16.04180        0.00000
CPIAUCNS L(2)               0.00287        0.03291        0.08726        0.93046
Sigma2 CPIAUCNS             0.00129        0.00007       18.05493        0.00000
================================================================================

There are some interesting observations from our results:

The estimated constant is statistically significant and equal to 0.038 (3.8%). This is higher than the Fed’s long-run inflation target of 2%, but not by much. It’s also important to note that our dataset begins well before the era of formal Fed inflation targeting.
All coefficients are statistically significant except for the CPIAUCNS L(2) coefficient.
The table header includes the timespan of our data. This was automatically detected because we included a date vector with our input. If no date vector is included, the timespan will be reported as unknown.
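
To make the significance statement concrete, here is a rough 95% interval for the CPIAUCNS L(2) coefficient, built from the estimate and standard error in the report. The normal critical value of 1.96 is an assumption made here for illustration.

```latex
0.00287 \pm 1.96 \times 0.03291
  = 0.00287 \pm 0.06450
  \;\Rightarrow\; [-0.0616,\; 0.0674]
```

The interval comfortably contains zero, which is consistent with the small t-ratio (0.00287 / 0.03291 ≈ 0.087) and the large p-value reported for that coefficient.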

Extra credit: Looping For Model Selection

The arimaSS procedure doesn’t currently provide built-in optimal lag selection. However, we can write a simple for loop and use an array of structures to identify the best lag length.

Our goal is to select the model with the lowest AIC, allowing for a maximum of 8 lags.

Two tools will help us with this task:

An array of structures to store the results from each model.
A vector to store the AIC values from each model.

// Set maximum lags
maxlags = 8;

// Declare a single structure instance
struct arimamtOut amo;

// Reshape to create structure array
amo = reshape(amo, maxlags, 1);

// AIC storage vector
aic_vector = zeros(maxlags, 1);

Next, we’ll loop through our models. In each iteration, we will:

Store the results in a separate arimamtOut structure.
Extract the AIC and store it in our AIC vector.
Adjust the sample size so that each lag selection iteration uses the same number of observations.

// Loop through lag possibilities
for i(1, maxlags, 1);
    // Trim data to enforce sample
    // size consistency 
    y_i = trimr(cpi_data, maxlags-i, 0);

    // Estimate the current 
    // AR(i) model
    amo[i] = arimaSS(y_i, i);

    // Store AIC for easy comparison
    aic_vector[i] = amo[i].aic;
endfor;

Finally, we will use the minindc procedure to find the index of the minimum AIC:

// Optimal lag is equal to location
// of minimum AIC
opt_lag = minindc(aic_vector);

// Print optimal lags
print "Optimal lags:"; opt_lag;

// Select the final output structure
struct arimamtOut amo_final;
amo_final = amo[opt_lag];
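
The same stored results can be reused to check a second criterion. The sketch below, which assumes that maxlags, aic_vector, and the amo structure array from the loop above are still in memory, selects the lag with the lowest SBC using the `sbc` member. The names sbc_vector and opt_lag_sbc are introduced here for illustration.

```gauss
// Assumes maxlags, aic_vector, and the amo structure
// array from the lag selection loop above

// SBC storage vector
sbc_vector = zeros(maxlags, 1);

// Collect the SBC from each stored model
for i(1, maxlags, 1);
    sbc_vector[i] = amo[i].sbc;
endfor;

// Lag with the lowest SBC
opt_lag_sbc = minindc(sbc_vector);

// Compare the two criteria side by side
print "Lags   AIC   SBC";
print seqa(1, 1, maxlags)~aic_vector~sbc_vector;

// Print the SBC-based choice
print "Optimal lags (SBC):"; opt_lag_sbc;
```

The SBC penalizes additional parameters more heavily than the AIC, so it can favor a shorter lag length. When the two criteria disagree, the side-by-side table makes the trade-off easy to inspect.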

The optimal lag length based on the minimum AIC is 8, yielding the following results:

================================================================================
Model:                 ARIMA(8,0,0)          Dependent variable:        CPIAUCNS
Time Span:              1971-01-01:          Valid cases:                    652
                        2025-04-01

SSE:                          0.803          Degrees of freedom:             642
Log Likelihood:           -1258.991          RMSE:                         0.035
AIC:                      -2537.982          SEE:                          0.080
SBC:                      -2453.182          Durbin-Watson:                1.998
R-squared:                    0.385          Rbar-squared:                 0.939
================================================================================
Coefficient                Estimate      Std. Err.        T-Ratio     Prob |>| t
--------------------------------------------------------------------------------

Constant                    0.03824        0.00512        7.46526        0.00000
CPIAUCNS L(1)               0.58055        0.03917       14.82047        0.00000
CPIAUCNS L(2)              -0.03968        0.04730       -0.83883        0.40156
CPIAUCNS L(3)              -0.01156        0.05062       -0.22833        0.81939
CPIAUCNS L(4)               0.09288        0.04151        2.23749        0.02525
CPIAUCNS L(5)               0.02322        0.04773        0.48639        0.62669
CPIAUCNS L(6)              -0.06863        0.04505       -1.52333        0.12767
CPIAUCNS L(7)               0.16048        0.04038        3.97391        0.00007
CPIAUCNS L(8)              -0.00313        0.02778       -0.11281        0.91018
Sigma2 CPIAUCNS             0.00123        0.00007       18.05512        0.00000
================================================================================

It is worth noting that only the coefficients for the 1st, 4th, and 7th lags are statistically significant. This suggests that a model including only those lags may be more appropriate.

Conclusion

The arimaSS function offers a streamlined approach to estimating ARIMA models in state space form, eliminating the need for manual specification of system matrices and initial values. This makes it easier to explore models, experiment with lag structures, and generate forecasts, especially for users who may not be deeply familiar with state space modeling.