### Introduction

In today's blog, you'll learn the basics of the vector autoregressive model. We lay the foundation for getting started with this crucial multivariate time series model and cover the important details including:

- What a VAR model is.
- Who uses VAR models.
- Basic types of VAR models.
- How to specify a VAR model.
- Estimation and forecasting with VAR models.

## What is a vector autoregressive model?

The vector autoregressive (VAR) model is a workhorse multivariate time series model that relates current observations of a variable with past observations of itself and past observations of other variables in the system.

VAR models differ from univariate autoregressive models because they allow feedback to occur between the variables in the model. For example, we could use a VAR model to show how real GDP is a function of policy rate and how policy rate is, in turn, a function of real GDP.

Advantages of VAR models | |
---|---|
✔ | A systematic but flexible approach for capturing complex real-world behavior. |
✔ | Better forecasting performance. |
✔ | Ability to capture the intertwined dynamics of time series data. |

VAR modeling is a multi-step process and a complete VAR analysis involves:

- Specifying and estimating a VAR model.
- Using inference to check and revise the model (as needed).
- Forecasting.
- Structural analysis.

## Who uses VAR models?

VAR models are traditionally widely used in finance and econometrics because they offer a framework for accomplishing important modeling goals, including (Stock and Watson 2001):

- Data description.
- Forecasting.
- Structural inference.
- Policy analysis.

However, more recently VAR models have been gaining traction in other fields like epidemiology, medicine, and biology.

Example question | Field | Description |
---|---|---|
How are vital signs in cardiorespiratory patients dynamically related? | Medicine | A VAR system is used to model the past and current relationships between heart rate, respiratory rate, blood pressure and SpO2. |
How do risks of COVID-19 infections interact across age groups? | Epidemiology | Count data of past infections across different age groups was used to model the relationships between infection rates across those age groups. |
Is there a bi-directional relationship between personal income and personal consumption spending? | Economics | A two-equation VAR system is used to model the relationship between income and consumption over time. |
How can we model the gene expression networks? | Biology | The relationships across large networks of genes are modeled using a sparse structural VAR model. |
What is driving inflation more -- monetary policy shocks or external shocks? | Macroeconomics | A structural VAR model is used to compute variance decomposition and impulse response functions following monetary shocks and external system shocks. |

## The reduced form, recursive, and structural VAR

There are three broad types of VAR models: the reduced form, the recursive form, and the structural VAR model.

**Reduced form VAR models** consider each variable to be a function of:

- Its own past values.
- The past values of other variables in the model.

While reduced form models are the simplest of the VAR models, they do come with disadvantages:

- Contemporaneous variables are not related to one another.
- The error terms will be correlated across equations. This means we cannot consider what impacts individual shocks will have on the system.

**Recursive VAR models** contain all the components of the reduced form model, but also allow some variables to be functions of other concurrent variables. By imposing these short-run relationships, the recursive model allows us to model structural shocks.

**Structural VAR models** include restrictions that allow us to identify causal relationships beyond those that can be identified with reduced form or recursive models. These causal relationships can be used to model and forecast impacts of individual shocks, such as policy decisions.

### A simple example

As an example, let's consider a VAR with three endogenous variables, the unemployment rate, the inflation rate, and interest rates.

A *reduced form* VAR(2) model of the system includes the following equations:

$$\begin{aligned}\text{UNEM}_t = \beta_{10} &+ \beta_{11}\text{UNEM}_{t-1} + \beta_{12}\text{UNEM}_{t-2}\\&+ \gamma_{11}\text{INFL}_{t-1} + \gamma_{12}\text{INFL}_{t-2} \\&+ \phi_{11}\text{R}_{t-1} + \phi_{12}\text{R}_{t-2} \\&+ \mu_{1t}\end{aligned}\\ \ \\ \begin{aligned}\text{INFL}_t = \beta_{20} &+ \beta_{21}\text{UNEM}_{t-1} + \beta_{22}\text{UNEM}_{t-2}\\ &+ \gamma_{21}\text{INFL}_{t-1} + \gamma_{22}\text{INFL}_{t-2} \\&+ \phi_{21}\text{R}_{t-1} + \phi_{22}\text{R}_{t-2} \\&+ \mu_{2t}\end{aligned}\\ \ \\ \begin{aligned}\text{R}_t = \beta_{30} &+ \beta_{31}\text{UNEM}_{t-1} + \beta_{32}\text{UNEM}_{t-2}\\ &+ \gamma_{31}\text{INFL}_{t-1} + \gamma_{32}\text{INFL}_{t-2} \\&+ \phi_{31}\text{R}_{t-1} + \phi_{32}\text{R}_{t-2} \\&+ \mu_{3t}\end{aligned}$$

A *recursive form* VAR(2) model of the system might include the following equations:

$$\begin{aligned}\text{UNEM}_t = \beta_{10} &+ \beta_{11}\text{UNEM}_{t-1} + \beta_{12}\text{UNEM}_{t-2}\\&+ \gamma_{11}\text{INFL}_{t-1} + \gamma_{12}\text{INFL}_{t-2} \\&+ \phi_{11}\text{R}_{t-1} + \phi_{12}\text{R}_{t-2} \\&+ \mu_{1t}\end{aligned}\\ \ \\ \begin{aligned}\text{INFL}_t = \beta_{20} &+ \delta_{21}\text{UNEM}_{t} + \beta_{21}\text{UNEM}_{t-1} + \beta_{22}\text{UNEM}_{t-2}\\ &+ \gamma_{21}\text{INFL}_{t-1} + \gamma_{22}\text{INFL}_{t-2} \\&+ \phi_{21}\text{R}_{t-1} + \phi_{22}\text{R}_{t-2} \\&+ \mu_{2t}\end{aligned}\\ \ \\ \begin{aligned}\text{R}_t = \beta_{30} &+ \delta_{31}\text{UNEM}_{t} + \beta_{31}\text{UNEM}_{t-1} + \beta_{32}\text{UNEM}_{t-2}\\ &+ \delta_{32}\text{INFL}_{t} +\gamma_{31}\text{INFL}_{t-1} + \gamma_{32}\text{INFL}_{t-2} \\&+ \phi_{31}\text{R}_{t-1} + \phi_{32}\text{R}_{t-2} \\&+ \mu_{3t}\end{aligned}$$

To estimate the structural VAR model of the system, we have to put restrictions on our model. For example, we may assume that the Fed follows an inflation-targeting rule for setting interest rates. This assumption would be built into our system as the equation for interest rates.

## Specifying a VAR model

### What makes up a VAR model?

A VAR model is made up of a system of equations that represents the relationships between multiple variables. When referring to VAR models, we often use special language to specify:

- How many endogenous variables are included.
- How many autoregressive terms are included.

For example, if we have two endogenous variables and two autoregressive terms, we say the model is a **Bivariate VAR(2)** model. If we have three endogenous variables and four autoregressive terms, we say the model is a **Trivariate VAR(4)** model.

In general, a VAR model is composed of *n-equations* (representing *n* endogenous variables) and includes *p-lags* of the variables.
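In compact matrix notation, this general form can be sketched as follows. The symbols $y_t$, $c$, $A_i$, and $u_t$ are introduced here for illustration and do not appear elsewhere in this post:

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + u_t$$

Here $y_t$ is the $n \times 1$ vector of endogenous variables, $c$ is an $n \times 1$ vector of intercepts, each $A_i$ is an $n \times n$ coefficient matrix, and $u_t$ is the $n \times 1$ vector of error terms. Written this way, each equation has $1 + np$ coefficients and the full system has $n + n^2 p$. A Bivariate VAR(2) therefore has $2 + 4 \cdot 2 = 10$ coefficients, while a Trivariate VAR(4) has $3 + 9 \cdot 4 = 39$.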

### How do we choose the number of lags in a VAR model?

Lag selection is one of the important aspects of VAR model specification. In practical applications, we generally choose a maximum number of lags, $p_{max}$, and evaluate the performance of the model including $p = 0, 1, \ldots, p_{max}$.

The optimal model is then the VAR(p) model that minimizes the chosen lag selection criterion. The most commonly used lag selection criteria are:

- Akaike (AIC)
- Schwarz-Bayesian (BIC)
- Hannan-Quinn (HQ).

These methods are usually built into software and lag selection is almost completely automated now.
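To see what that automation is doing under the hood, here is a minimal Python sketch using only NumPy. It assumes the common textbook form of the criteria (log-determinant of the residual covariance plus a penalty on the number of lag coefficients) and uses simulated data. The function names `simulate_var1`, `lag_criteria`, and `best_lag`, and the coefficient values, are hypothetical and chosen for illustration only.

```python
import numpy as np


def simulate_var1(A, n_obs, seed=0):
    """Simulate a zero-mean VAR(1) with standard normal errors (illustrative data)."""
    rng = np.random.default_rng(seed)
    k = A.shape[0]
    y = np.zeros((n_obs, k))
    for t in range(1, n_obs):
        y[t] = A @ y[t - 1] + rng.standard_normal(k)
    return y


def lag_criteria(y, p_max):
    """Fit VAR(p) by OLS for p = 0..p_max on a common sample and score each fit."""
    n_obs, k = y.shape
    T = n_obs - p_max                      # same estimation sample for every p
    Y = y[p_max:]
    results = {}
    for p in range(p_max + 1):
        cols = [np.ones((T, 1))]           # intercept
        for lag in range(1, p + 1):
            cols.append(y[p_max - lag : n_obs - lag])
        X = np.hstack(cols)
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ B
        sigma = resid.T @ resid / T        # ML-style residual covariance
        logdet = np.linalg.slogdet(sigma)[1]
        n_par = p * k * k                  # lag coefficients (intercepts are common to all p)
        results[p] = {
            "aic": logdet + 2.0 * n_par / T,
            "bic": logdet + np.log(T) * n_par / T,
            "hq": logdet + 2.0 * np.log(np.log(T)) * n_par / T,
        }
    return results


def best_lag(criteria, name):
    """Return the lag length that minimizes the named criterion."""
    return min(criteria, key=lambda p: criteria[p][name])


A_true = np.array([[0.5, 0.1], [0.2, 0.4]])   # hypothetical VAR(1) coefficients
y = simulate_var1(A_true, n_obs=400, seed=1)
criteria = lag_criteria(y, p_max=4)
for name in ("aic", "bic", "hq"):
    print(name, "selects p =", best_lag(criteria, name))
```

Every candidate model is fit on the same observations so that the criteria are comparable across lag lengths. The three criteria differ only in how heavily they penalize extra lags, which is why they can disagree in practice.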

### How do we decide what endogenous variables to include in our VAR model?

From an estimation standpoint, it is important to be deliberate about how many variables we include in our VAR model. Adding additional variables:

- Increases the number of coefficients to be estimated for each equation and each lag.
- Introduces additional estimation error.

Deciding what variables to include in a VAR model should be founded in theory, as much as possible. We can use additional tools, like Granger causality or Sims causality, to test the forecasting relevance of variables.

## Estimation and inference in VAR models

Despite their seeming complexity, VAR models are quite easy to estimate. Each equation can be estimated using ordinary least squares given a few assumptions:

- The error term has a conditional mean of zero.
- The variables in the model are stationary.
- Large outliers are unlikely.
- No perfect multicollinearity.

Under these assumptions, the ordinary least squares estimates:

- Will be consistent.
- Can be evaluated using traditional t-statistics and p-values.
- Can be used to jointly test restrictions across multiple equations.
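As a rough illustration of how simple the estimation step is, the sketch below fits a VAR by OLS with NumPy. The data are simulated from a hypothetical VAR(1), and the function name `fit_var_ols` is our own, not a library routine. Because every equation has the same regressors, one least squares call estimates all of the equations at once.

```python
import numpy as np


def fit_var_ols(y, p):
    """OLS estimates for a VAR(p): intercepts, lag matrices A_1..A_p, residual covariance."""
    n_obs, k = y.shape
    Y = y[p:]                                            # left-hand side, one column per equation
    lags = [y[p - lag : n_obs - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((n_obs - p, 1))] + lags)      # intercept plus p lags of every variable
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)            # shape (1 + k*p, k)
    resid = Y - X @ B
    intercept = B[0]
    # Rows of each block index the lagged variable, columns index the equation,
    # so transpose to get the usual A_lag with equations in rows.
    A = [B[1 + (lag - 1) * k : 1 + lag * k].T for lag in range(1, p + 1)]
    sigma = resid.T @ resid / (len(Y) - X.shape[1])      # degrees-of-freedom adjusted
    return intercept, A, sigma


# Simulated data from a hypothetical, stationary VAR(1)
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.2, 0.4]])
n_obs = 2000
y = np.zeros((n_obs, 2))
for t in range(1, n_obs):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(2)

intercept_hat, A_hat, sigma_hat = fit_var_ols(y, p=1)
print(np.round(A_hat[0], 2))
```

With a long stationary sample, the estimated lag matrix should land close to the coefficients used in the simulation, which is the consistency property listed above.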

## Forecasting

One of the most important functions of VAR models is to generate forecasts. Forecasts are generated for VAR models using an iterative forecasting algorithm:

- Estimate the VAR model using OLS for each equation.
- Compute the one-period-ahead forecast for all variables.
- Compute the two-period-ahead forecasts, using the one-period-ahead forecast.
- Iterate until the h-step ahead forecasts are computed.
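The iteration above can be sketched in a few lines of Python. This example assumes the coefficients have already been estimated. The function name `forecast_var` and the numbers used are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import numpy as np


def forecast_var(history, intercept, A, h):
    """Iterate a VAR(p) forward h steps, starting from the last p rows of `history`."""
    p = len(A)
    path = [np.asarray(row, dtype=float) for row in history[-p:]]
    forecasts = []
    for _ in range(h):
        y_next = np.array(intercept, dtype=float)
        for lag in range(1, p + 1):
            # path[-lag] is the observation (or earlier forecast) `lag` periods back
            y_next = y_next + A[lag - 1] @ path[-lag]
        path.append(y_next)          # the forecast feeds the next iteration
        forecasts.append(y_next)
    return np.array(forecasts)


# Hypothetical VAR(1): each variable is half of its own previous value
history = np.array([[2.0, 4.0]])
intercept = np.zeros(2)
A = [np.array([[0.5, 0.0], [0.0, 0.5]])]

print(forecast_var(history, intercept, A, h=3))
```

The one-step forecast here is `[1, 2]`, the two-step forecast reuses it to give `[0.5, 1]`, and so on. For horizons beyond one step, forecasts stand in for the observations that are not yet available.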

## Reporting and evaluating VAR models

Often we are more interested in the dynamics that are predicted by our VAR models than the actual coefficients that are estimated. For this reason, it is most common that VAR studies report:

- Granger-causality statistics.
- Impulse response functions.
- Forecast error decompositions.

### Granger-causality statistics

As we previously discussed, Granger-causality statistics test whether one variable is statistically significant when predicting another variable.

The Granger-causality statistics are F-statistics that test if the coefficients of all lags of a variable are jointly equal to zero in the equation for another variable. As the p-value of the F-statistic decreases, evidence that a variable is relevant for predicting another variable increases.

For example, in the Granger-causality test of $X$ on $Y$, if the p-value is 0.02 we would say that $X$ does help predict $Y$ at the 5% level. However, if the p-value is 0.3 we would say that there is no evidence that $X$ helps predict $Y$.
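For intuition, here is a minimal sketch of the F-statistic for the bivariate case, computed by comparing a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other variable). The function name `granger_f_stat` is our own. The data are simulated so that $X$ helps predict $Y$ but not the other way around.

```python
import numpy as np


def granger_f_stat(target, candidate, p):
    """F-statistic for H0: all p lags of `candidate` have zero coefficients
    in the equation for `target`. Returns (F, (numerator df, denominator df))."""
    n_obs = len(target)
    Y = target[p:]
    own = [target[p - lag : n_obs - lag] for lag in range(1, p + 1)]
    other = [candidate[p - lag : n_obs - lag] for lag in range(1, p + 1)]
    const = np.ones(n_obs - p)
    X_restricted = np.column_stack([const] + own)
    X_unrestricted = np.column_stack([const] + own + other)

    def ssr(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return resid @ resid

    ssr_r, ssr_u = ssr(X_restricted), ssr(X_unrestricted)
    dof = len(Y) - X_unrestricted.shape[1]
    f_stat = ((ssr_r - ssr_u) / p) / (ssr_u / dof)
    return f_stat, (p, dof)


# Simulated example: x feeds into y with a one-period delay, y does not feed into x
rng = np.random.default_rng(42)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

f_x_to_y, df = granger_f_stat(y, x, p=2)
f_y_to_x, _ = granger_f_stat(x, y, p=2)
print("X -> Y:", round(f_x_to_y, 2), " Y -> X:", round(f_y_to_x, 2), " df:", df)
```

The p-value comes from comparing the statistic with an F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.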

### Impulse response functions

The impulse response function traces the dynamic response of variables in the system to shocks to other variables in the system. This is done by:

- Estimating the VAR model.
- Implementing a one-unit increase in the error of one of the variables in the model, while holding the other errors equal to zero.
- Predicting the impacts h-period ahead of the error shock.
- Plotting the forecasted impacts, along with the one-standard-deviation confidence intervals.
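The propagation step can be sketched with a short recursion on the estimated lag matrices. The function name `impulse_responses` and the coefficient values below are hypothetical. The optional Cholesky step is one common way of orthogonalizing correlated errors, and it corresponds to a recursive ordering of the variables. Confidence intervals are not computed in this sketch.

```python
import numpy as np


def impulse_responses(A, h, sigma=None):
    """Responses for horizons 0..h. Element [s, i, j] is the response of
    variable i, s periods after a shock to variable j."""
    k = A[0].shape[0]
    psi = [np.eye(k)]                          # horizon 0: the shock itself
    for step in range(1, h + 1):
        acc = np.zeros((k, k))
        for j in range(1, min(step, len(A)) + 1):
            acc += A[j - 1] @ psi[step - j]    # feed earlier responses through the lags
        psi.append(acc)
    psi = np.array(psi)                        # shape (h + 1, k, k)
    if sigma is not None:
        # Orthogonalized responses: shocks scaled and ordered by the Cholesky factor
        psi = psi @ np.linalg.cholesky(sigma)
    return psi


A1 = np.array([[0.5, 0.1], [0.2, 0.4]])        # hypothetical VAR(1) coefficients
irf = impulse_responses([A1], h=3)
print(np.round(irf[:, 1, 0], 3))               # response of variable 1 to a unit shock in variable 0

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])     # hypothetical error covariance
irf_orth = impulse_responses([A1], h=3, sigma=sigma)
```

Without `sigma`, the function implements the one-unit shock described above. With `sigma`, a shock to the first variable is allowed to move the second variable in the same period, but not the other way around.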

### Forecast error decomposition

Forecast error decomposition separates the forecast error variance into proportions attributed to each variable in the model.

Intuitively, this measure helps us judge how much of an impact one variable has on another variable in the VAR model and how intertwined our variables' dynamics are.

For example, if $X$ is responsible for 85% of the forecast error variance of $Y$, it is explaining a large amount of the forecast variation in $Y$. However, if $X$ is only responsible for 20% of the forecast error variance of $Y$, much of the forecast error variance of $Y$ is left *unexplained* by $X$.
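A minimal sketch of the computation is shown below. It assumes shocks are orthogonalized with a Cholesky factor, which is one common choice and depends on the ordering of the variables. The function name `fevd` and the coefficient values are hypothetical.

```python
import numpy as np


def fevd(A, sigma, h):
    """Share of the h-step forecast error variance of variable i (row)
    attributed to orthogonalized shock j (column). Each row sums to 1."""
    k = sigma.shape[0]
    P = np.linalg.cholesky(sigma)
    psi = [np.eye(k)]
    for step in range(1, h):                   # an h-step error involves horizons 0..h-1
        acc = np.zeros((k, k))
        for j in range(1, min(step, len(A)) + 1):
            acc += A[j - 1] @ psi[step - j]
        psi.append(acc)
    theta = np.array(psi) @ P                  # orthogonalized responses, shape (h, k, k)
    contrib = (theta ** 2).sum(axis=0)         # accumulated squared responses
    return contrib / contrib.sum(axis=1, keepdims=True)


# Hypothetical system: variable 0 affects variable 1 with a lag, but not the reverse
A1 = np.array([[0.5, 0.0], [0.4, 0.5]])
sigma = np.eye(2)

shares = fevd([A1], sigma, h=10)
print(np.round(shares, 3))
```

In this example, the first row shows that all of the forecast error variance of variable 0 comes from its own shock, while the second row splits the variance of variable 1 between the two shocks.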

### Conclusion

VAR models are an essential component of multivariate time series modeling. After today's blog, you should have a better understanding of the fundamentals of the VAR model including:

- What a VAR model is.
- Who uses VAR models.
- Basic types of VAR models.
- How to specify a VAR model.
- Estimation and forecasting with VAR models.

### Further Reading

- Introduction to the Fundamentals of Time Series Data and Analysis
- The Intuition Behind Impulse Response Functions and Forecast Error Variance Decomposition
- Introduction to Granger Causality

Erica has been working to build, distribute, and strengthen the GAUSS universe since 2012. She is an economist skilled in data analysis and software development. She has earned a B.A. and MSc in economics and engineering and has over 15 years combined industry and academic experience in data analysis and research.