How to Aggregate Panel Data in GAUSS

Introduction

The aggregate function, first available in GAUSS version 20, computes statistics within data groups. This is particularly useful for panel data.

In today's blog, we take a closer look at aggregate. We will:

  1. Introduce the basics of the aggregate function.
  2. Explain how to use the aggregate function
  3. Demonstrate a real-world application of the aggregate function using current account data from the International Monetary Fund.

The GAUSS Aggregate Function

The GAUSS aggregate function computes statistics within a group based upon a specified group identifier. The function supports a variety of GAUSS statistics including:

  • mean
  • median
  • mode
  • min
  • max
  • sample standard deviation
  • sum
  • sample variance

For example, consider a panel dataset which includes observed weights for three individuals across a 6-month time span:

Name Jan. Weight Feb. Weight Mar. Weight Apr. Weight May Weight June Weight
Sarah 135 134 138 142 144 145
Tom 196 192 182 183 184 181
Nikki 143 144 146 147 145 143

We can use the aggregate function to find the 6-month mean weights for Sarah, Tom and Nikki:

Name Jan. Weight Feb. Weight Mar. Weight Apr. Weight May Weight June Weight Mean Weight
Sarah 135 134 138 142 144 145 139.7
Tom 196 192 182 183 184 181 186.3
Nikki 143 144 146 147 145 143 144.7

Alternatively, we could find the monthly standard deviation of the weights across Sarah, Tom and Nikki:

Name Jan. Weight Feb. Weight Mar. Weight Apr. Weight May Weight June Weight
Sarah 135 134 138 142 144 145
Tom 196 192 182 183 184 181
Nikki 143 144 146 147 145 143
Monthly Std. Dev. 33.2 31.0 23.4 22.4 22.8 21.4

How to Use The Aggregate Function

The aggregate function takes two required inputs:

x_a = aggregate(x, method);

x
NxK data matrix, must have group identifiers in the first column.
method
String, method to use. Valid methods include: "mean", "median", "mode", "max", "min", "sd", "sum", "variance".

The Input Data Matrix

The aggregate function requires the data matrix input to:

  • Have numerical group identifiers in the first column.
  • Be in stacked panel data format.

Let's consider our example dataset from above. In order to use this data as an input to the GAUSS aggregate function we need to:

  • Recode our group identifiers from names to numbers.
  • Stack our data into a pooled dataset.
Name Jan. Weight Feb. Weight Mar. Weight Apr. Weight May Weight June Weight
Sarah 135 134 138 142 144 145
Tom 196 192 182 183 184 181
Nikki 143 144 146 147 145 143

$$\text{Sarah} \rightarrow 1$$ $$\text{Tom} \rightarrow 2$$ $$\text{Nikki} \rightarrow 3$$

$$\Downarrow$$

Group Jan. Weight Feb. Weight Mar. Weight Apr. Weight May Weight June Weight
1 135 134 138 142 144 145
2 196 192 182 183 184 181
3 143 144 146 147 145 143

$$\Downarrow$$

Group Month Weight
1 1 135
1 2 134
1 3 138
1 4 142
1 5 144
1 6 145
2 1 196
2 2 192
3 6 143

The Method Input

The method input into the aggregate function should always be a string indicating which statistic you wish to compute.

Each method works on groups within the panel the same way the analogous pooled data function would work, including its handling of missing values.

Method Pooled Function
mean meanc
median median
mode modec
max maxc
min minc
sum sumc
sd stdc
variance varCovXS

Example of How to Use Aggregate

Let's use aggregate to find the means by group for weight data:

weights = { 1 1 135,
            1 2 134,
            1 3 138,
            1 4 142,
            1 5 144,
            1 6 145,
            2 1 196,
            2 2 192,
            2 3 182,
            2 4 183,
            2 5 184,
            2 6 181,
            3 1 143,
            3 2 144,
            3 3 146,
            3 4 147,
            3 5 145,
            3 6 143 };

/*
** Find the mean by person.
** We will use the first column
** as the group indicator and will find
** the mean of the weights.
*/
print aggregate(weights[., 1 3], "mean");

This prints the group means to the output window:

1   139.67 
2   186.33 
3   144.67

We can also use the month identifiers to find the sample standard deviation by month:

/*
** Find the standard deviation by month.
** We will use the second column of weights
** as the group indicator and will find
** the standard deviation of the weights.
*/
print aggregate(weights[., 2 3], "sd");

Now the standard deviations, along with their associated months will be printed to the output window:

1   33.151 
2   31.005 
3   23.438 
4   22.368 
5   22.811 
6   21.385

Our simple example dataset is useful for demonstrating the basics of the aggregate function. However, a real-world panel dataset better demonstrates its true power. In this section, we will use aggregate to examine some of the trends in international current account balances.

The Data

We will use current account balance measured as a percentage of GDP. This unbalanced panel data is a modified version of a dataset from the International Monetary Fund and spans 1953-Q3 to 2019-Q4 and includes a total of 46 countries, across 5 different regions.

It contains the following variables:

Variable name Description
Country String, name of the country.
Country ID Integer country identifier.
World Region String, name of the corresponding world region.
Region ID Integer world region identifier.
Time String, the date of the observation.
CAB Decimal numeric, the Current Account Balance.

Mean and Median Current Account Balances By Country

We first examine variations in the mean and median current account balances across countries. Using aggregate we find the mean and median current account balances for each country in the panel across all observations.

/*
** Load the 'Country ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Country Id + CAB");

// Compute mean and median current
// account balances by Country ID
mean_cab_cid = aggregate(X, "mean");
median_cab_cid = aggregate(X, "median");

After the above code mean_cab_cid and median_cab_cid will be both be $46\times2$ matrices. Each element in the first column will be a unique country ID. The corresponding element in the second column will be the average (mean or median) current account balance for that country.

We include a graph of this data below, where we see that Germany leads the pack with the highest average current account balance while Finland has the lowest average current account balances.

Mean and Median Current Account Balances By Region

We can similarly consider the mean and median current account balances across geographical regions with the code below.

/*
** Load the 'Region ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Region ID + CAB");

// Compute mean and median current
// account balances by world region
mean_cab_wreg = aggregate(X, "mean");
median_cab_wreg = aggregate(X, "median");

mean_cab_wreg and median_cab_wreg will be two column matrices with unique world region IDs in the first column and the corresponding statistics in the second column.

Mean and Median Current Account Balances Time Series

Finally, we consider how the mean and median current account balances vary across time in the time series plot below.

/*
** Load the 'Time' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
** Wrapping 'Time' in 'date($)' tells GAUSS that 'Time' is a string
** variable that we want GAUSS to convert to a date.
*/
X = loadd("imf_cab_mod.xlsx", "date($Time) + CAB");

mean_cab_date = aggregate(X, "mean");
median_cab_date = aggregate(X, "median");

This time the first column of our resulting matrices, mean_cab_date and median_cab_date, will contain each unique date from our dataset. The second column will contain the statistic computed for each unique date.


Interested in learning more about loading dates in GAUSS?
Check out this tutorial to learn more.

Below is a graph of the Current Account Balance data grouped by quarter.

Conclusion

In today's blog, we examined the fundaments of the aggregate procedure. After reading you should have a better understanding of:

  1. The basics of the aggregate function.
  2. How to use the aggregate function.
  3. How to examine trends in real-world panel data using aggregate.

Was this post helpful?

Leave a Reply

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.

Try GAUSS for 30 days for FREE

See what GAUSS can do for your data

© Aptech Systems, Inc. All rights reserved.

Privacy Policy