## Introduction

The `aggregate`

function, first available in GAUSS version 20, computes statistics within data groups. This is particularly useful for panel data.

In today's blog, we take a closer look at `aggregate`

. We will:

- Introduce the basics of the
`aggregate`

function. - Explain how to use the
`aggregate`

function - Demonstrate a real-world application of the
`aggregate`

function using current account data from the International Monetary Fund.

## The GAUSS Aggregate Function

The GAUSS `aggregate`

function computes statistics within a group based upon a specified group identifier. The function supports a variety of GAUSS statistics including:

- mean
- median
- mode
- min
- max
- sample standard deviation
- sum
- sample variance

For example, consider a panel dataset which includes observed weights for three individuals across a 6-month time span:

Name | Jan. Weight | Feb. Weight | Mar. Weight | Apr. Weight | May Weight | June Weight |
---|---|---|---|---|---|---|

Sarah | 135 | 134 | 138 | 142 | 144 | 145 |

Tom | 196 | 192 | 182 | 183 | 184 | 181 |

Nikki | 143 | 144 | 146 | 147 | 145 | 143 |

We can use the `aggregate`

function to find the 6-month mean weights for Sarah, Tom and Nikki:

Name | Jan. Weight | Feb. Weight | Mar. Weight | Apr. Weight | May Weight | June Weight | Mean Weight |
---|---|---|---|---|---|---|---|

Sarah | 135 | 134 | 138 | 142 | 144 | 145 | 139.7 |

Tom | 196 | 192 | 182 | 183 | 184 | 181 | 186.3 |

Nikki | 143 | 144 | 146 | 147 | 145 | 143 | 144.7 |

Alternatively, we could find the monthly standard deviation of the weights across Sarah, Tom and Nikki:

Name | Jan. Weight | Feb. Weight | Mar. Weight | Apr. Weight | May Weight | June Weight |
---|---|---|---|---|---|---|

Sarah | 135 | 134 | 138 | 142 | 144 | 145 |

Tom | 196 | 192 | 182 | 183 | 184 | 181 |

Nikki | 143 | 144 | 146 | 147 | 145 | 143 |

Monthly Std. Dev. | 33.2 | 31.0 | 23.4 | 22.4 | 22.8 | 21.4 |

## How to Use The Aggregate Function

The `aggregate`

function takes two required inputs:

`x_a = aggregate(x, method);`

- x
- NxK data matrix, must have group identifiers in the first column.
- method
- String, method to use. Valid methods include:
`"mean"`

,`"median"`

,`"mode"`

,`"max"`

,`"min"`

,`"sd"`

,`"sum"`

,`"variance"`

.

### The Input Data Matrix

The `aggregate`

function requires the data matrix input to:

- Have numerical group identifiers in the first column.
- Be in stacked panel data format.

Let's consider our example dataset from above. In order to use this data as an input to the GAUSS `aggregate`

function we need to:

- Recode our group identifiers from names to numbers.
- Stack our data into a pooled dataset.

Name | Jan. Weight | Feb. Weight | Mar. Weight | Apr. Weight | May Weight | June Weight |
---|---|---|---|---|---|---|

Sarah | 135 | 134 | 138 | 142 | 144 | 145 |

Tom | 196 | 192 | 182 | 183 | 184 | 181 |

Nikki | 143 | 144 | 146 | 147 | 145 | 143 |

$$\text{Sarah} \rightarrow 1$$ $$\text{Tom} \rightarrow 2$$ $$\text{Nikki} \rightarrow 3$$

$$\Downarrow$$

Group | Jan. Weight | Feb. Weight | Mar. Weight | Apr. Weight | May Weight | June Weight |
---|---|---|---|---|---|---|

1 | 135 | 134 | 138 | 142 | 144 | 145 |

2 | 196 | 192 | 182 | 183 | 184 | 181 |

3 | 143 | 144 | 146 | 147 | 145 | 143 |

$$\Downarrow$$

Group | Month | Weight |
---|---|---|

1 | 1 | 135 |

1 | 2 | 134 |

1 | 3 | 138 |

1 | 4 | 142 |

1 | 5 | 144 |

1 | 6 | 145 |

2 | 1 | 196 |

2 | 2 | 192 |

⁞ | ⁞ | ⁞ |

3 | 6 | 143 |

### The Method Input

The *method* input into the `aggregate`

function should always be a string indicating which statistic you wish to compute.

Each method works on groups within the panel the same way the analogous pooled data function would work, including its handling of missing values.

Method | Pooled Function |
---|---|

mean | `meanc` |

median | `median` |

mode | `modec` |

max | `maxc` |

min | `minc` |

sum | `sumc` |

sd | `stdc` |

variance | `varCovXS` |

### Example of How to Use Aggregate

Let's use `aggregate`

to find the means by group for weight data:

```
weights = { 1 1 135,
1 2 134,
1 3 138,
1 4 142,
1 5 144,
1 6 145,
2 1 196,
2 2 192,
2 3 182,
2 4 183,
2 5 184,
2 6 181,
3 1 143,
3 2 144,
3 3 146,
3 4 147,
3 5 145,
3 6 143 };
/*
** Find the mean by person.
** We will use the first column
** as the group indicator and will find
** the mean of the weights.
*/
print aggregate(weights[., 1 3], "mean");
```

This prints the group means to the output window:

```
1 139.67
2 186.33
3 144.67
```

We can also use the month identifiers to find the sample standard deviation by month:

```
/*
** Find the standard deviation by month.
** We will use the second column of weights
** as the group indicator and will find
** the standard deviation of the weights.
*/
print aggregate(weights[., 2 3], "sd");
```

Now the standard deviations, along with their associated months will be printed to the output window:

```
1 33.151
2 31.005
3 23.438
4 22.368
5 22.811
6 21.385
```

## Using Aggregate to Examine Trends in Current Account Balances

Our simple example dataset is useful for demonstrating the basics of the `aggregate`

function. However, a real-world panel dataset better demonstrates its true power. In this section, we will use `aggregate`

to examine some of the trends in international current account balances.

### The Data

We will use current account balance measured as a percentage of GDP. This unbalanced panel data is a modified version of a dataset from the International Monetary Fund and spans 1953-Q3 to 2019-Q4 and includes a total of 46 countries, across 5 different regions.

It contains the following variables:

Variable name | Description |
---|---|

Country | String, name of the country. |

Country ID | Integer country identifier. |

World Region | String, name of the corresponding world region. |

Region ID | Integer world region identifier. |

Time | String, the date of the observation. |

CAB | Decimal numeric, the Current Account Balance. |

### Mean and Median Current Account Balances By Country

We first examine variations in the mean and median current account balances across countries. Using `aggregate`

we find the mean and median current account balances for each country in the panel across all observations.

```
/*
** Load the 'Country ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Country Id + CAB");
// Compute mean and median current
// account balances by Country ID
mean_cab_cid = aggregate(X, "mean");
median_cab_cid = aggregate(X, "median");
```

After the above code `mean_cab_cid`

and `median_cab_cid`

will be both be $46\times2$ matrices. Each element in the first column will be a unique country ID. The corresponding element in the second column will be the average (mean or median) current account balance for that country.

We include a graph of this data below, where we see that Germany leads the pack with the highest average current account balance while Finland has the lowest average current account balances.

### Mean and Median Current Account Balances By Region

We can similarly consider the mean and median current account balances across geographical regions with the code below.

```
/*
** Load the 'Region ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Region ID + CAB");
// Compute mean and median current
// account balances by world region
mean_cab_wreg = aggregate(X, "mean");
median_cab_wreg = aggregate(X, "median");
```

`mean_cab_wreg`

and `median_cab_wreg`

will be two column matrices with unique world region IDs in the first column and the corresponding statistics in the second column.

### Mean and Median Current Account Balances Time Series

Finally, we consider how the mean and median current account balances vary across time in the time series plot below.

```
/*
** Load the 'Time' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
** Wrapping 'Time' in 'date($)' tells GAUSS that 'Time' is a string
** variable that we want GAUSS to convert to a date.
*/
X = loadd("imf_cab_mod.xlsx", "date($Time) + CAB");
mean_cab_date = aggregate(X, "mean");
median_cab_date = aggregate(X, "median");
```

This time the first column of our resulting matrices, `mean_cab_date`

and `median_cab_date`

, will contain each unique date from our dataset. The second column will contain the statistic computed for each unique date.

Check out this tutorial to learn more.

Below is a graph of the Current Account Balance data grouped by quarter.

## Conclusion

In today's blog, we examined the fundaments of the `aggregate`

procedure. After reading you should have a better understanding of:

- The basics of the
`aggregate`

function. - How to use the
`aggregate`

function. - How to examine trends in real-world panel data using
`aggregate`

.

### Further Reading

- Panel data, structural breaks and unit root testing
- Panel Data Basics: One-way Individual Effects
- Introduction to the Fundamentals of Panel Data
- Panel Data Stationarity Test With Structural Breaks
- Transforming Panel Data to Long Form in GAUSS

Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.