FRED provides extensive access to data. However, that data must be pre-processed and properly cleaned before usage. This can be a time-consuming and tedious part of data analysis.
In today’s blog, we look at how to save time and reduce errors using GAUSS’s new data management tools.
Using the quarterly real GDP dataset from the FRED database we explore GAUSS’s new data management tools.
In particular, we examine how to:
- Deal with irregular dataset headers.
- Change variable names.
- Filter dates and change the date display.
Simple Data Import Example
The real GDP series from the FRED database was downloaded and is stored in the file
fred_rgdp.xlsx. To start, we open our dataset in the Data Import window by double-clicking on the filename,
fred_rgdp.xlsx in the Project Folders tab.
This opens the Data Import tool:
The data preview window indicates that there are a number of issues to address in our raw dataset:
- The raw data includes an irregular header.
- The variable names are not correct.
Updating the header row
We can see in our data preview that more appropriate variable names are located row 11. This can be specified using the Header Row text box on the Import Options tab:
The variable names are automatically updated in the preview to reflect the names contained in row 11. Also notice that the Row Range input is automatically updated to 12, one past the header row.
The FRED real GDP data is now ready for import and we can select Import to bring the data into the GAUSS workspace.
Note that when we do, the GAUSS code used to import the data is auto-generated and can be used to replicate the interactive steps we performed.
The auto-generated code can be accessed a few different ways:
- The command history. To send the code from the command history to a file, right-click on the command in the History list and select Send to File.
- The Program Input/Output window. The code snippet can be directly copied by selecting the code snippet, right-clicking, and selecting Copy.
Cleaning Data in Memory
Suppose that after importing the real GDP data, we decide we want to perform some data cleaning steps.
Specifically, let’s consider performing the following:
- Changing the name of the
- Changing the display format of the
- Filtering by date to only include observations from 1970 or later.
To begin, double-click fred_rgdp in the Symbols window on the left to open it in the symbol editor.
Changing a variable name
To change a variable name, click the triangle next to the
GCPC1 variable name to open the Variable settings menu and select Rename:
real_gdp now shows up in the Variables tab in red. The red text indicates that there are unsaved changes. To save changes we must click Apply.
Changing the date display format
The observation dates for our quarterly data, contained in
observation_date, are currently displayed in the ISO-8601 (“Year-Month-Day”) format (
1947-01-01). Since we are working with quarterly data, let’s update this to display the year and quarter,
This can be done by expanding the Variable settings menu for
observation_date and selecting Properties. This opens the Specify Date Format dialog:
We enter our new date display format,
%Y-Q%q, in the Date Display text box and click OK.
The data editor preview now shows:
The asterisk on the matrix name tab indicates that our dataframe fred_rgdp has unsaved changes.
To finish our real GDP FRED data cleaning example, let's filter the dataframe to include only observations that occur in 1970 and later.
To do this, select the Filter tab and set:
- The Variable equal to
observation_date(this is the default).
- The Operation to
- The Value equal to
After setting all three parts of the filter, we click the
+ button to add the filter. When we do this, the data preview updates to indicate which values will be included after hitting Apply:
The first, and least glamorous, step in successful data modeling is data cleaning -- the quality of any model depends on the quality of the data coming in.
Today, we’ve examined how the interactive data management tools in GAUSS can make this process less painful and less tedious.
Specifically, we’ve seen how to interactively:
- Remove irregular headers.
- Change variable names.
- Filter by date.
- Change a date display format.
- For a complete guide to the GAUSS data management, check out our online Data Management Guide.
- The blog, Easy and Fast Data Management in GAUSS 21 provides an overview of all the new GAUSS 21 data management tools.
- For a video example of the data management tools, see our latest YouTube video, Quick and Easy Interactive Data Cleaning