Visualizing COVID-19 Panel Data With GAUSS 22

Introduction

When they're done right, graphs are a useful tool for telling compelling data stories and supporting data models. However, too often graphs lack the right components to truly enhance understanding.

In this blog post, we look at how a few quick customizations can make graphs more impactful. In particular, we will cover:

  • Using grid lines without cluttering a graph.
  • Changing tick labels for readability.
  • Using clear axis labels.
  • Marking events and outcomes with lines, bars, and annotations.

Data

As an example, we will use the New York Times COVID-19 tracking data, which is available on GitHub as part of the New York Times U.S. tracking project.

From this data, we will use the rolling 7-day average of COVID-19 cases per 100,000 people, reported by date, for five states: Arizona, California, Florida, Texas, and Washington.

Creating a Basic Graph

Let's start by creating a basic panel data plot using the plotXY procedure.

First, we will load our data:

// Load original data
fname = "us_state_covid_cases.csv";
covid_cases = loadd(fname, 
                    "date($date) + cat(state) + cases + cases_avg_per_100k");

// Filter desired states
covid_cases = selif(covid_cases, 
                    rowcontains(covid_cases[., "state"], 
                                "Florida"$|"California"$|
                                "Arizona"$|"Washington"$|
                                "Texas"));

Note that in this step we've:

  1. Specified the variables we want to load and their variable types.
  2. Filtered our data to include only our states of interest.

Now, we can make a preliminary plot of the rolling 7-day average number of COVID-19 cases per 100,000 people:

// Plot COVID cases per 100K by state
plotXY(covid_cases, "cases_avg_per_100k ~ date + by(state)");

Customizing Our Graph

Our quick graph is a good starting point. However, a few customizations will present a clearer picture:

  • Adding y-axis grid lines will make COVID case levels easier to read.
  • Reformatting the x-axis tick labels to show months rather than quarters will make the dates more recognizable.
  • Updating the axis labels will make clear what each axis measures.

Declaring a plotControl Structure

The first step for customizing graphs is to declare a plotControl structure and to fill it with the appropriate defaults:

// Declare plot control structure
struct plotControl myPlot;

// Fill with defaults for "xy" graph
myPlot = plotGetDefaults("xy");

Customizing Plot Attributes

After declaring the plotControl structure, we can use plotSet procedures to change the desired attributes of our graph.

Adding Y-Axis Grid Lines

First, to make levels of COVID cases easier to read, let's add y-axis grid lines to our plot using the plotSetYGridPen procedure, which lets us:

  • Turn on y-axis major and/or minor grids.
  • Set the width, color, and style of the grid lines.

Input        Description
which_grid   Specifies which grid lines to modify. Options: "major", "minor", or "both".
width        Specifies the thickness of the line(s) in pixels. The default value is 1.
color        Optional argument, the name or RGB value of the new color(s) for the line(s).
style        Optional argument, the style(s) of the pen for the line(s). Options include:
               1  Solid line
               2  Dash line
               3  Dot line
               4  Dash-Dot line
               5  Dash-Dot-Dot line

// Turn on y-axis grid for the major ticks. Set the
// grid lines to be solid, 1 pixel and light grey
plotSetYGridPen(&myPlot, "major", 1, "Light Grey", 1);

Because GAUSS allows us to add and format y-axis and x-axis grid lines separately, we are able to improve readability with y-axis lines without adding the clutter of a full grid.
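
For comparison, here is a sketch of the full-grid version that this approach avoids. It assumes covid_cases is loaded as above and that plotSetXGridPen takes the same arguments as plotSetYGridPen; the structure name fullGrid is only for illustration.

```gauss
// Declare a separate plotControl structure for the comparison graph
struct plotControl fullGrid;
fullGrid = plotGetDefaults("xy");

// Light grey, 1 pixel, solid grid lines at the major y-axis ticks
plotSetYGridPen(&fullGrid, "major", 1, "Light Grey", 1);

// Assumption: plotSetXGridPen mirrors plotSetYGridPen for the x-axis.
// Adding vertical grid lines on top of the horizontal ones gives a full grid.
plotSetXGridPen(&fullGrid, "major", 1, "Light Grey", 1);

// Draw the comparison graph
plotXY(fullGrid, covid_cases, "cases_avg_per_100k ~ date + by(state)");
```

Viewing this graph next to the y-grid-only version makes it easier to judge whether the extra vertical lines add information or only clutter.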

Customizing X-Axis Ticks

Next, let's turn our attention to the x-axis ticks. We will use three GAUSS procedures to help us customize our ticks:

Procedure             Description
plotSetXTicLabel      Controls the format and angle of the x-axis tick labels for 2-D graphs.
plotSetXTicInterval   Controls the interval between x-axis tick labels and also lets you specify the first tick to be labeled, for 2-D graphs.
plotSetTicLabelFont   Controls the font name, size, and color of the x- and y-axis tick labels.

First, let's change the format of the labels on the x-axis to indicate months rather than quarters:

// Display 4 digit year and month on 'X' tick labels
plotSetXTicLabel(&myPlot, "YYYY-MO");

Second, let's set the x-axis ticks to:

  • Start in March 2020, to correspond with the start of the pandemic.
  • Occur every 3 months.

// Place first 'X' tick mark on March 1st, 2020
// with ticks occurring every 3 months
plotSetXTicInterval(&myPlot, 3, "months", asDate("2020-03"));
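
If monthly tick labels are preferred, the same two procedures can label every month instead. Labels packed that closely may overlap, so this sketch also rotates them. It assumes covid_cases is loaded as above and that the optional third argument of plotSetXTicLabel is the label angle in degrees; the structure name monthlyPlot is illustrative.

```gauss
// Separate plotControl structure for the monthly-label variant
struct plotControl monthlyPlot;
monthlyPlot = plotGetDefaults("xy");

// 4-digit year and month labels.
// Assumption: the third argument rotates the labels by 45 degrees.
plotSetXTicLabel(&monthlyPlot, "YYYY-MO", 45);

// First tick on March 1st, 2020, then one tick every month
plotSetXTicInterval(&monthlyPlot, 1, "months", asDate("2020-03"));

plotXY(monthlyPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");
```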

Third, let's increase the size of the axis tick labels:

// Change tic label font size
plotSetTicLabelFont(&myPlot, "Arial", 12); 

Updating Axis Labels

Finally, let's update the axis labels:

// Specify the text for the Y-axis label as well as
// the font and font size for both labels
plotSetYLabel(&myPlot, "Cases per 100k", "Arial", 14);

// Specify text for the x-axis label
plotSetXLabel(&myPlot, "Date");

Now we can create our formatted graph:

// Plot COVID cases per 100K by state. Pass in the 'plotControl'
// structure, 'myPlot', to use the settings we applied above.
plotXY(myPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");

Highlighting Events

With time series plots, we often want to mark specific dates or periods on the graph. GAUSS 22 introduced four procedures that make highlighting events easy:

Procedure      Description                                                                                   Example
plotAddVLine   Adds one or more vertical lines to an existing plot.                                          plotAddVLine("2020-01-01");
plotAddVBar    Adds one or more vertical bars spanning the full extent of the y-axis to an existing graph.   plotAddVBar("2020-01", "2020-03");
plotAddHLine   Adds one or more horizontal lines to an existing plot.                                        plotAddHLine(500);
plotAddHBar    Adds one or more horizontal bars spanning the full extent of the x-axis to an existing graph. plotAddHBar(580, 740);
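
The rest of this post uses only the vertical procedures. As a sketch of the horizontal counterparts, the code below marks a reference level and a shaded band on the formatted graph. It assumes the graph has already been drawn with plotXY(myPlot, ...) as above. The levels of 100 and 50 cases per 100k are arbitrary placeholder values, not thresholds from the data source, and the structure name refBand is hypothetical.

```gauss
// Keep the legend unchanged when the reference line is added
plotSetLegend(&myPlot, "");

// Horizontal line at a placeholder level of 100 cases per 100k
plotAddHLine(myPlot, 100);

// Separate bar-style plotControl structure for the shaded band
struct plotControl refBand;
refBand = plotGetDefaults("bar");

// Solid, 15% opaque, grey fill; leave the legend unchanged
plotSetFill(&refBand, 1, 0.15, "grey");
plotSetLegend(&refBand, "");

// Shade the placeholder range from 50 to 100 cases per 100k
plotAddHBar(refBand, 50, 100);
```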

As an example, let's add vertical lines to help compare July 4th, 2020 to July 4th, 2021.

Specifying Legend Behavior When Adding Lines

First, when adding new data to an existing plot, we need to specify how we want this data treated on the legend using the plotSetLegend procedure.

We can add a label for the line to the legend:

// Label next added line "Independence Day"
// and add to the legend
plotSetLegend(&myPlot, "Independence Day");

or we can tell GAUSS to not make any changes to the current legend:

// The empty string specifies that the legend 
// should remain unchanged when the next line is added.
plotSetLegend(&myPlot, "");

Specifying Line Style

Next, we will specify the style of our lines using the plotSetLinePen procedure. This procedure lets us set the width, color, and style of the lines added to the graph.

Attribute   Description
width       Specifies the thickness of the line(s) in pixels. The default value is 2.
color       Optional argument, the name or RGB value of the new color(s) for the line(s).
style       Optional argument, the style(s) of the pen for the line(s). Options include:
              1  Solid line
              2  Dash line
              3  Dot line
              4  Dash-Dot line
              5  Dash-Dot-Dot line

// Set the line width to 2 pixels,
// the line color to #555555 (dark grey),
// and the line style to dashed (2)
plotSetLinePen(&myPlot, 2, "#555555", 2);

Adding Lines to Mark Events

Finally, let's add the lines marking Independence Day in 2020 and 2021.

We first specify the dates where we want to add lines, using asDate to convert the date strings:

// Convert a string array of Independence Day dates to dates
ind_days = asDate("2020-07-04"$|"2021-07-04");

Then we add our holidays to the existing graph using plotAddVLine:

// Add holidays to graph
plotAddVLine(myPlot, ind_days);

The complete code for adding the lines looks like this:

// Do not add vertical lines to the legend
plotSetLegend(&myPlot, "");

// Set the line width to be 2 pixels
// the line color to be a dark grey color, #555555,
// and the line to be dashed
plotSetLinePen(&myPlot, 2, "#555555", 2);

// Convert a string array of Independence Day dates to dates
ind_days = asDate("2020-07-04"$|"2021-07-04");

// Add holidays to graph
plotAddVLine(myPlot, ind_days);

Adding Bars to Mark Events

Now, let's add a vertical bar to mark the 2020 winter holiday period. The bar will span the time from Thanksgiving 2020 to New Year's Day 2021.

We first need to create a new plotControl structure to format our bars. Since we are adding a bar to the graph, we will fill our new plotControl structure with the defaults for a bar graph:

// Create plotControl structure
struct plotControl plt;

// Fill with default bar settings
plt = plotGetDefaults("bar");

Next, we can format our bar using the plotSetFill procedure. The plotSetFill procedure allows us to control the fill style, opacity, and color of graphed bars:

// Set bar to have solid fill with 20% opacity
// and grey color
plotSetFill(&plt, 1, 0.20, "grey");

We also have to specify the legend behavior when the bar is added. This time let's add a label to the legend for the "Winter Holidays":

// Add "Winter Holidays" to the legend
plotSetLegend(&plt, "Winter<br>Holidays");

Now we are ready to add the bar to our graph using the plotAddVBar procedure:

// Add a vertical bar to graph starting 
// on November 26th, 2020 and 
// ending January 1st, 2021
plotAddVBar(plt, asDate("2020-11-26"), asDate("2021-01"));

Adding Notes to Graphs

As a final customization, let's add a note to our graph to label one of our holidays. We can do this using the plotAddTextBox procedure.

The plotAddTextBox procedure takes three required inputs:

  • The text to be added to the graph.
  • The x location where the text should start.
  • The y location where the text should start.

// Label the 2020 Independence Day line
plotAddTextBox("&larr; Independence Day", asDate("2020-07-04"), 80);
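
To finish, here is a sketch that labels the 2021 Independence Day line the same way and then saves the graph to a file. The y position of 80 is reused from above and may need adjusting so the note does not overlap the data. The file name is hypothetical, and the plotSave arguments shown (a file name plus a 2x1 width|height vector, in centimeters by default) are an assumption.

```gauss
// Label the 2021 Independence Day line the same way
plotAddTextBox("&larr; Independence Day", asDate("2021-07-04"), 80);

// Assumption: save the graph as a PNG, 20 cm wide by 12 cm tall
plotSave("covid_cases_by_state.png", 20|12);
```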

Conclusion

In this blog post, we saw how a few customizations and enhancements can make plots easier to read and more impactful.

In particular, we covered:

  • Using grid lines without cluttering a graph.
  • Changing tick labels for readability.
  • Using clear axis labels.
  • Marking events and outcomes with lines, bars, and annotations.

References

The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved 12-05-2021, from https://github.com/nytimes/covid-19-data.

© Aptech Systems, Inc. All rights reserved.
