# Introduction to Categorical Variables

## Introduction

Categorical variables are an important part of research and modeling. They arise whenever observations fall into discrete groups rather than on a continuous scale.

Some everyday examples include:

• Marital status (Not Married, Married)
• Transportation choices (Car, Subway, Bus, Other)
• Performance ratings (Poor, Fair, Average, Good, Excellent)

In today’s blog, we look more closely at what categorical variables are and how these variables are treated in estimation.

## What is a categorical variable?

A categorical variable is a discrete variable that captures qualitative outcomes by placing observations into fixed groups (or levels). The groups are mutually exclusive, which means that each individual fits into only one category.
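As a rough illustration of this definition, a categorical variable can be stored as a fixed list of levels plus one integer code per observation. The sketch below is plain Python rather than GAUSS, and the function names and sample values are hypothetical. Because each observation receives exactly one code, the groups are mutually exclusive by construction.

```python
# Hypothetical example: transportation choice for six commuters.
LEVELS = ["Car", "Subway", "Bus", "Other"]  # the fixed set of groups


def encode(observations, levels):
    """Map each observation to the integer index of its level.

    Raises ValueError if an observation is not one of the fixed levels.
    """
    index = {label: code for code, label in enumerate(levels)}
    codes = []
    for obs in observations:
        if obs not in index:
            raise ValueError(f"unknown level: {obs!r}")
        codes.append(index[obs])
    return codes


def frequencies(codes, levels):
    """Count how many observations fall into each level."""
    counts = {label: 0 for label in levels}
    for code in codes:
        counts[levels[code]] += 1
    return counts


sample = ["Car", "Bus", "Car", "Subway", "Other", "Car"]  # made-up data
codes = encode(sample, LEVELS)
print(codes)
print(frequencies(codes, LEVELS))
```

Rejecting labels outside the fixed list is a design choice in this sketch: it keeps the set of levels closed, which matches the "fixed groups" part of the definition.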

### Types of categorical variables

Categorical variables can be used to represent different types of qualitative data. For example:

• Ordinal data - represent outcomes for which the order of the groups is relevant.
• Nominal data - represent outcomes for which the order of the groups does not matter.
• Binary data - represent outcomes with only two possible groups.
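The practical difference between these types is which comparisons make sense. The following sketch uses plain Python (not GAUSS) with hypothetical helper names, and reuses the everyday examples from the introduction: ordinal levels support "at least as high as" comparisons, nominal levels support only equality, and binary levels map naturally onto 0 and 1.

```python
# Ordinal: the position of a level in this list carries meaning.
RATING_ORDER = ["Poor", "Fair", "Average", "Good", "Excellent"]


def rating_at_least(rating, threshold, order=RATING_ORDER):
    """Return True if `rating` ranks at or above `threshold`."""
    return order.index(rating) >= order.index(threshold)


# Nominal: levels have no ranking, so only equality is meaningful.
def same_mode(mode_a, mode_b):
    """Return True if two transportation choices are the same level."""
    return mode_a == mode_b


# Binary: two levels, conveniently represented as 0 and 1.
def is_married(status):
    """Return 1 for "Married" and 0 for any other status."""
    return 1 if status == "Married" else 0


print(rating_at_least("Good", "Average"))
print(same_mode("Car", "Bus"))
print(is_married("Not Married"))
```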

### Who uses categorical variables?

Categorical variables are used widely across fields:

| Example | Field | Type | Categories |
|---|---|---|---|
| Income range | Economics, Sociology | Ordinal | |

## Example: Using categorical variables in ordinary least squares

Let's estimate our linear regression MPG model from earlier. We will start by loading the auto2.dta dataset from the GAUSS example directory. When loading data for this model we:

• Load the MPG, Weight, and Foreign variables.
• Specify that Foreign is a categorical variable.

The code for this action is auto-generated:

auto2 = loadd("C:/gauss21/examples/auto2.dta", "mpg + weight + cat(foreign)");
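To see why it matters that `foreign` is declared categorical, here is a minimal sketch of dummy (indicator) coding, a common way for regression software to represent a categorical variable numerically. It is written in plain Python, not GAUSS, and it is not a description of GAUSS internals. The function name and sample labels are hypothetical, and treating Domestic as the base level is an assumption made for illustration. The base level gets no column, and every other level gets its own 0/1 column.

```python
def dummy_code(labels, levels, base):
    """Return a dict mapping each non-base level to a 0/1 indicator column."""
    if base not in levels:
        raise ValueError(f"base level {base!r} is not in levels")
    columns = {}
    for level in levels:
        if level == base:
            continue  # the base level is absorbed by the regression constant
        columns[level] = [1 if label == level else 0 for label in labels]
    return columns


# Made-up sample: origin of four cars.
origin = ["Domestic", "Foreign", "Domestic", "Foreign"]
print(dummy_code(origin, ["Domestic", "Foreign"], base="Domestic"))
```

With two levels this produces a single indicator column, which is why a regression on such a variable reports one coefficient for it rather than two.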

### Running the regression

Next, we will call olsmt to estimate our model. Using our categorical variable with olsmt is easy and requires no extra steps:

call olsmt(auto2, "mpg ~ weight + foreign");

The results are printed:

                            Standard             Prob      Std.    Cor with
Variable          Estimate    Error    t-value   >|t|      Est.    Dep Var
---------------------------------------------------------------------------

CONSTANT             41.68     2.166     19.25    0.00     ---       ---
weight            -0.00659  0.000637    -10.34    0.00    -0.8860   -0.807
foreign: Foreign    -1.650     1.076    -1.534    0.13    -0.1313    0.393

### Interpreting our results

There are a few notable components to our linear regression results:

1. The estimated coefficient on the Foreign level is -1.650. This tells us that, after accounting for weight, foreign cars are estimated to get 1.65 fewer miles per gallon than domestic cars.
2. Our p-value of 0.13 tells us that this difference is not statistically significant at conventional levels (for example, 5% or 10%).
3. GAUSS automatically identifies the categories and labels them appropriately in our results table. The variable name foreign: Foreign tells us that the coefficient in the table is for the category Foreign of the variable foreign.
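The interpretation of the Foreign coefficient can be checked with a little arithmetic. The sketch below is plain Python, not GAUSS. It plugs the rounded estimates from the results table into the fitted equation and compares a domestic and a foreign car of the same weight. The weight of 3000 is a hypothetical value chosen only for illustration.

```python
# Rounded estimates copied from the results table.
CONSTANT = 41.68
B_WEIGHT = -0.00659
B_FOREIGN = -1.650
SE_FOREIGN = 1.076


def predicted_mpg(weight, foreign):
    """Fitted MPG for a car of the given weight; `foreign` is True or False."""
    return CONSTANT + B_WEIGHT * weight + (B_FOREIGN if foreign else 0.0)


weight = 3000  # hypothetical weight, in the same units as the weight variable
domestic = predicted_mpg(weight, foreign=False)
imported = predicted_mpg(weight, foreign=True)

# The gap between the two predictions equals the Foreign coefficient,
# whatever weight is chosen.
print(round(domestic, 2), round(imported, 2), round(imported - domestic, 2))

# The t-value is the estimate divided by its standard error.
print(round(B_FOREIGN / SE_FOREIGN, 2))
```

Because the table reports rounded values, the recomputed t-value agrees with the reported -1.534 only to about two decimal places.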

## Conclusions

Categorical variables have an important role in modeling, as they offer a quantitative way to include qualitative outcomes in our models. However, it is important to know how to use them appropriately and how to correctly interpret models that include them.

After today's blog, you should have the foundation to begin working with categorical variables and a better knowledge of:

• What categorical variables are.
• How to include categorical variables in models.
• How to interpret results when categorical variables are used in linear regression.
