<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>categorical data &#8211; Aptech</title>
	<atom:link href="https://www.aptech.com/blog/tag/categorical-data/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aptech.com</link>
	<description>GAUSS Software - Fastest Platform for Data Analytics</description>
	<lastBuildDate>Mon, 22 Mar 2021 17:36:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Easy Management of Categorical Variables</title>
		<link>https://www.aptech.com/blog/easy-management-of-categorical-variables/</link>
					<comments>https://www.aptech.com/blog/easy-management-of-categorical-variables/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Fri, 19 Mar 2021 21:30:35 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[User Interface]]></category>
		<category><![CDATA[categorical data]]></category>
		<category><![CDATA[categorical variables]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11581058</guid>

					<description><![CDATA[Categorical variables offer an important opportunity to capture qualitative effects in statistical modeling. Unfortunately, it can be tedious and cumbersome to manage categorical variables in statistical software. 

The new GAUSS category type, introduced in GAUSS 21, makes it easy and intuitive to work with categorical data. 

In today's blog we use real-life housing data to explore the numerous advantages of the GAUSS category type including:
<ul>
<li>Easy set up and viewing of categorical data.</li>
<li>Simple renaming of category labels.</li>
<li>Easy changing of the reference base case and reordering of categories.</li>
<li>Single-line frequency plots and tables.</li>
<li>Internal creation of dummy variables for regressions.</li>
<li>Proper labeling of categories in regression output.</li> 
</ul>
]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>Categorical variables offer an important opportunity to capture qualitative effects in statistical modeling. Unfortunately, it can be tedious and cumbersome to manage categorical variables in statistical software. </p>
<p>The new GAUSS category type, introduced in <a href="https://www.aptech.com/blog/easy-and-fast-data-management-in-gauss-21/">GAUSS 21</a>, makes it easy and intuitive to work with categorical data. </p>
<p>In today's blog we use real-life housing data to explore the numerous advantages of the GAUSS category type including:</p>
<ul>
<li>Easy set-up and viewing of categorical data.</li>
<li>Simple renaming of category labels.</li>
<li>Easy changing of the reference base case and reordering of categories.</li>
<li>Single-line frequency plots and tables. </li>
<li>Internal creation of dummy variables for regressions.</li>
<li>Proper labeling of categories in regression output. </li>
</ul>
<div style="text-align:center;background-color:#f0f2f4"><hr>Want to see these advantages for yourself? <a href="https://www.aptech.com/request-demo/">Contact us for a GAUSS 21 demo!</a><hr></div>
<h2 id="the-data">The data</h2>
<p>Throughout today's blog, we will be using the <a href="https://www.kaggle.com/dansbecker/melbourne-housing-snapshot/home">Melbourne Housing Snapshot</a> dataset. </p>
<p>The dataset contains the following variables:</p>
<table>
 <thead>
<tr><th>Variable</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>Suburb</td><td>Name of the suburb.</td></tr>
<tr><td>Address</td><td>House address.</td></tr>
<tr><td>Rooms</td><td>Number of rooms.</td></tr>
<tr><td>Type</td><td>Type of house.</td></tr>
<tr><td>Price</td><td>Sale price.</td></tr>
<tr><td>Method</td><td>Method of sale.</td></tr>
<tr><td>SellerG</td><td>Real estate agent.</td></tr>
<tr><td>Date</td><td>Date sold.</td></tr>
<tr><td>Distance</td><td>Distance from the central business district (CBD).</td></tr>
<tr><td>Postcode</td><td>Postal code.</td></tr>
<tr><td>Bedroom2</td><td>Number of bedrooms.</td></tr>
<tr><td>Bathroom</td><td>Number of bathrooms.</td></tr>
<tr><td>Car</td><td>Number of car spots.</td></tr>
<tr><td>Landsize</td><td>Land size.</td></tr>
<tr><td>BuildingArea</td><td>Building size.</td></tr>
<tr><td>YearBuilt</td><td>Year the house was built.</td></tr>
<tr><td>CouncilArea</td><td>Governing council for the area.</td></tr>
<tr><td>Latitude</td><td>Location latitude.</td></tr>
<tr><td>Longitude</td><td>Location longitude.</td></tr>
<tr><td>Regionname</td><td>General region.</td></tr>
<tr><td>Propertycount</td><td>Number of properties that exist in the suburb.</td></tr>
</tbody>
</table>
<h2 id="loading-our-data">Loading our Data</h2>
<p>Let's start by loading our dataset using the GAUSS <strong>Data Import</strong> window: </p>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/03/data-import-melbourne-2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/data-import-melbourne-2.jpg" alt="" width="1202" height="496" class="aligncenter size-full wp-image-11581138" /></a></p>
<p>When we open the dataset for loading we can see quickly from the color-coded columns what type GAUSS is assigning to our variables. For example:</p>
<ul>
<li><code>Type</code> and <code>Method</code> (highlighted orange) are categorical variables.</li>
<li><code>Suburb</code>, <code>Address</code>, and <code>SellerG</code> (highlighted yellow) are listed as string variables. </li>
<li><code>Rooms</code> and <code>Price</code> (highlighted blue) are numbers. </li>
</ul>
<div class="alert alert-info" role="alert">GAUSS uses an algorithm based on the number of repeated labels and the ratio of varying labels to total observations to detect categorical variables.</div>
<h3 id="changing-a-variable-to-a-category">Changing a Variable to a Category</h3>
<p>Suppose that we also want the variable <code>Suburb</code> to be loaded as a categorical variable. This can be easily done using the <strong>Variables</strong> tab. </p>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/03/suburb-change-type.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/suburb-change-type.jpg" alt="" width="327" height="427" class="aligncenter size-full wp-image-11581099" /></a></p>
<p>When we change the variable <code>Suburb</code> type to <strong>Category</strong>, the <b>Modify Column Mapping</b> window opens:</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/03/modify-suburb-type.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/modify-suburb-type.jpg" alt="" width="327" height="427" class="aligncenter size-full wp-image-11581101" /></a></p>
<p>This window gives us a number of useful insights. For example, it shows:</p>
<ul>
<li>The <a href="https://docs.aptech.com/gauss/data-management/interactive-import.html?highlight=categorical%20variables#specify-the-category-to-be-the-base-case">base case</a>, which is the category <em>Abbotsford</em>.</li>
<li>All <a href="https://docs.aptech.com/gauss/data-management/interactive-import.html?highlight=categorical%20variables#change-the-category-mapping">labels and category mappings</a>.</li>
<li>The number of different categories, which is 314.</li>
</ul>
<p>If we want to explore the categories more we can use the <strong>Label Filter</strong>:</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/03/filter-suburb-cats.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/filter-suburb-cats.jpg" alt="" width="327" height="427" class="aligncenter size-full wp-image-11581102" /></a></p>
<p>Once we are done, clicking <strong>OK</strong> changes the preview for <code>Suburb</code> to <strong>Category</strong> with orange highlighting:
<a href="https://www.aptech.com/wp-content/uploads/2021/03/suburb-category-now.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/suburb-category-now.jpg" alt="" width="1920" height="937" class="aligncenter size-full wp-image-11581104" /></a></p>
<h3 id="importing-our-data">Importing our Data</h3>
<p>Now that we've set <code>Suburb</code> to be a categorical variable, we're ready to load our data. Clicking <strong>Import</strong> loads our data and auto-generates the GAUSS code for all import steps. </p>
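<p>The same import can also be written by hand. The lines below are a minimal sketch rather than the exact code the import window generates: the file name <code>melb_data.csv</code> and its location are assumptions, and only a few of the variables are listed to keep the formula string short. The <code>cat()</code> keyword asks <code>loadd</code> to load a column as a category.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Assumed file name and location; change this to match
// where the Melbourne Housing Snapshot CSV file is saved
fname = "melb_data.csv";

// Load a subset of the variables, using the 'cat' keyword to
// load 'Suburb', 'Type' and 'Method' as categorical variables
melb_data = loadd(fname, "cat(Suburb) + cat(Type) + cat(Method) + Price + Rooms");</code></pre>
<p>Listing each variable explicitly makes the intended type of every column visible in the code, which can help when the import needs to be repeated later.</p>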
<h2 id="managing-the-properties-of-categorical-variables">Managing the Properties of Categorical Variables</h2>
<p>One of the advantages of using the category type is that the <a href="https://docs.aptech.com/gauss/data-management/data-cleaning.html#changing-categorical-mappings">category properties can be easily modified</a> using the <b>Data Management</b> pane.</p>
<h3 id="changing-category-labels-interactively">Changing Category Labels Interactively</h3>
<p>Suppose that after loading our data, we realize that our current labels for the <code>Type</code> variable are not very clear. Instead, we wish to rename the labels as follows:</p>
<table>
 <thead>
<tr><th>Original label</th><th>New label</th></tr>
</thead>
<tbody>
<tr><td>h</td><td>House</td></tr>
<tr><td>t</td><td>Townhouse</td></tr>
<tr><td>u</td><td>Duplex unit</td></tr>
</tbody>
</table>
<p>Once we <a href="https://www.aptech.com/resources/tutorials/introduction-to-gauss-viewing-data-in-gauss/">open the dataframe</a> in the data editor, this is easy to do. We simply:</p>
<ol>
<li> Select the <b>Manage</b> button to open the <b>Data Management</b> pane: 

<a href="https://www.aptech.com/wp-content/uploads/2021/03/data-management-pane.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/data-management-pane.jpg" alt="" width="1920" height="362" class="aligncenter size-full wp-image-11581109"></a> </li>

<li> Click the drop-down button to the right of the variable name and select <b>Properties</b> to open the <b>Modify Column Mapping</b> dialog. 

<a href="https://www.aptech.com/wp-content/uploads/2021/03/variable-properities.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/variable-properities.jpg" alt="" width="354" height="145" class="aligncenter size-full wp-image-11581110"></a> </li>

<li> Enter the new label in the <b>Renamed Label</b> textbox next to the category label we want to change.
<a href="https://www.aptech.com/wp-content/uploads/2021/03/rename-label.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/rename-label.jpg" alt="" width="322" height="353" class="aligncenter size-full wp-image-11581111"></a></li>
</ol>
<p>Let's use the same process to change the category labels for the <code>Method</code> variable:
<a href="https://www.aptech.com/wp-content/uploads/2021/03/modify-method-type.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/modify-method-type.jpg" alt="" width="315" height="392" class="aligncenter size-full wp-image-11581112" /></a></p>
<p>Once we've done this, the variable names <code>Type</code> and <code>Method</code> are highlighted in red. This indicates that we have unsaved changes.</p>
<p>When we click <strong>Apply</strong>, the changes are saved, and the commands GAUSS uses to change the category labels are printed in the <strong>Program Input/Output</strong> window.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">melb_data = setcollabels(move(melb_data), "House"$|"Townhouse"$|"Duplex unit", 0|1|2, "Type");
melb_data = setcollabels(move(melb_data), "Passed in"$|"Sold"$|"Sold after auction"$|"Sold prior auction"$|"Vendor bid", 0|1|2|3|4, "Method");</code></pre>
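<p>To confirm that the new labels were applied, we can read them back. This is a small sketch that assumes the <code>getcollabels</code> procedure returns the labels followed by their key values; the names <code>type_labels</code> and <code>type_keys</code> are chosen here for illustration.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Retrieve the labels and key values currently
// assigned to the 'Type' variable
{ type_labels, type_keys } = getcollabels(melb_data, "Type");

// Print the labels and their key values
print type_labels;
print type_keys;</code></pre>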
<h3 id="changing-the-base-case">Changing the Base Case</h3>
<p>When working with categorical variables, it is useful to know that GAUSS always treats the category with the key value &quot;0&quot; as the base case. For example, we saw earlier that the base case for the <code>Suburb</code> variable was <em>Abbotsford</em>.</p>
<p>If you want to change the assigned base case (or otherwise reorder the categories), this can quickly be done using the <b>Modify Column Mapping</b> dialog. </p>
<p>For example, suppose we want to replicate a study that uses the category <em>Chelsea</em> as the base case. To do this we:</p>
<ol>
<li> Locate and select the category <em>Chelsea</em> using the <b>Label Filter</b></li>
<li> Use the <b>Double Arrow</b> button to move <em>Chelsea</em> to the top of the list.
<a href="https://www.aptech.com/wp-content/uploads/2021/03/change-base-case.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/change-base-case.jpg" alt="" width="321" height="341" class="aligncenter size-full wp-image-11581116"></a></li>
<li> <b>Apply</b> our changes. </li>
</ol>
<div class="alert alert-info" role="alert">More detailed information on how to manage categorical variables, both interactively and programmatically, can be found in our <a href="https://docs.aptech.com/gauss/data-management.html">Data Management Guide.</a>  </div>
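<p>The base case can also be changed in code. The line below is a minimal sketch, assuming the <code>setbasecat</code> procedure takes the dataframe, the label of the new base case, and the name of the categorical variable, in that order.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Make 'Chelsea' the base case of the 'Suburb' variable
melb_data = setbasecat(melb_data, "Chelsea", "Suburb");</code></pre>
<p>Keeping this step in a program file, rather than applying it only through the dialog, makes it easier to reproduce the same base case when the data is reloaded.</p>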
<h2 id="exploring-our-categorical-variables">Exploring our Categorical Variables</h2>
<p>To dive deeper into our categorical variables we can use the <a href="https://docs.aptech.com/gauss/dstatmt.html">dstatmt</a>, <a href="https://docs.aptech.com/gauss/frequency.html">frequency</a>, and <a href="https://docs.aptech.com/gauss/plotfreq.html">plotFreq</a> procedures.</p>
<h3 id="general-summary-statistics">General Summary Statistics</h3>
<p>First, let's get a general overview of our data, including the categorical variables, using the <code>dstatmt</code> function:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Print descriptive statistics for all variables in 'melb_data'
call dstatmt(melb_data);</code></pre>
<p>This prints a table of descriptive statistics to the <strong>Program Input/Output</strong> window:</p>
<pre>----------------------------------------------------------------------------------------------
Variable              Mean     Std Dev      Variance     Minimum     Maximum     Valid Missing
----------------------------------------------------------------------------------------------

Suburb               -----       -----         -----     Chelsea  Yarraville     13580    0
Address              -----       -----         -----       -----       -----     13580    0
Rooms                2.938      0.9557        0.9135           1          10     13580    0
Type                 -----       -----         -----       House  Duplex uni     13580    0
Price            1.076e+06      639311     4.087e+11     8.5e+04       9e+06     13580    0
Method               -----       -----         -----   Passed in  Vendor bid     13580    0
SellerG              -----       -----         -----       -----       -----     13580    0
Date                 -----       -----         -----  28/01/2016  23/09/2017     13580    0
Distance             10.14       5.869         34.44           0        48.1     13580    0
Postcode              3105       90.68          8222        3000        3977     13580    0
Bedroom2             2.915      0.9659         0.933           0          20     13580    0
Bathroom             1.534      0.6917        0.4785           0           8     13580    0
Car                   1.61      0.9626        0.9267           0          10     13518   62
Landsize             558.4        3991     1.593e+07           0    4.33e+05     13580    0
BuildingArea           152         541        292697           0   4.452e+04      7130 6450
YearBuilt             1965       37.27          1389        1196        2018      8205 5375
CouncilArea          -----       -----         -----     Banyule  Yarra Rang     12211 1369
Latitude            -37.81     0.07926      0.006282      -38.18      -37.41     13580    0
Longitude              145      0.1039        0.0108       144.4       145.5     13580    0
Regionname           -----       -----         -----  Eastern Me  Western Vi     13580    0
Propertycount         7454        4379     1.917e+07         249   2.165e+04     13580    0 </pre>
<p>Though traditional summary statistics aren't valid for categorical data, the descriptive statistics still provide some insights:</p>
<ul>
<li>The minimum category is always the base case for the variable. </li>
<li>We can identify if there are any missing observations. </li>
</ul>
<h3 id="frequency-table-of-categories">Frequency Table of Categories</h3>
<p>The <code>frequency</code> procedure was introduced in GAUSS 21 specifically to provide frequency count tables. The procedure requires two inputs:</p>
<hr>
<dl>
<dt>x</dt>
<dd>Data matrix or data frame.</dd>
<dt>varlist</dt>
<dd>String, names or indices of variables to be counted. If names, they should be entered as a formula string, e.g., <code>"rep78 + foreign"</code>.
<hr></dd>
</dl>
<p>To see frequency counts for both <code>Method</code> and <code>Type</code> in our dataframe <code>melb_data</code> we enter:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Print frequency tables for the 'Type' and 'Method' variables
// in the 'melb_data' dataframe
frequency(melb_data, "Type + Method");</code></pre>
<p>This prints the category names along with:</p>
<ul>
<li>Total counts.</li>
<li>Frequency percentages.</li>
<li>Cumulative percentages.</li>
</ul>
<pre>             Label      Count   Total %    Cum. %
             House       9449     69.58     69.58
         Townhouse       1114     8.203     77.78
       Duplex unit       3017     22.22       100
             Total      13580       100

             Label      Count   Total %    Cum. %
         Passed in       1564     11.52     11.52
              Sold       9022     66.44     77.95
Sold after auction         92    0.6775     78.63
Sold prior auction       1703     12.54     91.17
        Vendor bid       1199     8.829       100
             Total      13580       100    </pre>
<h3 id="frequency-plot-of-categories">Frequency Plot of Categories</h3>
<p>The information provided with the <code>frequency</code> procedure can be quickly visualized using the <code>plotFreq</code> procedure. </p>
<p>For example, let's plot the frequencies for the <code>Method</code> variable:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotFreq(melb_data, "Method");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/03/frequency-plot-2-scaled.jpeg"><img src="https://www.aptech.com/wp-content/uploads/2021/03/frequency-plot-2-scaled.jpeg" alt="" width="1280" height="960" class="aligncenter size-full wp-image-11581140" /></a></p>
<h2 id="estimation-with-categorical-variables">Estimation with Categorical Variables</h2>
<p>The final area we will explore today is the use of the GAUSS category type in estimation. Estimation is one of the areas in which the GAUSS category type offers the greatest advantages. </p>
<p>GAUSS category variables can be used in estimation routines, such as <a href="https://docs.aptech.com/gauss/olsmt.html">olsmt</a> or <a href="https://docs.aptech.com/gauss/glm.html">glm</a>, without any additional steps. </p>
<p>When category variables are detected in estimation routines GAUSS will automatically:</p>
<ul>
<li>Create and use dummy variables during estimation. </li>
<li>Exclude the base case category.</li>
<li>Print output tables using specified category labels. </li>
</ul>
<div style="text-align:center;background-color:#37444d;padding-top:40px;padding-bottom:40px;"><span style="color:#FFFFFF">Using categorical data in your models?</span> <a href="https://www.aptech.com/request-demo/">See how the new category type works for you!</a></div>
<p>As an example, let's run a simple linear regression model estimating the effects of <code>Method</code>, <code>Bedroom2</code>, and <code>Bathroom</code> on <code>Price</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Regress 'Price' on the categorical variable 'Method'
// and the numeric variables 'Bedroom2' and 'Bathroom'
call olsmt(melb_data, "Price ~ Method + Bedroom2 + Bathroom");</code></pre>
<pre>                                           Standard                 Prob   Standardized  Cor with
Variable                       Estimate      Error      t-value     &gt;|t|     Estimate    Dep Var
-------------------------------------------------------------------------------------------------

CONSTANT                        59797.1     20407.1     2.93022     0.003       ---         ---
Method: Sold                    40895.7     14857.5     2.75252     0.006   0.0302079   0.0256233
Method: Sold after auction     -54790.4     57921.6    -0.94594     0.344 -0.00703034  -0.0064481
Method: Sold prior auction      -109114     18989.1    -5.74614     0.000  -0.0565257   -0.104125
Method: Vendor bid              77144.6     20737.2      3.7201     0.000   0.0342371   0.0442128
Bedroom2                         202813     5924.64     34.2321     0.000    0.306426    0.475951
Bathroom                         263854      8314.3      31.735     0.000    0.285481    0.467038 </pre>
<p>Notice that the results include estimated coefficients for each of the <code>Method</code> categories except the base case, <em>Passed in</em>.</p>
<div class="alert alert-info" role="alert">For more information on how to use and interpret categorical variables in linear regression see our earlier blog, <a href="https://www.aptech.com/blog/introduction-to-categorical-variables/">&quot;Introduction to Categorical Variables&quot;</a>. </div>
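<p>The same formula string can, in principle, be passed to <code>glm</code>. The call below is a sketch only, assuming that <code>glm</code> accepts a dataframe, a formula string, and a family name, and that the <code>"normal"</code> family is the appropriate choice for a continuous outcome such as <code>Price</code>. No output is shown because this model was not run for this post.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Fit a generalized linear model with a normal family,
// using the same variables as the 'olsmt' example
call glm(melb_data, "Price ~ Method + Bedroom2 + Bathroom", "normal");</code></pre>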
<h2 id="conclusion">Conclusion</h2>
<p>Today we've seen how fast and easy it can be to work with categorical variables using the new GAUSS category type. Whether you're just getting started exploring your data or you're in the final stage of estimation, the category type can speed up your work and get you to your results faster. </p>
<h2 id="further-reading">Further reading</h2>
<ol>
<li><a href="https://www.aptech.com/blog/introduction-to-categorical-variables/">Introduction to Categorical Variables</a>. </li>
<li><a href="https://www.aptech.com/blog/easy-and-fast-data-management-in-gauss-21/">Easy and Fast Data Management in GAUSS 21</a>.</li>
<li><a href="https://www.aptech.com/blog/preparing-and-cleaning-data-fred-data-in-gauss/">Preparing and Cleaning FRED data in GAUSS</a>.</li>
</ol>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/easy-management-of-categorical-variables/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
