<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Panel data &#8211; Aptech</title>
	<atom:link href="https://www.aptech.com/blog/category/panel-data/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aptech.com</link>
	<description>GAUSS Software - Fastest Platform for Data Analytics</description>
	<lastBuildDate>Mon, 13 Oct 2025 14:47:55 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Exploring and Cleaning Panel Data with GAUSS 25</title>
		<link>https://www.aptech.com/blog/exploring-and-cleaning-panel-data-with-gauss-25/</link>
					<comments>https://www.aptech.com/blog/exploring-and-cleaning-panel-data-with-gauss-25/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 28 Jan 2025 17:38:02 +0000</pubDate>
				<category><![CDATA[Panel data]]></category>
		<category><![CDATA[Releases]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11584930</guid>

					<description><![CDATA[Panel data offers a unique opportunity to examine both individual-specific and time-specific effects. However, as anyone who has worked with panel data knows, these same features that make panel data so useful can also make exploration and cleaning particularly challenging. 

GAUSS 25 was designed with these challenges in mind. It introduces a comprehensive new suite of panel data tools, tailored to make working with panel data in GAUSS easier, faster, and more intuitive. 

In today's blog, we’ll look at these new tools and demonstrate how they can simplify everyday panel data tasks, including:
<ul>
<li>Loading your data.</li>
<li>Preparing your panel dataset. </li>
<li>Exploring panel data characteristics. </li>
<li>Visualizing <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a>. </li>
<li>Transforming your data for modeling.</li>
</ul>
]]></description>
					<content:encoded><![CDATA[
<h3 id="introduction">Introduction</h3>
<p>Panel data offers a unique opportunity to examine both individual-specific and time-specific effects. However, as anyone who has worked with panel data knows, these same features that make panel data so useful can also make exploration and cleaning particularly challenging. </p>
<p><a href="https://www.aptech.com/blog/more-research-less-effort-with-gauss-25/" target="_blank" rel="noopener">GAUSS 25</a> was designed with these challenges in mind. It introduces a comprehensive new suite of tools, tailored to make working with panel data in GAUSS easier, faster, and more intuitive. </p>
<p>In today's blog, we’ll demonstrate how these tools can simplify everyday panel data tasks, including:</p>
<ul>
<li>Loading your data.</li>
<li>Preparing your panel dataset. </li>
<li>Exploring panel data characteristics. </li>
<li>Visualizing <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a>. </li>
<li>Transforming your data for modeling. </li>
</ul>
<h2 id="data">Data</h2>
<p>Today we will work with a subset of the publicly available Penn World Table version 10.01, available for download <a href="https://github.com/aptech/gauss_blog/raw/refs/heads/master/econometrics/exploring-and-cleaning-panel-data-g25-1.23.24/pwt_10.gdat" target="_blank" rel="noopener">here</a>. </p>
<table>
  <thead>
    <tr>
      <th colspan="2">
        <h3 id="penn-world-table-variables"><br>Penn World Table Variables</h3>
      </th>
    </tr>
    <tr>
      <th>Variable Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>currency_unit</td>
      <td>The currency unit used for GDP measurements.</td>
    </tr>
    <tr>
      <td>countrycode</td>
      <td>The three-letter ISO country code.</td>
    </tr>
    <tr>
      <td>country</td>
      <td>The name of the country.</td>
    </tr>
    <tr>
      <td>year</td>
      <td>The year of observation.</td>
    </tr>
    <tr>
      <td>rgdpe</td>
      <td>Real GDP at constant prices (expenditure-side).</td>
    </tr>
    <tr>
      <td>rgdpo</td>
      <td>Real GDP at constant prices (output-side).</td>
    </tr>
    <tr>
      <td>pop</td>
      <td>Population of the country.</td>
    </tr>
    <tr>
      <td>emp</td>
      <td>Number of employed persons.</td>
    </tr>
    <tr>
      <td>irr</td>
      <td>Real internal rate of return.</td>
    </tr>
  </tbody>
</table>
<div class="alert alert-info" role="alert">Data Citation:<br>Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), &quot;The Next Generation of the Penn World Table&quot; American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net.</div>
<h2 id="loading-our-panel-data">Loading Our Panel Data</h2>
<p>We'll start by using the <a href="https://docs.aptech.com/next/gauss/loadd.html" target="_blank" rel="noopener">loadd</a> procedure to load our data. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data from 'pwt_10.gdat'
// Using __FILE_DIR to specify data path
pwt_10 = loadd(__FILE_DIR $+ "pwt_10.gdat");

// Preview data 
head(pwt_10);</code></pre>
<div class="alert alert-info" role="alert">For more information on using __FILE_DIR please see our earlier blog, <a href="https://www.aptech.com/blog/make-your-code-portable-data-paths/" target="_blank" rel="noopener">Make Your Code Portable: Data Paths</a></div>
<p>The <a href="https://docs.aptech.com/gauss/head.html" target="_blank" rel="noopener">head</a> procedure prints the first five observations of our dataset, helping us check that our data has loaded properly:</p>
<pre>   currency_unit      countrycode          country             year            rgdpe            rgdpo              pop              emp              irr
  Aruban Guilder              ABW            Aruba       1991-01-01        2804.5005        3177.4575      0.064622000      0.029200001       0.11486563
  Aruban Guilder              ABW            Aruba       1992-01-01        2944.5161        3370.5376      0.068235000      0.030903272       0.11182721
  Aruban Guilder              ABW            Aruba       1993-01-01        3131.3708        3698.5325      0.072504000      0.032911807       0.11131135
  Aruban Guilder              ABW            Aruba       1994-01-01        3537.9534        4172.8242      0.076700000      0.034895979       0.10574290
  Aruban Guilder              ABW            Aruba       1995-01-01        3412.8745        4184.1562      0.080324000      0.036628015       0.10471709 </pre>
<p>It's important to note that to identify our panel, GAUSS requires a <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">dataframe</a> to have at least one <a href="https://www.aptech.com/blog/dates-and-times-made-easy/" target="_blank" rel="noopener">date variable</a>
and one <a href="https://www.aptech.com/blog/easy-management-of-categorical-variables/" target="_blank" rel="noopener">categorical</a> or <a href="https://www.aptech.com/blog/managing-string-data-with-gauss-dataframes/" target="_blank" rel="noopener">string</a> variable. </p>
<p>We will look more closely at how GAUSS identifies panels in the next section. For now, let's check that our data meets this requirement using the <a href="https://docs.aptech.com/gauss/getcoltypes.html" target="_blank" rel="noopener">getcoltypes</a> procedure.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check column types
getcoltypes(pwt_10);</code></pre>
<pre>            type
        category
        category
        category
            date
          number
          number
          number
          number
          number</pre>
<p>Our data meets the GAUSS requirement for panel data, with three categorical variables and one date variable. </p>
<div style="text-align:center;background-color:#f0f2f4"><hr>Ready to get started using GAUSS for panel data? <a href="https://www.aptech.com/request-demo/">Contact us for a GAUSS 25 demo!</a><hr></div>
<h2 id="preparing-panel-data">Preparing Panel Data</h2>
<p>Besides the data type requirements, the GAUSS panel data procedures assume a few important things about the form of your panel data. </p>
<p>In particular, your panel data should:</p>
<ul>
<li>Be in stacked long form.</li>
<li>Have the date and group identification columns occurring before other date and categorical/string variables. (This is not required, but it is the most convenient way to work with the GAUSS panel data procedures.)</li>
<li>Be sorted by group then time.</li>
</ul>
<p>Let’s look more closely at how to use GAUSS to ensure that our data meets these requirements.</p>
<h3 id="transforming-panel-data-to-long-form">Transforming panel data to long form</h3>
<p>If your panel data is in wide form, it's easy to convert to long form using the <a href="https://docs.aptech.com/gauss/dflonger.html" target="_blank" rel="noopener">dflonger</a> procedure. It's a versatile procedure, designed to be intuitive enough to cover basic transformations with little effort, yet flexible enough to tackle complex cases. </p>
<p>Since the <code>pwt_10</code> data is already in long form, we don't need to transform it. However, for an in-depth look at <code>dflonger</code>, including examples, see our previous blog, <a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a>.</p>
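<p>As a quick illustration, suppose we had a hypothetical wide dataframe, <code>df_wide</code>, with one row per country and one real GDP column per year. A minimal sketch of stacking it into long form, following the <code>dflonger</code> documentation (the column names <em>Y2017</em>-<em>Y2019</em> here are illustrative, not part of our dataset):</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical wide data: one row per country and
// one 'rgdpe' column per year (Y2017, Y2018, Y2019)

// Stack the year columns into long form, creating a
// 'year' name column and an 'rgdpe' value column
pwt_long = dfLonger(df_wide, "Y2017"$|"Y2018"$|"Y2019", "year", "rgdpe");</code></pre>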
<h3 id="ordering-variables">Ordering variables</h3>
<p>One of the most convenient features of the new panel data procedures is their ability to intelligently detect group and time variables. To ensure this works properly, simply make sure that the date variable and group variable identifying your panel are the first occurring date and categorical/string variables in your dataset.</p>
<p>Let's take a look at our <code>pwt_10</code> dataframe:
<a href="https://www.aptech.com/wp-content/uploads/2025/01/Screenshot-2025-01-23-123605.png"><img src="https://www.aptech.com/wp-content/uploads/2025/01/Screenshot-2025-01-23-123605.png" alt="" width="850" height="482" class="aligncenter size-full wp-image-11584963" /></a></p>
<div class="alert alert-info" role="alert">The <code>Ctrl+E</code> hot key opens the variable under the cursor in a floating symbol editor window, allowing you to quickly view workspace symbols.</div>
<p><b> Identifying panel data groups</b><br />
Our dataset contains three categorical variables: <em>currency_unit</em>, <em>countrycode</em>, and <em>country</em>. By default, GAUSS will use the first occurring categorical variable, <em>currency_unit</em>, to identify the groups in the panel, unless we specify otherwise.</p>
<p><b> Identifying time dimension</b><br />
Our dataset also includes a date variable, <em>year</em>, which GAUSS will automatically use to identify the time dimension of the panel.</p>
<p>As the dataframe is now, GAUSS will use <em>currency_unit</em> and <em>year</em> to identify our panel. In this dataset, however, the panel should be identified by <em>country</em> and <em>year</em>. To address this, we could use <a href="https://www.aptech.com/blog/the-basics-of-optional-arguments-in-gauss-procedures/" target="_blank" rel="noopener">optional arguments</a> to specify that our group variable is <em>country</em>. However, we would need to do this every time we use one of the panel data procedures. </p>
<p>Instead, we can use the <a href="https://docs.aptech.com/gauss/order.html" target="_blank" rel="noopener">order</a> procedure to move the <em>country</em> and <em>year</em> variables to the front of our dataframe.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Move 'country' and 'year' to the front
pwt_10 = order(pwt_10, "country"$|"year");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2025/01/Screenshot-2025-01-23-133652.png"><img src="https://www.aptech.com/wp-content/uploads/2025/01/Screenshot-2025-01-23-133652.png" alt="" width="1585" height="864" class="aligncenter size-full wp-image-11584964" /></a></p>
<p>Now, in our reordered <code>pwt_10</code> dataframe, we see that <em>country</em> and <em>year</em> appear as the first two columns. GAUSS will automatically use these to identify the group and time dimensions, respectively.</p>
<p>A few things to note:</p>
<ul>
<li>It is not necessary to move the <em>year</em> variable. Since it is the only date variable in the dataframe, GAUSS will use <em>year</em> to identify our time dimension regardless of its position. </li>
<li>The <em>country</em> variable does not need to be the first column in the dataframe. It only needs to appear before the other categorical variables for GAUSS to automatically recognize it as the group dimension.</li>
</ul>
<h3 id="sorting-panel-data">Sorting panel data</h3>
<p>Beyond the fact that the GAUSS panel data functions expect sorted data, there are many advantages to working with sorted data:</p>
<ul>
<li>Sorted data is easier to browse and explore. </li>
<li>Econometric techniques, such as calculating lags and differences, rely on the data being ordered consistently. </li>
<li>Proper sorting helps avoid errors, ensures reproducibility, and lays a solid foundation for reliable results.</li>
</ul>
<p>The new <a href="https://docs.aptech.com/gauss/pdsort.html" target="_blank" rel="noopener">pdsort</a> procedure allows you to quickly sort panel data by group and then by date.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Sort data using
// automatic group and date variables 
pwt_10 = pdSort(pwt_10);</code></pre>
<h2 id="assessing-panel-data-structure">Assessing Panel Data Structure</h2>
<p>When working with panel data, understanding your data's structure is important. It can play a role in the methods and assumptions applied in your models. For example, many techniques are only valid for balanced data and will produce unreliable results if your panel is unbalanced. </p>
<p>Some important considerations include:</p>
<ul>
<li>Whether the data is balanced.</li>
<li>The presence of gaps or missing data.</li>
<li>The ratio of groups to the number of time observations for each group.</li>
</ul>
<p>By examining our panel’s structure upfront, we can:</p>
<ul>
<li>Identify potential challenges.</li>
<li>Select the most appropriate analytical techniques.</li>
<li>Prevent errors that might result in biased or misleading conclusions.</li>
</ul>
<p>GAUSS includes a suite of panel data tools, introduced in GAUSS 25, that are designed for exploring the structure of panel data.</p>
<table>
  <thead>
    <tr>
      <th colspan="3">
        <h3 id="gauss-functions-for-panel-data-structure"><br>GAUSS Functions for Panel Data Structure</h3>
      </th>
    </tr>
    <tr>
      <th>Function Name</th>
      <th>Description</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
     <tr>
      <td><a href="https://docs.aptech.com/gauss/pdisbalanced.html" target="_blank" rel="noopener">pdIsBalanced</a></td>
      <td>Determines whether each group in a panel dataset covers the maximum time span.</td>
      <td><code>groupisBalanced = pdIsBalanced(pwt_10)</code></td>
    </tr>
     <tr>
      <td><a href="https://docs.aptech.com/gauss/pdallbalanced.html" target="_blank" rel="noopener">pdAllBalanced</a></td>
      <td>Checks if a panel dataset is strongly balanced and returns 1 if balanced, 0 otherwise.</td>
      <td><code>isBalanced = pdAllBalanced(pwt_10)</code></td>
    </tr>
    <tr>
      <td><a href="https://docs.aptech.com/gauss/pdisconsecutive.html" target="_blank" rel="noopener">pdIsConsecutive</a></td>
      <td>Checks if each group in a panel dataset covers consecutive time periods without gaps.</td>
      <td><code>groupisConsecutive = pdIsConsecutive(pwt_10)</code></td>
    </tr>
    <tr>
      <td><a href="https://docs.aptech.com/gauss/pdallconsecutive.html" target="_blank" rel="noopener">pdAllConsecutive</a></td>
      <td>Verifies whether all groups in a panel dataset have consecutive time periods without gaps.</td>
      <td><code>isConsecutive = pdAllConsecutive(pwt_10)</code></td>
    </tr>
    <tr>
      <td><a href="https://docs.aptech.com/gauss/pdsize.html" target="_blank" rel="noopener">pdSize</a></td>
      <td>Provides a size description of a panel dataset, including the number of groups and the number of time observations for each group.</td>
      <td><code>{ num_grps, T, balanced } = pdSize(pwt_10)</code></td>
    </tr>
    <tr>
      <td><a href="https://docs.aptech.com/gauss/pdtimespans.html" target="_blank" rel="noopener">pdTimeSpans</a></td>
      <td>Returns the time span (start and end dates) by group of variables in panel data.</td>
      <td><code>df_tspans = pdTimeSpans(pwt_10)</code></td>
    </tr>
  </tbody>
</table>
<h3 id="exploring-the-structure-of-the-penn-world-table">Exploring the structure of the Penn World Table</h3>
<p>Now let's take a look at the structure of our Penn World Table data. First, we'll quickly check whether our panel is strongly balanced and consecutive.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">print "Panel is balanced:";
pdAllBalanced(pwt_10);

// Check for consecutiveness
print "Panel is consecutive:";
pdAllConsecutive(pwt_10);</code></pre>
<pre>Panel is balanced:
       0.0000000
Panel is consecutive:
       1.0000000 </pre>
<p>This tells us that our panel is not strongly balanced but it is consecutive. </p>
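<p>If we want to see which groups fall short of the full time span, we can pair the per-group checks with the group time spans. A minimal sketch, assuming <code>pdIsBalanced</code> returns one indicator per group in the same group order as <code>pdTimeSpans</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// One balance indicator per group
// (1 = the group covers the maximum time span)
grp_bal = pdIsBalanced(pwt_10);

// Start and end dates for each group
df_tspans = pdTimeSpans(pwt_10);

// Keep only the groups that do not cover the full span
unbalanced = selif(df_tspans, grp_bal .== 0);</code></pre>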
<p>Now that we know our panel is unbalanced, we should take a closer look at our data structure using <code>pdSize</code>. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get summary of panel dimensions
{ num_grps, T, balanced } = pdSize(pwt_10);</code></pre>
<div style="max-height: 600px; overflow-y: scroll; border: 1px solid #ddd; padding: 10px;">
<pre>
================================================================================
Group ID:                   country          Balanced:                        No
Valid cases:                   7540          Missings:                         0
N. Groups:                      137          T. Average:                  55.036
================================================================================
country                                       T[i]     Start Date       End Date
--------------------------------------------------------------------------------

Angola                                          50     1970-01-01     2019-01-01 
Argentina                                       70     1950-01-01     2019-01-01 
Armenia                                         30     1990-01-01     2019-01-01 
Aruba                                           29     1991-01-01     2019-01-01 
Australia                                       70     1950-01-01     2019-01-01 
Austria                                         70     1950-01-01     2019-01-01 
Azerbaijan                                      30     1990-01-01     2019-01-01 
Bahamas                                         47     1973-01-01     2019-01-01 
Bahrain                                         50     1970-01-01     2019-01-01 
Barbados                                        60     1960-01-01     2019-01-01 
Belarus                                         30     1990-01-01     2019-01-01 
Belgium                                         70     1950-01-01     2019-01-01 
Benin                                           40     1980-01-01     2019-01-01 
Bermuda                                         34     1986-01-01     2019-01-01 
Bolivia (Plurinational State of)                70     1950-01-01     2019-01-01 
Bosnia and Herzegovina                          30     1990-01-01     2019-01-01 
Botswana                                        60     1960-01-01     2019-01-01 
Brazil                                          70     1950-01-01     2019-01-01 
British Virgin Islands                          29     1991-01-01     2019-01-01 
Bulgaria                                        50     1970-01-01     2019-01-01 
Burkina Faso                                    61     1959-01-01     2019-01-01 
Burundi                                         40     1980-01-01     2019-01-01 
Cabo Verde                                      40     1980-01-01     2019-01-01 
Cameroon                                        60     1960-01-01     2019-01-01 
Canada                                          70     1950-01-01     2019-01-01 
Cayman Islands                                  29     1991-01-01     2019-01-01 
Central African Republic                        40     1980-01-01     2019-01-01 
Chad                                            60     1960-01-01     2019-01-01 
Chile                                           69     1951-01-01     2019-01-01 
China                                           68     1952-01-01     2019-01-01 
China, Hong Kong SAR                            60     1960-01-01     2019-01-01 
China, Macao SAR                                40     1980-01-01     2019-01-01 
Colombia                                        70     1950-01-01     2019-01-01 
Costa Rica                                      70     1950-01-01     2019-01-01 
Croatia                                         30     1990-01-01     2019-01-01 
Cyprus                                          70     1950-01-01     2019-01-01 
Czech Republic                                  30     1990-01-01     2019-01-01 
Côte d'Ivoire                                   60     1960-01-01     2019-01-01 
Denmark                                         70     1950-01-01     2019-01-01 
Djibouti                                        40     1980-01-01     2019-01-01 
Dominican Republic                              69     1951-01-01     2019-01-01 
Ecuador                                         70     1950-01-01     2019-01-01 
Egypt                                           70     1950-01-01     2019-01-01 
Estonia                                         30     1990-01-01     2019-01-01 
Eswatini                                        40     1980-01-01     2019-01-01 
Fiji                                            40     1980-01-01     2019-01-01 
Finland                                         70     1950-01-01     2019-01-01 
France                                          70     1950-01-01     2019-01-01 
Gabon                                           60     1960-01-01     2019-01-01 
Georgia                                         30     1990-01-01     2019-01-01 
Germany                                         70     1950-01-01     2019-01-01 
Greece                                          69     1951-01-01     2019-01-01 
Guatemala                                       70     1950-01-01     2019-01-01 
Guinea                                          40     1980-01-01     2019-01-01 
Honduras                                        50     1970-01-01     2019-01-01 
Hungary                                         50     1970-01-01     2019-01-01 
Iceland                                         70     1950-01-01     2019-01-01 
India                                           70     1950-01-01     2019-01-01 
Indonesia                                       60     1960-01-01     2019-01-01 
Iran (Islamic Republic of)                      65     1955-01-01     2019-01-01 
Iraq                                            50     1970-01-01     2019-01-01 
Ireland                                         70     1950-01-01     2019-01-01 
Israel                                          70     1950-01-01     2019-01-01 
Italy                                           70     1950-01-01     2019-01-01 
Jamaica                                         67     1953-01-01     2019-01-01 
Japan                                           70     1950-01-01     2019-01-01 
Jordan                                          66     1954-01-01     2019-01-01 
Kazakhstan                                      30     1990-01-01     2019-01-01 
Kenya                                           70     1950-01-01     2019-01-01 
Kuwait                                          50     1970-01-01     2019-01-01 
Kyrgyzstan                                      30     1990-01-01     2019-01-01 
Lao People's DR                                 40     1980-01-01     2019-01-01 
Latvia                                          30     1990-01-01     2019-01-01 
Lebanon                                         50     1970-01-01     2019-01-01 
Lesotho                                         40     1980-01-01     2019-01-01 
Lithuania                                       30     1990-01-01     2019-01-01 
Luxembourg                                      70     1950-01-01     2019-01-01 
Malaysia                                        65     1955-01-01     2019-01-01 
Malta                                           66     1954-01-01     2019-01-01 
Mauritania                                      43     1977-01-01     2019-01-01 
Mauritius                                       70     1950-01-01     2019-01-01 
Mexico                                          70     1950-01-01     2019-01-01 
Mongolia                                        40     1980-01-01     2019-01-01 
Morocco                                         70     1950-01-01     2019-01-01 
Mozambique                                      60     1960-01-01     2019-01-01 
Namibia                                         60     1960-01-01     2019-01-01 
Netherlands                                     70     1950-01-01     2019-01-01 
New Zealand                                     70     1950-01-01     2019-01-01 
Nicaragua                                       40     1980-01-01     2019-01-01 
Niger                                           60     1960-01-01     2019-01-01 
Nigeria                                         70     1950-01-01     2019-01-01 
North Macedonia                                 30     1990-01-01     2019-01-01 
Norway                                          70     1950-01-01     2019-01-01 
Oman                                            50     1970-01-01     2019-01-01 
Panama                                          51     1969-01-01     2019-01-01 
Paraguay                                        69     1951-01-01     2019-01-01 
Peru                                            70     1950-01-01     2019-01-01 
Philippines                                     70     1950-01-01     2019-01-01 
Poland                                          50     1970-01-01     2019-01-01 
Portugal                                        70     1950-01-01     2019-01-01 
Qatar                                           50     1970-01-01     2019-01-01 
Republic of Korea                               67     1953-01-01     2019-01-01 
Republic of Moldova                             30     1990-01-01     2019-01-01 
Romania                                         60     1960-01-01     2019-01-01 
Russian Federation                              30     1990-01-01     2019-01-01 
Rwanda                                          60     1960-01-01     2019-01-01 
Sao Tome and Principe                           40     1980-01-01     2019-01-01 
Saudi Arabia                                    50     1970-01-01     2019-01-01 
Senegal                                         60     1960-01-01     2019-01-01 
Serbia                                          30     1990-01-01     2019-01-01 
Sierra Leone                                    40     1980-01-01     2019-01-01 
Singapore                                       60     1960-01-01     2019-01-01 
Slovakia                                        30     1990-01-01     2019-01-01 
Slovenia                                        30     1990-01-01     2019-01-01 
South Africa                                    70     1950-01-01     2019-01-01 
Spain                                           70     1950-01-01     2019-01-01 
Sri Lanka                                       70     1950-01-01     2019-01-01 
Sudan                                           50     1970-01-01     2019-01-01 
Suriname                                        47     1973-01-01     2019-01-01 
Sweden                                          70     1950-01-01     2019-01-01 
Switzerland                                     70     1950-01-01     2019-01-01 
Taiwan                                          69     1951-01-01     2019-01-01 
Tajikistan                                      30     1990-01-01     2019-01-01 
Thailand                                        70     1950-01-01     2019-01-01 
Togo                                            40     1980-01-01     2019-01-01 
Trinidad and Tobago                             70     1950-01-01     2019-01-01 
Tunisia                                         60     1960-01-01     2019-01-01 
Turkey                                          70     1950-01-01     2019-01-01 
U.R. of Tanzania: Mainland                      60     1960-01-01     2019-01-01 
Ukraine                                         30     1990-01-01     2019-01-01 
United Kingdom                                  70     1950-01-01     2019-01-01 
United States                                   70     1950-01-01     2019-01-01 
Uruguay                                         70     1950-01-01     2019-01-01 
Uzbekistan                                      30     1990-01-01     2019-01-01 
Venezuela (Bolivarian Republic of)              70     1950-01-01     2019-01-01 
Zambia                                          65     1955-01-01     2019-01-01 
Zimbabwe                                        66     1954-01-01     2019-01-01 
================================================================================
</pre>
</div>
<p><br>
The <code>pdSize</code> procedure provides a concise summary of our panel data structure, including:</p>
<ul>
<li>The total number of groups and a full list of the groups.</li>
<li>The number of observations per group.</li>
<li>The number of missing values.</li>
<li>The start and end date of each group in our panel.</li>
</ul>
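<p>Under the hood, a summary like this amounts to a few group-wise aggregations. Here is a minimal sketch of the same tallies in pandas (an illustration with toy data, not the GAUSS implementation of <code>pdSize</code>):</p>

```python
# Per-group tallies behind a panel "size" summary: observation count,
# missing values, and first/last period. Toy data for illustration only.
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [2000, 2001, 2000],
    "rgdpo": [1.0, None, 3.0],
})

summary = df.groupby("country").agg(
    n_obs=("year", "size"),                       # rows per group
    n_missing=("rgdpo", lambda s: s.isna().sum()),  # missing values per group
    start=("year", "min"),                        # first period observed
    end=("year", "max"),                          # last period observed
)
print(summary)
```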
<p>While there are no missing values in this data, this isn't always the case. In fact, it is quite common that variables cover only part of the full timespan. For example, a country may have a longer history of providing real GDP data than IRR data. </p>
<p>The <code>pdTimeSpans</code> procedure reports the full timespan for each group, along with the timespans for a specified variable list. If no variable list is provided, it returns the timespan for all variables in the dataframe. </p>
<p>For example, suppose we want to use the <em>emp</em> and <em>rgdpo</em> variables in a model and want to know the maximum timespan our model can cover. We can use <code>pdTimeSpans</code> to see the timespan of each variable:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">pwt_model_timespans = pdTimeSpans(pwt_10, "emp"$|"rgdpo");
pwt_model_timespans;</code></pre>
<div style="max-height: 600px; overflow-y: scroll; border: 1px solid #ddd; padding: 10px;">

<pre>
         country       Start year         End year        emp Start          emp End      rgdpo Start        rgdpo End 
          Angola       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
       Argentina       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Armenia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
           Aruba       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
       Australia       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Austria       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
      Azerbaijan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
         Bahamas       1973-01-01       2019-01-01       1973-01-01       2019-01-01       1973-01-01       2019-01-01 
         Bahrain       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
        Barbados       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Belarus       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
         Belgium       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Benin       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
         Bermuda       1986-01-01       2019-01-01       1986-01-01       2019-01-01       1986-01-01       2019-01-01 
Bolivia (Plurina       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Bosnia and Herze       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
        Botswana       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
          Brazil       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
British Virgin I       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
        Bulgaria       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
    Burkina Faso       1959-01-01       2019-01-01       1959-01-01       2019-01-01       1959-01-01       2019-01-01 
         Burundi       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
      Cabo Verde       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
        Cameroon       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
          Canada       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
  Cayman Islands       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
Central African        1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
            Chad       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
           Chile       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
           China       1952-01-01       2019-01-01       1952-01-01       2019-01-01       1952-01-01       2019-01-01 
China, Hong Kong       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
China, Macao SAR       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
        Colombia       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
      Costa Rica       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Croatia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
          Cyprus       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
  Czech Republic       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
   Côte d'Ivoire       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Denmark       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
        Djibouti       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Dominican Republ       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
         Ecuador       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Egypt       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Estonia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
        Eswatini       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
            Fiji       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
         Finland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          France       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Gabon       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Georgia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
         Germany       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Greece       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
       Guatemala       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Guinea       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
        Honduras       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
         Hungary       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
         Iceland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           India       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
       Indonesia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Iran (Islamic Re       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
            Iraq       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
         Ireland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Israel       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Italy       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Jamaica       1953-01-01       2019-01-01       1953-01-01       2019-01-01       1953-01-01       2019-01-01 
           Japan       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Jordan       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 
      Kazakhstan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
           Kenya       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Kuwait       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
      Kyrgyzstan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
 Lao People's DR       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
          Latvia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
         Lebanon       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
         Lesotho       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
       Lithuania       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
      Luxembourg       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
        Malaysia       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
           Malta       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 
      Mauritania       1977-01-01       2019-01-01       1977-01-01       2019-01-01       1977-01-01       2019-01-01 
       Mauritius       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Mexico       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
        Mongolia       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
         Morocco       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
      Mozambique       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Namibia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
     Netherlands       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
     New Zealand       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
       Nicaragua       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
           Niger       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Nigeria       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
 North Macedonia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
          Norway       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
            Oman       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
          Panama       1969-01-01       2019-01-01       1969-01-01       2019-01-01       1969-01-01       2019-01-01 
        Paraguay       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
            Peru       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
     Philippines       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Poland       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
        Portugal       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Qatar       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Republic of Kore       1953-01-01       2019-01-01       1953-01-01       2019-01-01       1953-01-01       2019-01-01 
Republic of Mold       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
         Romania       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Russian Federati       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
          Rwanda       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Sao Tome and Pri       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
    Saudi Arabia       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
         Senegal       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
          Serbia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
    Sierra Leone       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
       Singapore       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
        Slovakia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
        Slovenia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
    South Africa       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Spain       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
       Sri Lanka       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
           Sudan       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
        Suriname       1973-01-01       2019-01-01       1973-01-01       2019-01-01       1973-01-01       2019-01-01 
          Sweden       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
     Switzerland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Taiwan       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
      Tajikistan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
        Thailand       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
            Togo       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Trinidad and Tob       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Tunisia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
          Turkey       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
U.R. of Tanzania       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
         Ukraine       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
  United Kingdom       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
   United States       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
         Uruguay       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
      Uzbekistan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Venezuela (Boliv       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
          Zambia       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
        Zimbabwe       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 
</pre>
</div>
<p><br></p>
<p>Again, because we aren't missing any data, both <em>emp</em> and <em>rgdpo</em> cover the full timespan for each group as reported by <code>pdSize</code>.</p>
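<p>Conceptually, a per-variable timespan report finds, for each group, the first and last period with a non-missing value of each requested variable. Here is a minimal pandas sketch of that logic (the toy data and helper name are hypothetical; <code>pdTimeSpans</code> itself is the GAUSS procedure):</p>

```python
# Sketch of a per-group, per-variable timespan report. In the toy data,
# "emp" starts a year later than "rgdpo" for country B.
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year":    [2000, 2001, 2002, 2001, 2002],
    "emp":     [1.0, 1.1, 1.2, None, 2.1],
    "rgdpo":   [10.0, 11.0, 12.0, 20.0, 21.0],
})

def timespans(df, variables):
    """First and last period with a non-missing value, per group and variable."""
    rows = {}
    for g, grp in df.groupby("country"):
        row = {"Start year": grp["year"].min(), "End year": grp["year"].max()}
        for v in variables:
            valid = grp.loc[grp[v].notna(), "year"]
            row[v + " Start"] = valid.min()
            row[v + " End"] = valid.max()
        rows[g] = row
    return pd.DataFrame(rows).T

spans = timespans(df, ["emp", "rgdpo"])
print(spans)
```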
<hr>
<div style="text-align:center">Ready to elevate your research? <a href="https://www.aptech.com/request-demo/" target="_blank" rel="noopener">Try GAUSS 25 today.</a></div>
<hr>
<h2 id="panel-data-summary-statistics">Panel Data Summary Statistics</h2>
<p>When analyzing panel data, it's important to understand how variability is distributed across different dimensions of the data. Specifically:  </p>
<ul>
<li><b>Overall statistics</b>, which summarize variability across all observations in the dataset, providing a high-level view of the data. </li>
<li><b>Within-group statistics</b>, which measure variability within each individual group, reflecting how a variable changes over time for a specific group. </li>
<li><b>Between-group statistics</b>, which capture variability across groups, showing how groups differ from each other on average.</li>
</ul>
<p>Understanding these patterns ensures that we select the right modeling approach and properly account for both group-specific and overall trends in our analysis.</p>
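<p>As a concrete illustration of the arithmetic behind these three measures, here is a pandas sketch for a single variable (toy data, for illustration only; in GAUSS, <code>pdSummary</code> reports these for you):</p>

```python
# Overall / between / within variability for one panel variable x_it:
# overall uses all observations, between uses the group means, and within
# uses deviations from each group's own mean.
import pandas as pd

df = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "x": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
})

overall_sd = df["x"].std()                       # across all observations
group_means = df.groupby("country")["x"].mean()  # one mean per group
between_sd = group_means.std()                   # across group means
# Within: deviations from each group's own mean; adding the overall mean
# back is a common convention that keeps the within series on the original scale.
within = df["x"] - df["country"].map(group_means) + df["x"].mean()
within_sd = within.std()
```

Here most of the variability is between the two groups, so `between_sd` dwarfs `within_sd`, mirroring the pattern for <em>pop</em> in the table below.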
<p>We'll use the <code>pdSummary</code> procedure to compute these statistics. However, to simplify our examples and output moving forward, let's limit our panel to include only countries that use the Euro. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Filter to include only Euro-using countries
pwt_10 = selif(pwt_10, pwt_10[., "currency_unit"] .$== "Euro");

// Get summary statistics
pdSummary(pwt_10);</code></pre>
<pre>==========================================================================================
Group ID:                        country          Balanced:                             No
Valid cases:                        1125          Missings:                              0
N. Groups:                            19          T. Average:                       59.211
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
emp                    Overall          7.933         10.754          0.088         44.795
                       Between          -----         10.382          0.128         38.430
                        Within          -----          1.339          0.359         14.298
irr                    Overall          0.097          0.049          0.010          0.316
                       Between          -----          0.042          0.049          0.214
                        Within          -----          0.026          0.025          0.259
pop                    Overall         18.322         23.778          0.296         83.517
                       Between          -----         23.039          0.364         78.163
                        Within          -----          2.763          4.844         29.645
rgdpe                  Overall     457547.712     746910.097        568.248    4308861.500
                       Between          -----     594454.666       6365.400    2072470.938
                        Within          -----     431603.648   -1262654.069    2693938.274
rgdpo                  Overall     454655.015     750364.973         69.909    4275312.000
                       Between          -----     596209.636       5725.827    2097340.112
                        Within          -----     434813.497   -1283383.285    2632626.903
==========================================================================================
Non-numeric variables dropped from summary.</pre>
<p>One very clear observation from our summary table is that our GDP variables, <em>rgdpo</em> and <em>rgdpe</em>, are on a much different scale than our other variables. We'll look at how to transform these next.</p>
<h2 id="transforming-data-for-modeling">Transforming Data for Modeling</h2>
<p>Because panel data usually contains a time dimension, it is very common to need to take lags or differences of our data. While this is straightforward with <a href="https://www.aptech.com/blog/getting-started-with-time-series-in-gauss/" target="_blank" rel="noopener">time series data</a>, it is trickier with panel data, because lags and differences must be computed within each group rather than across group boundaries. </p>
<p>Fortunately, the <code>pdLag</code> and <code>pdDiff</code> procedures, introduced in GAUSS 25, will efficiently compute panel data lags and differences for you. </p>
<p>$$\text{rgdpo growth rate} = \ln rgdpo_{t} - \ln rgdpo_{t-1}$$</p>
<p>Let's use the <code>pdDiff</code> procedure to create a new real GDP growth variable.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Take natural log of rgdpo
ln_rgdpo = ln(pwt_10[., "rgdpo"]);

// Add to pwt_10 dataframe
// We need to do this so GAUSS
// can identify our panel
// using the 'country' and 'year' variables
pwt_10 = pwt_10 ~ asDF(ln_rgdpo, "ln_rgdpo");

// Take first difference of ln_rgdpo
// GAUSS will use 'country' and 'year' to 
// automatically detect panel
gr_rgdpo = pdDiff(pwt_10[., "country" "year" "ln_rgdpo"]);

// Summarize 'gr_rgdpo' 
// GAUSS will use 'country' and 'year' to 
// automatically detect panel
call pdSummary(gr_rgdpo);</code></pre>
<pre>==========================================================================================
Group ID:                        country          Balanced:                             No
Valid cases:                        1106          Missings:                             19
N. Groups:                            19          T. Average:                       58.211
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
ln_rgdpo               Overall          0.036          0.092         -1.741          1.476
                       Between          -----          0.013          0.008          0.062
                        Within          -----          0.091         -1.768          1.450
==========================================================================================</pre>
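<p>The importance of a panel-aware difference shows up in a small sketch: a naive first difference leaks the last observation of one country into the first row of the next, while a group-wise difference inserts a missing value at each group boundary instead (which is why the summary above reports 19 missings, one per country). Illustrated here with pandas toy data; <code>pdDiff</code> handles this for you in GAUSS:</p>

```python
# Naive vs. group-wise first differences on stacked (long form) panel data.
import pandas as pd

df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "ln_rgdpo": [4.0, 4.1, 7.0, 7.2],
})

naive = df["ln_rgdpo"].diff()                       # wrong: crosses the A -> B boundary
grouped = df.groupby("country")["ln_rgdpo"].diff()  # correct: NaN at each group start
```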
<h2 id="data-visualization">Data Visualization</h2>
<p>As a final step, let's create a quick visualization of this new variable using <a href="https://docs.aptech.com/gauss/plotxy.html" target="_blank" rel="noopener">plotXY</a>
and the <code>by</code> keyword. We'll use a subset of countries to keep our plot from getting too crowded.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create subset of countries 
country_list = "Austria"$|"France"$|"Germany"$|"Spain"$|"Italy";

// Select data for plot
plot_data = selif(gr_rgdpo, sumr(gr_rgdpo[., "country"] .$== country_list'));

// Plot rgdpo growth variable by country
plotXY(plot_data, "ln_rgdpo~year + by(country)");</code></pre>
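<p>The <code>selif</code>/<code>sumr</code> pattern above is a membership filter: keep the rows whose <em>country</em> appears in a list. For reference, the equivalent idiom in pandas (toy data, for illustration only):</p>

```python
# Membership filter: keep rows whose country is in a given list.
import pandas as pd

df = pd.DataFrame({"country": ["Austria", "France", "Poland"],
                   "growth": [0.02, 0.01, 0.04]})

country_list = ["Austria", "France", "Germany", "Spain", "Italy"]
subset = df[df["country"].isin(country_list)]
```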
<p><a href="https://www.aptech.com/wp-content/uploads/2025/01/growth-plot.jpg"><img src="https://www.aptech.com/wp-content/uploads/2025/01/growth-plot.jpg" alt="Graph of the log of GDP growth for 5 countries in GAUSS." width="1200" height="740" class="aligncenter size-full wp-image-11584991" /></a></p>
<h3 id="conclusion">Conclusion</h3>
<p>Today we've seen, through a hands-on example, how the new panel data tools in GAUSS 25 can simplify your everyday panel data work. We've covered fundamental tasks, including:</p>
<ul>
<li>Loading your data.</li>
<li>Preparing your panel dataset. </li>
<li>Exploring panel data characteristics. </li>
<li>Visualizing panel data. </li>
<li>Transforming your data for modeling. </li>
</ul>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Get Started with Panel Data in GAUSS (Video)</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
</ol>

]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/exploring-and-cleaning-panel-data-with-gauss-25/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Get Started with Panel Data in GAUSS (Video)</title>
		<link>https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/</link>
					<comments>https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Wed, 17 Apr 2024 16:00:50 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[Video]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11584500</guid>

					<description><![CDATA[In this video, you'll learn the basics of panel data analysis in GAUSS. We demonstrate panel data modeling start to finish, from loading data to running a group specific intercept model. ]]></description>
										<content:encoded><![CDATA[<iframe width="560" height="315" src="https://www.youtube.com/embed/b_TwmaVM5W4?si=4vHvm9y5T6H83nbl" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<h3 id="introduction">Introduction</h3>
<p>In this video, you'll learn the basics of panel data analysis in GAUSS. We demonstrate panel data modeling start to finish, from loading data to running a group specific intercept model. </p>
<div class="alert alert-info" role="alert">This video is available, along with all GAUSS videos, on our <a href="https://www.youtube.com/@gauss5485" target="_blank" rel="noopener">GAUSS YouTube Channel</a>. Be sure to explore all our GAUSS videos and subscribe to the channel to get the latest videos as they are released. </div>
<h2 id="summary-and-timeline">Summary and Timeline</h2>
<p>You'll see firsthand how to:</p>
<ul>
<li>Load and verify panel data.</li>
<li>Merge data from different sources.</li>
<li>Convert between wide and long form panel data.</li>
<li>Explore and clean data.</li>
<li>Create panel data plots.</li>
<li>Prepare panel data for estimation.</li>
<li>Estimate a model with group-specific intercepts.</li>
</ul>
<h3 id="timeline">Timeline</h3>
<p><a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=41s" target="_blank" rel="noopener">0:41</a> Set the current working directory.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=63s" target="_blank" rel="noopener">1:03</a> Load panel data from an Excel file.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=332s" target="_blank" rel="noopener">5:32</a> Merge data from different sources.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=413s" target="_blank" rel="noopener">6:53</a> Perform preliminary data cleaning.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=520s" target="_blank" rel="noopener">8:40</a> Create panel data plots.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=672s" target="_blank" rel="noopener">11:12</a> Test for stationarity.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=716s" target="_blank" rel="noopener">11:56</a> Convert long form to wide form panel data.<br />
<a href="https://www.youtube.com/watch?v=b_TwmaVM5W4&t=889s" target="_blank" rel="noopener">14:49</a> Estimate a model with group-specific intercepts.  </p>
<h3 id="additional-resources">Additional Resources</h3>
<ol>
<li><a href="https://www.aptech.com/blog/how-to-load-excel-data-into-gauss/" target="_blank" rel="noopener">How to Load Excel Data Into GAUSS</a>  </li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a>  </li>
<li><a href="https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/" target="_blank" rel="noopener">Visualizing COVID-19 Panel Data With GAUSS 22</a>  </li>
<li><a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">What is a GAUSS Dataframe and Why Should You Care?</a>  </li>
<li><a href="https://www.aptech.com/blog/managing-string-data-with-gauss-dataframes/" target="_blank" rel="noopener">Managing String Data with GAUSS Dataframes</a>  </li>
<li><a href="https://docs.aptech.com/gauss/data-management.html" target="_blank" rel="noopener">The GAUSS Data Management Guide</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
</ol>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Transforming Panel Data to Long Form in GAUSS</title>
		<link>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/</link>
					<comments>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 12 Dec 2023 21:24:59 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11584134</guid>

					<description><![CDATA[Anyone who works with panel data knows that pivoting between long and wide form, though commonly necessary, can still be painstakingly tedious, at best. It can lead to frustrating errors, unexpected results, and lengthy troubleshooting, at worst.

<br>The new dfLonger and dfWider procedures introduced in GAUSS 24 make great strides towards fixing that. Extensive planning has gone into each procedure, resulting in comprehensive but intuitive functions.

<br>In today's blog, we will walk through all you need to know about the dfLonger procedure to tackle even the most complex cases of transforming wide form panel data to long form.]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>Anyone who works with <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a> knows that pivoting between long and wide form, though often necessary, is painstakingly tedious at best. At worst, it leads to frustrating errors, unexpected results, and lengthy troubleshooting.</p>
<p>The new <a href="https://docs.aptech.com/gauss/dflonger.html" target="_blank" rel="noopener">dfLonger</a> and <a href="https://docs.aptech.com/gauss/dfwider.html" target="_blank" rel="noopener">dfWider</a> procedures introduced in <a href="https://www.aptech.com/blog/introducing-gauss-24/" target="_blank" rel="noopener">GAUSS 24</a> make great strides towards fixing that. Extensive planning has gone into each procedure, resulting in comprehensive but intuitive functions.</p>
<p>In today's blog, we will walk through all you need to know about the <code>dfLonger</code> <a href="https://www.aptech.com/blog/basics-of-gauss-procedures/" target="_blank" rel="noopener">procedure</a> to tackle even the most complex cases of transforming wide form panel data to long form. </p>
<h2 id="the-rules-of-tidy-data">The Rules of Tidy Data</h2>
<p>Before we get started, it will be useful to consider what makes data tidy (and why tidy data is important). </p>
<p>It's useful to think of breaking our data into components (these components will come in handy later when working with <code>dfLonger</code>): </p>
<ul>
<li>Values.</li>
<li>Observations.</li>
<li>Variables.</li>
</ul>
<p><a href="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-2.jpg" alt="Components of data." width="757" height="887" class="aligncenter size-full wp-image-11584144" /></a></p>
<p>We can use these components to define some basic rules for tidy data:</p>
<ol>
<li>Each variable has its own column.</li>
<li>Each observation has its own row.</li>
<li>Each value has its own cell.</li>
</ol>
<h3 id="example-one-wide-form-state-population-table">Example One: Wide Form State Population Table</h3>
<table>
 <thead>
<tr><th>State</th><th>2020</th><th>2021</th><th>2022</th></tr>
</thead>
<tbody>
<tr><td>Alabama</td><td>5,031,362</td><td>5,049,846</td><td>5,074,296</td></tr>
<tr><td>Alaska</td><td>732,923</td><td>734,182</td><td>733,583</td></tr>
<tr><td>Arizona</td><td>7,179,943</td><td>7,264,877</td><td>7,359,197</td></tr>
<tr><td>Arkansas</td><td>3,014,195</td><td>3,028,122</td><td>3,045,637</td></tr>
<tr><td>California</td><td>39,501,653</td><td>39,142,991</td><td>39,029,342</td></tr>
</tbody>
</table>
<p>Though not clearly labeled, we can deduce that this data presents values for three different variables: <em>State</em>, <em>Year</em>, and <em>Population</em>. </p>
<p>Looking more closely we see:</p>
<ul>
<li><em>State</em> is stored in a unique column. </li>
<li>The values of <em>Year</em> are stored as column names. </li>
<li>The values of <em>Population</em> are stored in separate columns for each year. </li>
</ul>
<p>Our variables do not each have a unique column, violating the rules of tidy data.</p>
<h3 id="example-two-long-form-state-population-table">Example Two: Long Form State Population Table</h3>
<table>
 <thead>
<tr><th>State</th><th>Year</th><th>Population </th></tr>
</thead>
<tbody>
<tr><td>Alabama</td><td>2020</td><td>5,031,362</td></tr>
<tr><td>Alabama</td><td>2021</td><td>5,049,846</td></tr>
<tr><td>Alabama</td><td>2022</td><td>5,074,296</td></tr>
<tr><td>Alaska</td><td>2020</td><td>732,923</td></tr>
<tr><td>Alaska</td><td>2021</td><td>734,182</td></tr>
<tr><td>Alaska</td><td>2022</td><td>733,583</td></tr>
<tr><td>Arizona</td><td>2020</td><td>7,179,943</td></tr>
<tr><td>Arizona</td><td>2021</td><td>7,264,877</td></tr>
<tr><td>Arizona</td><td>2022</td><td>7,359,197</td></tr>
</tbody>
</table>
<p>The transformed data above now has three columns, one for each variable <em>State</em>, <em>Year</em>, and <em>Population</em>. We can also confirm that each observation has a single row and each value has a single cell. </p>
<p>Transforming the data to long form has resulted in a tidy data table. </p>
<h3 id="why-do-we-care-about-tidy-data">Why Do We Care About Tidy Data?</h3>
<p>Working with tidy data offers a number of advantages:</p>
<ul>
<li>Tidy data storage offers consistency when trying to compare, explore, and analyze data whether it be panel data, <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/" target="_blank" rel="noopener">time series data</a> or cross-sectional data. </li>
<li>Using columns for variables is aligned with vectorization and <a href="https://www.aptech.com/blog/gauss-basics-3-introduction-to-matrices/" target="_blank" rel="noopener">matrix notation</a>, both of which are fundamental to efficient computations. </li>
<li>Many software tools expect tidy data and will only work reliably with tidy data. </li>
</ul>
<hr>
<div style="text-align:center">Ready to elevate your research? <a href="https://www.aptech.com/request-demo/" target="_blank" rel="noopener">Try GAUSS today.</a></div>
<hr>
<h2 id="transforming-from-wide-to-long-panel-data">Transforming From Wide to Long Panel Data</h2>
<p>In this section, we will look at how to use the GAUSS procedure <code>dfLonger</code> to transform panel data from wide to long form. This section will cover:</p>
<ul>
<li>The fundamentals of the <code>dfLonger</code> procedure.</li>
<li>A standard process for setting up panel data transformations.</li>
</ul>
<h3 id="the-dflonger-procedure">The <code>dfLonger</code> Procedure</h3>
<p>The <code>dfLonger</code> procedure transforms wide form GAUSS <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">dataframes</a> to long form GAUSS dataframes. It has four required inputs and one <a href="https://www.aptech.com/blog/the-basics-of-optional-arguments-in-gauss-procedures/" target="_blank" rel="noopener">optional input</a>: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">df_long = dfLonger(df_wide, columns, names_to, values_to [, pctl]);</code></pre>
<hr>
<dl>
<dt>df_wide</dt>
<dd>A GAUSS dataframe in wide panel format.</dd>
<dt>columns</dt>
<dd>String array, the columns that should be used in the conversion.</dd>
<dt>names_to</dt>
<dd>String array, specifies the variable name(s) for the new column(s) created to store the wide variable names.</dd>
<dt>values_to</dt>
<dd>String, the name of the new column containing the values.</dd>
<dt>pctl</dt>
<dd>Optional, an instance of the <code>pivotControl</code> structure used for advanced pivoting options.
<hr></dd>
</dl>
<h3 id="setting-up-panel-data-transformations">Setting Up Panel Data Transformations</h3>
<p>Having a systematic process for transforming wide panel data to long panel data will:</p>
<ul>
<li>Save time.</li>
<li>Eliminate frustration.</li>
<li>Prevent errors. </li>
</ul>
<p>Let's use our wide form state population data to work through the steps.</p>
<h3 id="step-1-identify-variables">Step 1: Identify variables.</h3>
<p>In our wide form population table, there are three variables: <em>State</em>, <em>Year</em>, and <em>Population</em>. </p>
<div class="alert alert-info" role="alert">Variables are not always clearly labeled in wide form data. You will often need background information to identify them. Pay attention to references, titles, or other sources to ensure you clearly understand the variables. </div>
<h3 id="step-2-identify-columns-to-convert">Step 2: Identify columns to convert.</h3>
<p>The easiest way to determine which columns need to be converted is to identify the &quot;problem&quot; columns in your wide form data.  </p>
<p>For example, in our original state population table, the columns named <em>2020</em>, <em>2021</em>, and <em>2022</em> represent our <em>Year</em> variable. They store the values for the <em>Population</em> variable. </p>
<p><a href="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-Page-1-1.jpg"><img src="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-Page-1-1.jpg" alt="" width="731" height="289" class="aligncenter size-full wp-image-11584149" /></a></p>
<p>These are the columns we will need to address in order to make our data tidy.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">columns = "2020"$|"2021"$|"2022";</code></pre>
<p>We only have three columns to transform, so it is easy to type out our column names in a string array. This won't always be the case, though. Fortunately, GAUSS has a number of convenience functions to help with creating your column lists.</p>
<p>My favorites include:</p>
<table>
 <thead>
<tr><th>Function</th><th>Description</th><th>Example</th></tr>
</thead>
<tbody>
<tr><td><a href="https://docs.aptech.com/gauss/getcolnames.html" target="_blank" rel="noopener">getColNames</a></td><td>Returns the column variable names.</td><td><code>
varnames = getColNames(df_wide)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/startswith.html" target="_blank" rel="noopener">startsWith</a></td><td>Returns a 1 if a string starts with a specified pattern.</td><td><code>
mask = startsWith(colNames, pattern)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/trimr.html" target="_blank" rel="noopener">trimr</a></td><td>Trims rows from the top and/or bottom of a matrix.</td><td><code>
names = trimr(full_list, top, bottom)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/rowcontains.html" target="_blank" rel="noopener">rowcontains</a></td><td>Returns a 1 if the row contains the data specified by the <code>needle</code> variable, otherwise it returns a 0.</td><td><code>
mask = rowcontains(haystack, needle)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/selif.html" target="_blank" rel="noopener">selif</a></td><td>Selects rows from a matrix, dataframe or string array, based upon a vector of 1&#8217;s and 0&#8217;s.</td><td><code>
names = selif(full_list, mask)</code></td></tr>
</tbody>
</table>
<p>For more complex cases, it is useful to approach creating column lists as a two-step process:</p>
<ol>
<li>Get all column names using <code>getColNames</code>.</li>
<li>Select a subset of the column names using a selection convenience function. </li>
</ol>
<p>As an example, suppose our state population dataset contains the state names in the first column and the remaining columns contain the populations for 1950-2022. It would be tedious and error-prone to write out the column list for all of those years. </p>
<p>Instead we could:</p>
<ol>
<li>Get a list of all the column names using <code>getColNames</code>.</li>
<li>Trim the first name off the list. </li>
</ol>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names
colNames = getColNames(pop_wide);

// Trim the first name, `State`,
// from the top of the name list
colNames = trimr(colNames, 1, 0);</code></pre>
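<p>When the target columns share a common prefix rather than a fixed position, the <code>startsWith</code> and <code>selif</code> functions from the table above can be combined instead. A minimal sketch, assuming a hypothetical wide dataframe <em>df_wide</em> whose population columns all begin with <code>"pop_"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names
colNames = getColNames(df_wide);

// Mask equal to 1 for names starting with "pop_"
mask = startsWith(colNames, "pop_");

// Keep only the matching names
columns = selif(colNames, mask);</code></pre>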
<h3 id="step-3-name-the-new-columns-for-storing-names">Step 3: Name the new columns for storing names.</h3>
<p>The names of the columns being transformed from our wide form data will be stored in a variable specified by the input <em>names_to</em>. </p>
<p>In this case, we want to store the names from the wide data in one new variable called <code>"Year"</code>. In later examples, we will look at how to split names into multiple variables using prefixes, separators, or patterns.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">names_to = "Year";</code></pre>
<h3 id="step-4-name-the-new-columns-for-storing-values">Step 4: Name the new columns for storing values.</h3>
<p>The values stored in the columns being transformed will be stored in a variable specified by the input <em>values_to</em>.</p>
<p>For our population table, we will store the values in a variable named <code>"Population"</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">values_to = "Population";</code></pre>
<h2 id="basic-pivoting">Basic Pivoting</h2>
<p>Now it's time to put all these steps together into a working example. Let's continue with our state population example. </p>
<p>We'll start by loading the <a href="https://github.com/aptech/gauss_blog/blob/master/econometrics/pivoting-to-long-form-12-6-23/state_pop.gdat" target="_blank" rel="noopener">complete state population dataset</a> from the <em>state_pop.gdat</em> file:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data 
pop_wide = loadd("state_pop.gdat");

// Preview data
head(pop_wide);</code></pre>
<pre>           State             2020             2021             2022
         Alabama        5031362.0        5049846.0        5074296.0
          Alaska        732923.00        734182.00        733583.00
         Arizona        7179943.0        7264877.0        7359197.0
        Arkansas        3014195.0        3028122.0        3045637.0
      California        39501653.        39142991.        39029342. </pre>
<p>Now, let's set up our information for transforming our data:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Identify columns
columns = "2020"$|"2021"$|"2022";

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we'll transform our data using <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Convert data using dfLonger
pop_long = dfLonger(pop_wide, columns, names_to, values_to);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00 </pre>
<h2 id="advanced-pivoting">Advanced Pivoting</h2>
<p>One of the most appealing things about <code>dfLonger</code> is that while simple to use, it offers tools for tackling the most complex cases. In this section, we'll cover everything you need to know for moving beyond basic pivoting.</p>
<h3 id="the-pivotcontrol-structure">The <code>pivotControl</code> Structure</h3>
<p>The <code>pivotControl</code> structure allows you to control pivoting specifications using
the following members:</p>
<table>
 <thead>
<tr><th>Member</th><th>Purpose</th></tr>
</thead>
<tbody>
<tr><td>names_prefix</td><td>A string input which specifies which characters, if any, should be stripped from the front of the wide variable names before they are assigned to a long column.</td></tr>
<tr><td>names_sep_split</td><td>A string input which specifies which characters, if any, mark where the <i>names_to</i> names should be broken up.</td></tr>
<tr><td>names_pattern_split</td><td>A string input containing a regular expression specifying group(s) in <i>names_to</i> names which should be broken up.</td></tr>
<tr><td>names_types</td><td>A string input specifying data types for the <i>names_to</i> variable.</td></tr>
<tr><td>values_drop_missing</td><td>Scalar, if set to 1, all rows with missing values will be removed.
</td></tr>
</tbody>
</table>
<div class="alert alert-info" role="alert">We will demonstrate how to use the <code>pivotControl</code> structure in later examples. However, if you are unfamiliar with structures, you may find it useful to review our tutorial, <a href="https://www.aptech.com/resources/tutorials/a-gentle-introduction-to-using-structures/" target="_blank" rel="noopener">&quot;A Gentle Introduction to Using Structures.&quot;</a></div>
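<p>As a quick illustration of one of these members, here is a minimal sketch of turning on <em>values_drop_missing</em>, under the assumption that rows with missing value cells should simply be discarded from the long form result:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Drop any long form rows whose value cell is missing
pctl.values_drop_missing = 1;</code></pre>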
<h3 id="changing-variable-types">Changing Variable Types</h3>
<p>By default the variables created from the pieces of the variable names will be <a href="https://www.aptech.com/blog/easy-management-of-categorical-variables/" target="_blank" rel="noopener">categorical variables</a>. </p>
<p>If we examine the variable type of <em>pop_long</em> from our previous example, </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check the type of the 'Year' variables
getColTypes(pop_long[., "Year"]);</code></pre>
<p>we can see that the <em>Year</em> variable is a categorical variable:</p>
<pre>            type
        category </pre>
<p>This isn't ideal and we'd prefer our <em>Year</em> variable to be a <a href="https://www.aptech.com/blog/dates-and-times-made-easy/" target="_blank" rel="noopener">date</a>.
We can control the assigned type using the <em>names_types</em> member of the <code>pivotControl</code> structure. The <em>names_types</em> member can be specified in one of two ways:</p>
<ol>
<li>As a column vector of types for each of the <em>names_to</em> variables.</li>
<li>An <em>n x 2</em> string array where the first column is the name of the variable(s) and the second column contains the type(s) to be assigned. </li>
</ol>
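<p>The first option can be sketched as follows. This is a hypothetical two-variable case, assuming <em>names_to</em> contains <code>"Location"</code> and <code>"Year"</code> and that <code>"category"</code> and <code>"date"</code> are the desired types, listed in the same order as the <em>names_to</em> variables:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// One type per names_to variable, in order:
// 'Location' -> category, 'Year' -> date
pctl.names_types = "category"$|"date";</code></pre>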
<p>For our example, we wish to specify that the <em>Year</em> variable should be a date but we don't need to change any of the other assigned types, so we will use the second option:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify that 'Year' should be
// converted to a date variable
pctl.names_types = {"Year" "date"};</code></pre>
<p>Next, we complete the steps for pivoting:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide);
columns = trimr(columns, 1, 0);

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we call <code>dfLonger</code> including the <code>pivotControl</code> structure, <em>pctl</em>, as the final input:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00</pre>
<p>Now if we check the type of our <em>Year</em> variable:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check the type of 'Year'
getColTypes(pop_long[., "Year"]);</code></pre>
<p>It is a date variable:</p>
<pre>  type
  date</pre>
<h3 id="stripping-prefixes">Stripping Prefixes</h3>
<p>In our previous example, the wide data names only contained the year. However, the column names of a wide dataset often have common prefixes. The <em>names_prefix</em> member of the <code>pivotControl</code> structure offers a convenient way to strip unwanted prefixes. </p>
<p>Suppose that our wide form state population columns were labeled <code>"yr_2020"</code>, <code>"yr_2021"</code>, <code>"yr_2022"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
pop_wide2 = loadd("state_pop2.gdat");

// Preview data
head(pop_wide2);</code></pre>
<pre>           State          yr_2020          yr_2021          yr_2022
         Alabama        5031362.0        5049846.0        5074296.0
          Alaska        732923.00        734182.00        733583.00
         Arizona        7179943.0        7264877.0        7359197.0
        Arkansas        3014195.0        3028122.0        3045637.0
      California        39501653.        39142991.        39029342.</pre>
<p>We need to strip these prefixes when transforming our data to long form. </p>
<p>To accomplish this we first need to specify that our name columns have the common prefix <code>"yr"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify prefix
pctl.names_prefix = "yr_";</code></pre>
<p>Next, we complete the steps for pivoting:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide2);
columns = trimr(columns, 1, 0);

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we call <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide2, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00</pre>
<h3 id="splitting-names">Splitting Names</h3>
<p>In our basic example, the only information contained in the wide column names was the year, and we created one variable, <code>"Year"</code>, to store it. However, we may have cases where our wide form column names contain more than one piece of information. </p>
<p>In these cases, there are two important steps to take:</p>
<ol>
<li>Name the variables that will store the information contained in the wide data column names using the <em>names_to</em> input.</li>
<li>Indicate to GAUSS how to split the wide data column names into the <em>names_to</em> variables. </li>
</ol>
<h4 id="names-include-a-separator">Names Include a Separator</h4>
<p>One way that names in wide data can contain multiple pieces of information is through the use of separators. </p>
<p>For example, suppose our data looks like this:</p>
<pre>           State       urban_2020       urban_2021       urban_2022       rural_2020       rural_2021       rural_2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>Now our names specify:</p>
<ul>
<li>Whether the population is the urban or rural population. </li>
<li>The year of the observation.</li>
</ul>
<p>In this case, we:</p>
<ul>
<li>Use the <em>names_sep_split</em> member of the <code>pivotControl</code> structure to indicate how to split the names. </li>
<li>Specify a <em>names_to</em> variable for each group created by the separator.</li>
</ul>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
pop_wide3 = loadd("state_pop3.gdat");

// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide3);
columns = trimr(columns, 1, 0);

// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify how to separate names
pctl.names_sep_split = "_";

// Specify two variables for holding
// names information:
//    'Location' for the information before the separator
//    'Year' for the information after the separator
names_to = "Location"$|"Year";

// Variable for storing values
values_to = "Population";

// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide3, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State         Location             Year       Population
         Alabama            urban             2020        6558153.0
         Alabama            urban             2021        4972982.0
         Alabama            urban             2022        12375977.
         Alabama            rural             2020        1526791.0
         Alabama            rural             2021        76863.000</pre>
<p>Now, the <em>pop_long</em> dataframe contains:</p>
<ul>
<li>The information in the wide form names found before the separator, <code>"_"</code>, (urban or rural) in the <em>Location</em> variable. </li>
<li>The information in the wide form names found after the separator, <code>"_"</code>, in the <em>Year</em> variable. </li>
</ul>
<h4 id="variable-names-with-regular-expressions">Variable Names With Regular Expressions</h4>
<p>In our example above, the variables contained in the names were clearly separated by a <code>"_"</code>. However, this isn't always the case. Sometimes names use a pattern rather than a separator:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
pop_wide4 = loadd("state_pop4.gdat");

// Preview data
head(pop_wide4);</code></pre>
<pre>           State        urban2020        urban2021        urban2022        rural2020        rural2021        rural2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>In cases like this, we can use the <em>names_pattern_split</em> member to pass GAUSS a regular expression that splits the column names. We can't cover the full details of regular expressions here, but a few fundamentals will help us get started with this example. </p>
<p>In regEx:</p>
<ol>
<li>Each pattern inside a pair of parentheses is a group. </li>
<li>To match any upper or lower case letter we use <code>"[a-zA-Z]"</code>. More specifically, this tells GAUSS that we want to match any lowercase letter ranging from a-z and any upper case letter ranging from A-Z. If we wanted to limit this to any lowercase letters from t to z and any uppercase letter B to M we would say <code>"[t-zB-M]"</code>.</li>
<li>To match any integer we use <code>"[0-9]"</code>.</li>
<li>To represent that we want to match <u>one or more</u> instances of a pattern we use <code>"+"</code>.</li>
<li>To represent that we want to match <u>zero or more</u> instances of a pattern we use <code>"*"</code>.</li>
</ol>
<p>In this case, we want to separate our names so that &quot;urban&quot; and &quot;rural&quot; are collected in <em>Location</em> and <em>2020</em>, <em>2021</em>, and <em>2022</em> are collected in the <em>Year</em> variable:</p>
<ol>
<li>We have two groups.</li>
<li>We can capture both <code>urban</code> and <code>rural</code> using <code>"[a-zA-Z]+"</code>.</li>
<li>We can capture the years by matching one or more digits using <code>"[0-9]+"</code>.</li>
</ol>
<p>Let's use regEx to specify our <em>names_pattern_split</em> member:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify how to separate names 
// using the pivotControl structure
pctl.names_pattern_split = "([a-zA-Z]+)([0-9]+)"; </code></pre>
<p>Next, we can put this together with our other steps to transform our wide data:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Variable for storing names
names_to = "Location"$|"Year";

// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide4);
columns = trimr(columns, 1, 0);

// Variable for storing values
values_to = "Population";

// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide4, columns, names_to, values_to, pctl);
head(pop_long);</code></pre>
<pre>           State         Location             Year       Population
         Alabama            urban             2020        6558153.0
         Alabama            urban             2021        4972982.0
         Alabama            urban             2022        12375977.
         Alabama            rural             2020        1526791.0
         Alabama            rural             2021        76863.000</pre>
<h3 id="multiple-value-variables">Multiple Value Variables</h3>
<p>In all our previous examples, the values needed to be stored in a single variable. In practice, however, a dataset often contains multiple groups of values, and we need to specify multiple variables to store them. </p>
<p>Let's consider our previous example which used the <em>pop_wide4</em> dataset:</p>
<pre>           State        urban2020        urban2021        urban2022        rural2020        rural2021        rural2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>Suppose that rather than creating a <em>Location</em> variable, we wish to separate the population information into two variables, <em>urban</em> and <em>rural</em>. To do this we will:</p>
<ol>
<li>Split the variable names by words (<code>"urban"</code> or <code>"rural"</code>) and integers.</li>
<li>Create a <em>Year</em> column from the integer portions of the names.</li>
<li>Create two values columns, <em>urban</em> and <em>rural</em>, from the word portions. </li>
</ol>
<p>First, we will specify our columns:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide4);
columns = trimr(columns, 1, 0);</code></pre>
<div class="alert alert-info" role="alert">Since we are using the same data as our previous example, we don't need to load any additional data.</div>
<p>Next, we need to specify our <em>names_to</em> and <em>values_to</em> inputs. However, this time we want our <em>values_to</em> variables to be determined by the information in our names. </p>
<p>We do this using <code>".value"</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Tell GAUSS to use the first group of the split names 
// to set the values variables and 
// store the remaining group in 'Year'
names_to = ".value" $| "Year";

// Tell GAUSS to get 'values_to' variables from 'names_to'
values_to = "";</code></pre>
<p>Setting <code>".value"</code> as the first element in our <em>names_to</em> input tells <code>dfLonger</code> to take the first piece of each wide data name and create a value column containing all the values from the matching columns.</p>
<p>In other words, combine all the values from the variables <em>urban2020</em>, <em>urban2021</em>, <em>urban2022</em> into a single variable named <em>urban</em> and do the same for the <em>rural</em> columns.</p>
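<p>For readers who also use pandas, this <code>".value"</code> behavior is analogous to <code>pd.wide_to_long</code>: the stub names become value columns and the numeric suffix becomes the new id column. A minimal Python sketch with made-up numbers (illustration only, not GAUSS code):</p>

```python
import pandas as pd

# Toy data mimicking the shape of pop_wide4
# (values invented for illustration)
wide = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "urban2020": [6558153, 21944],
    "rural2020": [1526791, 710978],
})

# stubnames play the role of the '.value' group; the
# integer suffix becomes the new 'Year' column
long = pd.wide_to_long(wide, stubnames=["urban", "rural"],
                       i="State", j="Year").reset_index()
print(long)
```

As with <code>dfLonger</code>, the result has one row per state-year with separate <em>urban</em> and <em>rural</em> columns.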
<p>Finally, we need to tell GAUSS how to split the variable names. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare 'pctl' to be a pivotControl structure
// and fill with default settings
struct pivotControl pctl;
pctl = pivotControlCreate();

// Set the regex to split the variable names
pctl.names_pattern_split = "(urban|rural)([0-9]+)";</code></pre>
<p>This time, we specify the variable names, <code>"(urban|rural)"</code>, rather than using the general specifier <code>"([a-zA-Z]+)"</code>.</p>
<p>Now we call <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Convert the dataframe to long format according to our specifications
pop_long = dfLonger(pop_wide4, columns, names_to, values_to, pctl);

// Print the first 5 rows of the long form dataframe
head(pop_long);</code></pre>
<pre>           State             Year            urban            rural
         Alabama             2020        6558153.0        1526791.0
         Alabama             2021        4972982.0        76863.000
         Alabama             2022        12375977.        7301681.0
          Alaska             2020        21944.000        710978.00
          Alaska             2021        467051.00        267130.00</pre>
<p>Now the urban and rural populations are stored in their own columns, named <em>urban</em> and <em>rural</em>. </p>
<div class="alert alert-info" role="alert">These names can easily be changed using the <b>Data Manager</b> or <a href="https://docs.aptech.com/gauss/setcolnames.html" target="_blank" rel="noopener">setColNames</a>.</div>
<h2 id="conclusion">Conclusion</h2>
<p>As we've seen today, pivoting panel data from wide to long can be complicated. However, a systematic approach combined with the GAUSS <code>dfLonger</code> procedure helps reduce the frustration, time, and errors involved.</p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Getting Started With Panel Data in GAUSS </a></li>
</ol>
<div style="text-align:center;background-color:#455560;color:#FFFFFF">
<hr>
<h3 id="discover-how-gauss-24-can-help-you-reach-your-goals">Discover how GAUSS 24 can help you reach your goals.</h3>
 
<div class="lp-cta">
    <a href="https://www.aptech.com/request-demo" class="btn btn-primary">Request Demo</a>
    <a href="https://www.aptech.com/request-quote/" class="btn btn-primary btn-quote">Request pricing</a>
</div><hr>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Visualizing COVID-19 Panel Data With GAUSS 22</title>
		<link>https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/</link>
					<comments>https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 14 Dec 2021 18:57:02 +0000</pubDate>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[#gauss22]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11582197</guid>

					<description><![CDATA[When they're done right, graphs are a useful tool for telling compelling data stories and supporting data models. However, too often graphs lack the right components to truly enhance understanding. 

In this blog, we look at how a few quick customizations help make graphs more impactful. In particular, we will consider:
<ul>
<li>Using grid lines without cluttering a graph. </li>
<li>Changing tick labels for readability. </li>
<li>Using clear axis labels. </li>
<li>Marking events and outcomes with lines, bars, and annotations. </li>
</ul>]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>When they're done right, graphs are a useful tool for telling compelling data stories and supporting data models. However, too often graphs lack the right components to truly enhance understanding. </p>
<p>In this blog, we look at how a few quick customizations help make graphs more impactful. In particular, we will consider:</p>
<ul>
<li>Using grid lines without cluttering a graph. </li>
<li>Changing tick labels for readability. </li>
<li>Using clear axis labels. </li>
<li>Marking events and outcomes with lines, bars, and annotations. </li>
</ul>
<h2 id="data">Data</h2>
<p>As an example, we will use New York Times COVID tracking data <a href="https://github.com/nytimes/covid-19-data/tree/master/rolling-averages">(available on GitHub)</a>. This data is part of the <a href="https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html">New York Times U.S. tracking project</a>. </p>
<p>From this data, we will be using the rolling 7-day average of COVID cases per 100k provided by date for five states: Arizona, California, Florida, Texas, and Washington. </p>
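<p>The rolling 7-day average simply averages each day's count with the six days before it, smoothing out day-of-week reporting effects. A minimal Python sketch with invented daily counts (the NYT dataset ships this column precomputed, so this is illustration only):</p>

```python
import pandas as pd

# Invented daily case counts for illustration
daily_cases = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# Each value is the mean of the current day
# and the six days before it
rolling_avg = daily_cases.rolling(window=7).mean()
print(rolling_avg)
```

The first six entries are missing because a full 7-day window is not yet available.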
<h2 id="creating-a-basic-graph">Creating a Basic Graph</h2>
<p>Let's start by creating a basic panel data plot using:</p>
<ul>
<li>The <a href="https://docs.aptech.com/gauss/plotxy.html"><code>plotXY</code></a> procedure with dates. </li>
<li>A <a href="https://www.aptech.com/resources/tutorials/formula-string-syntax/">formula string</a> and the <code>by</code> keyword. </li>
</ul>
<p>First we will load our data: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load original data
fname = "us_state_covid_cases.csv";
covid_cases = loadd(fname, 
                    "date($date) + cat(state) + cases + cases_avg_per_100k");

// Filter desired states
covid_cases = selif(covid_cases, 
                    rowcontains(covid_cases[., "state"], 
                                "Florida"$|"California"$|
                                "Arizona"$|"Washington"$|
                                "Texas"));</code></pre>
<p>Note that in this step we've:</p>
<ol>
<li>Specified the variables we want to load and their variable types.</li>
<li>Filtered our data to include only our states of interest. </li>
</ol>
<div class="alert alert-info" role="alert">For more information about loading data and other data management tips see our previous blog, <a href="https://www.aptech.com/blog/getting-to-know-your-data-with-gauss-22/">Getting to Know Your Data with GAUSS 22</a>.</div>
<p>Now, we can make a preliminary plot of the rolling 7 day average number of COVID-19 cases per 100,000 people:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Plot COVID cases per 100K by state
plotXY(covid_cases, "cases_avg_per_100k ~ date + by(state)");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-basic.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-basic.jpg" alt="" width="753" height="566" class="alignnone size-full wp-image-11582241" /></a></p>
<div class="alert alert-info" role="alert">The <code>by</code> keyword tells GAUSS to split the data on a particular variable. It was introduced in GAUSS 22, along with the ability to use <code>plotXY</code> with date variables.</div>
<h3 id="customizing-our-graph">Customizing Our Graph</h3>
<p>Our quick graph was a good starting point. However, a few customizations will help present a clearer picture:</p>
<ul>
<li>Adding y-axis grid lines will help us read COVID cases values more easily.</li>
<li>Reformatting our x-axis tick labels to include months rather than quarters will make the dates more recognizable. </li>
<li>Changing the axis labels will make it clear what is being plotted. </li>
</ul>
<h3 id="declaring-a-plotcontrol-structure">Declaring a <code>plotControl</code> Structure</h3>
<p>The first step for customizing graphs is to declare a <code>plotControl</code> structure and to fill it with the appropriate defaults:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare plot control structure
struct plotControl myPlot;

// Fill with defaults for "xy" graph
myPlot = plotGetDefaults("xy");</code></pre>
<h3 id="customizing-plot-attributes">Customizing Plot Attributes</h3>
<p>After declaring the <code>plotControl</code> structure, we can use <code>plotSet</code> procedures to change the desired attributes of our graph. </p>
<h4 id="adding-y-axis-grid-lines">Adding Y-Axis Grid Lines</h4>
<p>First, to help make levels of COVID cases more clear, let's add y-axis grid lines to our plot using <a href="https://docs.aptech.com/gauss/plotsetygridpen.html"><code>plotSetYGridPen</code></a>.</p>
<p>The <code>plotSetYGridPen</code> procedure can be used to:</p>
<ul>
<li>Turn on y-axis major and/or minor grids.</li>
<li>Set the <em>width</em>, <em>color</em>, and <em>style</em> of the grid lines.</li>
</ul>
<table style="width: 100%">
  <colgroup>
       <col span="1" style="width: 20%;">
       <col span="1" style="width: 80%;">
    </colgroup>
<tr>
<th><b>Input</b></th><th>Description</th>
</tr>
<tr>
<td>which_grid</td><td>Specifies which grid line to modify. The options include: <code>"major"</code>, <code>"minor"</code>, or <code>"both"</code>.</td>
</tr>
<tr>
<td>width</td><td>Specifies the thickness of the line(s) in pixels. The default value is 1.</td>
</tr>
<tr>
<td>color</td><td>Optional argument, specifying the name or RGB value of the new color(s) for the line(s).</td>
</tr>
<tr>
<td>style</td><td>Optional argument, the style(s) of the pen for the line(s). <br>Options include: <table><tr><td>1</td><td>Solid line</td></tr><tr><td>2</td><td>Dash line</td></tr><tr><td>3</td><td>Dot line</td></tr><tr><td>4</td><td>Dash-Dot line</td></tr><tr><td>5</td><td>Dash-Dot-Dot line</td></tr></table></td>
</tr>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Turn on y-axis grid for the major ticks. Set the
// grid lines to be solid, 1 pixel and light grey
plotSetYGridPen(&amp;myPlot, "major", 1, "Light Grey", 1);</code></pre>
<div class="alert alert-info" role="alert">When using any <code>plotSet</code> procedure, the first input is a pointer to a declared <code>plotControl</code> structure. We indicate that something is a pointer using the <code>&amp;</code> symbol.</div>
<p>Because GAUSS allows us to add and format y-axis and x-axis grid lines separately, we are able to improve readability with y-axis lines without adding the clutter of a full grid. </p>
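<p>The same design choice is available in other plotting tools; here is a hedged matplotlib analogue (illustration only, not GAUSS code) showing a y-axis-only grid:</p>

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 40, 20])

# Enable only the y-axis major grid: horizontal guides
# help read values without the clutter of a full grid
ax.yaxis.grid(True, color="lightgrey", linewidth=1)
fig.savefig("y_grid_only.png")
```

Keeping the x-axis grid off preserves a clean background while still making levels easy to read.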
<h4 id="customizing-x-axis-ticks">Customizing X-Axis Ticks</h4>
<p>Next, let's turn our attention to the x-axis ticks. We will use three GAUSS procedures to help us customize our ticks:</p>
<table style="width: 100%">
  <colgroup>
       <col span="1" style="width: 20%;">
       <col span="1" style="width: 80%;">
    </colgroup>
<tr>
<th><b>Procedure</b></th><th>Description</th>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotsetxticlabel.html"><code>plotSetXTicLabel</code></a></td><td>Controls the formatting and angle of x-axis tick labels for 2-D graphs.</td>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotsetxticinterval.html"><code>plotSetXTicInterval</code></a></td><td>Controls the interval between x-axis tick labels and also allows the user to specify the first tick to be labeled for 2-D graphs.</td>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotsetticlabelfont.html"><code>plotSetTicLabelFont</code></a></td><td>Controls the font name, size and color for the X and Y axis tick labels.</td>
</tr>
</table>
<p>First, let's change the format of the labels on the x-axis to indicate months rather than quarters:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Display 4 digit year and month on 'X' tick labels
plotSetXTicLabel(&amp;myPlot, "YYYY-MO");</code></pre>
<div class="alert alert-info" role="alert">A full list of supported x-axis tick label formats for time series data is available in the <strong>Remarks</strong> section of the <a href="https://docs.aptech.com/gauss/plotsetxticlabel.html">documentation for <code>plotSetXTicLabel</code></a>.</div>
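<p>For comparison, the <code>"YYYY-MO"</code> specifier produces the same labels as <code>strftime</code>'s <code>"%Y-%m"</code> in Python. This small sketch (illustration only, not GAUSS code) shows what the tick labels look like:</p>

```python
from datetime import date

# Hypothetical tick positions every 3 months from March 2020
ticks = [date(2020, 3, 1), date(2020, 6, 1), date(2020, 9, 1)]

# Format as 4-digit year and 2-digit month
labels = [d.strftime("%Y-%m") for d in ticks]
print(labels)  # ['2020-03', '2020-06', '2020-09']
```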
<p>Second, let's set the x-axis ticks to:</p>
<ul>
<li>Start in March of 2020 to correspond with the start of the pandemic.</li>
<li>Occur every 3 months. </li>
</ul>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Place first 'X' tick mark on March 1st, 2020
// with ticks occurring every 3 months
plotSetXTicInterval(&amp;myPlot, 3, "months", asDate("2020-03"));</code></pre>
<p>Third, let's increase the size of the axis tick labels:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Change tic label font size
plotSetTicLabelFont(&amp;myPlot, "Arial", 12); </code></pre>
<h4 id="updating-axis-labels">Updating Axis Labels</h4>
<p>Finally, we change the axis labels:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify the text for the Y-axis label as well as
// the font and font size for both labels
plotSetYLabel(&amp;myPlot, "Cases per 100k", "Arial", 14);

// Specify text for the x-axis label
plotSetXLabel(&amp;myPlot, "Date");</code></pre>
<div class="alert alert-info" role="alert">The <code>plotSetYLabel</code> and <code>plotSetXLabel</code> procedures set the font, font size, and font color for both axis labels, so there is no need to specify them again in the second call.</div>
<p>Now we can create our formatted graph:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Plot COVID cases per 100K by state. Pass in the 'plotControl'
// structure, 'myPlot', to use the settings we applied above.
plotXY(myPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-first-round.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-first-round.jpg" alt="" width="753" height="566" class="alignnone size-full wp-image-11582243" /></a></p>
<h2 id="highlighting-events">Highlighting Events</h2>
<p>When working with time series plots, we often want to mark specific dates or periods on the graph. GAUSS includes four procedures, introduced in GAUSS 22, that make highlighting events easy.</p>
<table style="width: 100%">
  <colgroup>
       <col span="1" style="width: 20%;">
       <col span="1" style="width: 40%;">
       <col span="1" style="width: 40%;">
    </colgroup>
<tr>
<th><b>Procedure</b></th><th>Description</th><th>Example</th>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotaddvline.html"><code>plotAddVLine</code></a></td><td>Adds one or more vertical lines to an existing plot.</td><td><code>plotAddVLine("2020-01-01");</code></td>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotaddvbar.html"><code>plotAddVBar</code></a></td><td>Adds one or more vertical bars spanning the full extent of the y-axis to an existing graph.</td><td><code>plotAddVBar("2020-01", "2020-03");</code></td>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotaddhline.html"><code>plotAddHLine</code></a></td><td>Adds one or more horizontal lines to an existing plot.</td><td><code>plotAddHLine(500);</code></td>
</tr>
<tr>
<td><a href="https://docs.aptech.com/gauss/plotaddhbar.html"><code>plotAddHBar</code></a></td><td>Adds one or more horizontal bars spanning the full extent of the x-axis to an existing graph.</td><td><code>plotAddHBar(580, 740);</code></td>
</tr>
</table>
<p>As an example, let's add vertical lines to help compare July 4th, 2020 to July 4th, 2021. </p>
<h3 id="specifying-legend-behavior-when-adding-lines">Specifying Legend Behavior When Adding Lines</h3>
<p>First, when adding new data to an existing plot, we need to specify how we want this data treated on the legend using the <a href="https://docs.aptech.com/gauss/plotsetlegend.html"><code>plotSetLegend</code></a> procedure. </p>
<p>We can add a label for the line to the legend:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Label next added line "Independence Day"
// and add to the legend
plotSetLegend(&amp;myPlot, "Independence Day");</code></pre>
<p>or we can tell GAUSS to not make any changes to the current legend:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// The empty string specifies that the legend 
// should remain unchanged when the next line is added.
plotSetLegend(&amp;myPlot, "");</code></pre>
<h3 id="specifying-line-style">Specifying Line Style</h3>
<p>Next, we will specify the style of our lines using the <a href="https://docs.aptech.com/gauss/plotsetlinepen.html"><code>plotSetLinePen</code></a> procedure. This procedure lets us set the <em>width</em>, <em>color</em>, and <em>style</em> of the lines added to the graph. </p>
<table style="width: 100%">
  <colgroup>
       <col span="1" style="width: 20%;">
       <col span="1" style="width: 80%;">
    </colgroup>
<tr>
<th><b>Attribute</b></th><th>Description</th>
</tr>
<tr>
<td>width</td><td>Specifies the thickness of the line(s) in pixels. The default value is 2.</td>
</tr>
<tr>
<td>color</td><td>Optional argument, specifying the name or RGB value of the new color(s) for the line(s).</td>
</tr>
<tr>
<td>style</td><td>Optional argument, the style(s) of the pen for the line(s). <br>Options include: <table><tr><td>1</td><td>Solid line</td></tr><tr><td>2</td><td>Dash line</td></tr><tr><td>3</td><td>Dot line</td></tr><tr><td>4</td><td>Dash-Dot line</td></tr><tr><td>5</td><td>Dash-Dot-Dot line</td></tr></table></td>
</tr>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set the line width to be 2 pxs
// the line color to be #555555
// and the line to be dashed
plotSetLinePen(&amp;myPlot, 2, "#555555", 2);</code></pre>
<h3 id="adding-lines-to-mark-events">Adding Lines to Mark Events</h3>
<p>Finally, let's add the lines marking Independence Day in 2020 and 2021. </p>
<p>We first specify the dates we want to add lines using <a href="https://docs.aptech.com/gauss/asdate.html"><code>asDate</code></a>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create string array of independence days
ind_days = asDate("2020-07-04"$|"2021-07-04");</code></pre>
<p>Then we add our holidays to the existing graph using <code>plotAddVLine</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Add holidays to graph
plotAddVLine(myPlot, ind_days);</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-event-lines-revised.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-event-lines-revised.jpg" alt="" width="753" height="566" class="alignnone size-full wp-image-11582294" /></a></p>
<p>The complete code for adding the lines looks like this:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Do not add vertical lines to the legend
plotSetLegend(&amp;myPlot, "");

// Set the line width to be 2 pixels
// the line color to be a dark grey color, #555555,
// and the line to be dashed
plotSetLinePen(&amp;myPlot, 2, "#555555", 2);

// Create string array of independence days
ind_days = asDate("2020-07-04"$|"2021-07-04");

// Add holidays to graph
plotAddVLine(myPlot, ind_days);</code></pre>
<h3 id="adding-bars-to-mark-events">Adding Bars to Mark Events</h3>
<p>Now, let's add a vertical bar to mark the winter holidays time period of 2020. We will add a bar that marks the time span from Thanksgiving 2020 to New Year's Day 2021. </p>
<p>We first need to create a new <code>plotControl</code> structure to format our bars. Since we are adding a bar to the graph, we will fill our new <code>plotControl</code> structure with the defaults for a bar graph:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create plotControl structure
struct plotControl plt;

// Fill with default bar settings
plt = plotGetDefaults("bar");</code></pre>
<p>Next, we can format our bar using the <a href="https://docs.aptech.com/gauss/plotsetfill.html"><code>plotSetFill</code></a> procedure. The <code>plotSetFill</code> procedure allows us to control the <em>fill style</em>, <em>opacity</em>, and <em>color</em> of graphed bars:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set bar to have solid fill with 20% opacity
// and grey color
plotSetFill(&amp;plt, 1, 0.20, "grey");</code></pre>
<p>We also have to specify the legend behavior when the bar is added. This time let's add a label to the legend for the &quot;Winter Holidays&quot;:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Add "Winter Holidays" to the legend
plotSetLegend(&amp;plt, "Winter&lt;br&gt;Holidays");</code></pre>
<div class="alert alert-info" role="alert">The code <code>&lt;br&gt;</code> is HTML and it tells GAUSS to line break between the words <code>"Winter"</code> and <code>"Holidays"</code>. </div>
<p>Now we are ready to add the bar to our graph using the <a href="https://docs.aptech.com/gauss/plotaddvbar.html"><code>plotAddVBar</code></a> procedure:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Add a vertical bar to graph starting 
// on November 26th, 2020 and 
// ending January 1st, 2021
plotAddVBar(plt, asDate("2020-11-26"), asDate("2021-01"));</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-add-bar.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-add-bar.jpg" alt="" width="753" height="566" class="alignnone size-full wp-image-11582296" /></a></p>
<h2 id="adding-notes-to-graphs">Adding Notes to Graphs</h2>
<p>As a final customization, let's add a note to our graph to label one of our holidays. We can do this using the <a href="https://docs.aptech.com/gauss/plotaddtextbox.html"><code>plotAddTextBox</code></a> procedure. </p>
<p>The <code>plotAddTextBox</code> procedure takes three required inputs:</p>
<ul>
<li>The text to be added to the graph. </li>
<li>The x location where the text should start.</li>
<li>The y location where the text should start. </li>
</ul>
<div class="alert alert-info" role="alert">An optional <code>plotAnnotation</code> structure can be used to format the textbox and its text content. </div>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Label the 2020 Independence Day line
plotAddTextBox("&amp;larr; Independence Day", asDate("2020-07-04"), 80);</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-final-rev2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2021/12/covid-cases-final-rev2.jpg" alt="" width="753" height="566" class="alignnone size-full wp-image-11582298" /></a></p>
<div class="alert alert-info" role="alert">The code <code>&amp;larr;</code> is HTML and it tells GAUSS to add a left arrow to the graph. </div>
<h2 id="conclusion">Conclusion</h2>
<p>In this blog, we see how a few customizations and enhancements can make plots easier to read and more impactful. </p>
<p>In particular, we covered:</p>
<ul>
<li>Using grid lines without cluttering a graph. </li>
<li>Changing tick labels for readability. </li>
<li>Using clear axis labels. </li>
<li>Marking events and outcomes with lines, bars, and annotations. </li>
</ul>
<h3 id="further-reading">Further Reading</h3>
<ul>
<li><a href="https://www.aptech.com/blog/how-to-create-tiled-graphs-in-gauss/">How to Create Tiled Graphs in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/how-to-interactively-create-reusable-graphics-profiles/">How to Interactively Create Reusable Graphics Profiles</a></li>
<li><a href="https://www.aptech.com/blog/five-hacks-for-creating-custom-gauss-graphics/">Five Hacks For Creating Custom GAUSS Graphics</a></li>
<li><a href="https://www.aptech.com/blog/how-to-mix-match-and-style-different-graph-types/">How to Mix, Match, and Style Different Graph Types</a></li>
</ul>
<h2 id="references">References</h2>
<p>The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved 12-05-2021, from <a href="https://github.com/nytimes/covid-19-data">https://github.com/nytimes/covid-19-data</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Panel Data Stationarity Test With Structural Breaks</title>
		<link>https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/</link>
					<comments>https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Fri, 02 Oct 2020 05:24:31 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[structural breaks]]></category>
		<category><![CDATA[unit root]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=21878</guid>

					<description><![CDATA[Reliable unit root testing is an important step of any time series analysis or panel data analysis. 

However, standard time series unit root tests and panel data unit root tests aren’t reliable when structural breaks are present. Because of this, when structural breaks are suspected, we must employ unit root tests that properly incorporate these breaks. 

Today we will examine one of those tests, the Carrion-i-Silvestre, et al. (2005) panel data test for stationarity in the presence of multiple structural breaks.]]></description>
										<content:encoded><![CDATA[<p>    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // caculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script></p>
<h3 id="introduction">Introduction</h3>
<p>The validity of many <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/" target="_blank" rel="noopener">time series models</a> and <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data models</a> requires that the underlying data is stationary. As such, reliable <a href="https://www.aptech.com/why-gauss-for-unit-root-testing/" target="_blank" rel="noopener">unit root testing</a> is an important step of any time series analysis or panel data analysis. </p>
<p>However, standard time series unit root tests and panel data unit root tests aren’t reliable when <a href="https://www.aptech.com/structural-breaks/" target="_blank" rel="noopener">structural breaks</a> are present. Because of this, when structural breaks are suspected, we must employ unit root tests that properly incorporate these breaks. </p>
<p>Today we will examine one of those tests, the Carrion-i-Silvestre, et al. (2005) panel data test for stationarity in the presence of multiple structural breaks.</p>
<h2 id="why-panel-data-unit-root-testing">Why Panel Data Unit Root Testing?</h2>
<p>We may be tempted when working with panel data to treat the data as individual time-series, performing unit root testing on each one separately. However, one of the fundamental ideas of panel data is that there is a shared underlying component that connects the group. </p>
<p>It is this shared component that suggests there are advantages to be gained from testing the panel data collectively:</p>
<ul>
<li>Panel data contains more combined information and variation than pure time-series data or cross-sectional data.   </li>
<li>Collectively testing for unit roots in panels provides more power than testing individual series.  </li>
<li>Panel data unit root tests are more likely than time series unit root tests to have standard asymptotic distributions. </li>
</ul>
<p>Put simply, when dealing with panel data, using tests designed specifically for panel data and testing the panel collectively can lead to more reliable results.</p>
<div class="alert alert-info" role="alert">For more background on unit root testing, see our previous blog post, <a href="https://www.aptech.com/blog/how-to-conduct-unit-root-tests-in-gauss/" target="_blank" rel="noopener">“How to Conduct Unit Root Tests in GAUSS”</a>.</div>
<h2 id="why-do-we-need-to-worry-about-structural-breaks">Why do we Need to Worry About Structural Breaks?</h2>
<p>It is important to properly address structural breaks when conducting unit root testing because, in their presence, most <strong>standard unit root tests are biased towards non-rejection</strong> of the unit root null hypothesis. We discuss this in greater detail in our <a href="https://www.aptech.com/blog/unit-root-tests-with-structural-breaks/" target="_blank" rel="noopener">“Unit Root Tests with Structural Breaks”</a> blog.</p>
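<p>We can see this bias with a short simulation. The following is a Python sketch with invented data and seed (not GAUSS or tspdlib code): a stationary white-noise series with a single mean shift exhibits a first-order autocorrelation far above zero, mimicking the persistence of a unit root process and pushing standard tests towards non-rejection.</p>

```python
import random

def lag1_autocorr(y):
    """First-order sample autocorrelation of a series."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

random.seed(42)
T = 500
noise = [random.gauss(0, 1) for _ in range(T)]

# A stationary white-noise series: autocorrelation near zero.
print(round(lag1_autocorr(noise), 2))

# The same innovations with a mean shift of 5 at mid-sample:
# the break masquerades as persistence, pushing the
# autocorrelation toward one.
shifted = [e + (5 if t >= T // 2 else 0) for t, e in enumerate(noise)]
print(round(lag1_autocorr(shifted), 2))
```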
<h2 id="panel-data-stationarity-test-with-structural-breaks">Panel Data Stationarity Test with Structural Breaks</h2>
<p>The Carrion-i-Silvestre, <em>et al.</em> (2005) panel data stationarity test introduces a number of important testing features:</p>
<ul>
<li>Tests the null hypothesis of stationarity against the alternative of non-stationarity.  </li>
<li>Allows for multiple, unknown structural breaks.  </li>
<li>Accommodates shifts in the mean and/or trend of the individual time series.   </li>
<li>Does not require the same breaks across the entire panel but, rather, allows for each individual to have a different number of breaks at different dates.   </li>
<li>Allows for homogeneous or heterogeneous long-run variances across individuals.  </li>
</ul>
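<p>To fix ideas, the building block that the Carrion-i-Silvestre, <em>et al.</em> (2005) test generalizes is the KPSS statistic, $\eta = T^{-2} \sum_t S_t^2 / \hat{\sigma}^2$, where $S_t$ is the partial sum of demeaned residuals. Below is a minimal Python sketch of the plain no-break, level-stationarity version with an iid variance estimator, using simulated data; this is an illustration only, not the tspdlib implementation.</p>

```python
import random

def kpss_level_stat(y):
    """Plain KPSS level-stationarity statistic:
    eta = T^-2 * sum_t S_t^2 / sigma^2, where S_t is the
    partial sum of demeaned observations and sigma^2 is
    the (iid, short-run) residual variance."""
    T = len(y)
    mean = sum(y) / T
    e = [v - mean for v in y]
    sigma2 = sum(v * v for v in e) / T
    s, num = 0.0, 0.0
    for v in e:
        s += v           # running partial sum S_t
        num += s * s
    return num / (T * T * sigma2)

random.seed(0)
T = 300
innovations = [random.gauss(0, 1) for _ in range(T)]

# A random walk: cumulative sum of the same innovations.
walk = []
level = 0.0
for eps in innovations:
    level += eps
    walk.append(level)

print(kpss_level_stat(innovations))  # small under the stationarity null
print(kpss_level_stat(walk))         # large under a unit root
```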
<div style="text-align:center;background-color:#37444d;padding-top:40px;padding-bottom:40px;"><span style="color:#FFFFFF">Deciding which unit root test is right for your data?</span> <a href="https://www.aptech.com/why-gauss-for-unit-root-testing/#ur_test_guide">Download our Unit Root Selection Guide!</a></div>
<h2 id="conducting-panel-data-stationarity-tests-in-gauss">Conducting Panel Data Stationarity Tests in GAUSS</h2>
<h3 id="where-can-i-find-the-tests">Where can I Find the Tests?</h3>
<p>The panel data stationarity test with structural breaks is implemented by the <a href="https://docs.aptech.com/gauss/tspdlib/docs/pd_kpss.html" target="_blank" rel="noopener"><code>pd_kpss</code></a> procedure in the GAUSS <a href="https://docs.aptech.com/gauss/tspdlib/docs/tspdlib-landing.html" target="_blank" rel="noopener">tspdlib</a> library. </p>
<p>The library can be directly installed using the <a href="https://www.aptech.com/blog/gauss-package-manager-basics/" target="_blank" rel="noopener">GAUSS Package Manager</a>. </p>
<h3 id="what-format-should-my-data-be-in">What Format Should my Data be in?</h3>
<p>The <code>pd_kpss</code> procedure takes panel data in wide format - this means that each column of your data matrix should contain the time series observations for a different individual in the panel. </p>
<p>For example, if we have 100 observations of real GDP for 3 countries, our test data will be a 100 x 3 matrix.</p>
<table>
<thead>
<tr>
<th>Observation #</th>
<th>Country A</th>
<th>Country B</th>
<th>Country C</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.11</td>
<td>1.40</td>
<td>1.39</td>
</tr>
<tr>
<td>2</td>
<td>1.14</td>
<td>1.37</td>
<td>1.34</td>
</tr>
<tr>
<td>3</td>
<td>1.27</td>
<td>1.45</td>
<td>1.28</td>
</tr>
<tr>
<td>4</td>
<td>1.19</td>
<td>1.51</td>
<td>1.35</td>
</tr>
<tr>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
</tr>
<tr>
<td>99</td>
<td>1.53</td>
<td>1.75</td>
<td>1.65</td>
</tr>
<tr>
<td>100</td>
<td>1.68</td>
<td>1.78</td>
<td>1.67</td>
</tr>
</tbody>
</table>
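<p>The reshape from long records into this wide layout is what <code>dfWider</code> performs in GAUSS. A pure-Python sketch of the same pivot, using invented values, looks like this:</p>

```python
# Long-format records: one row per (country, observation) pair.
long_rows = [
    ("A", 1, 1.11), ("B", 1, 1.40), ("C", 1, 1.39),
    ("A", 2, 1.14), ("B", 2, 1.37), ("C", 2, 1.34),
]

def to_wide(rows):
    """Pivot (group, time, value) triples into a mapping:
    time -> {group: value}, i.e. one column per group."""
    wide = {}
    for group, t, value in rows:
        wide.setdefault(t, {})[group] = value
    return wide

wide = to_wide(long_rows)
print(wide[1])  # {'A': 1.11, 'B': 1.4, 'C': 1.39}
```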
<h3 id="how-do-i-call-the-test-procedure">How do I Call the Test Procedure?</h3>
<p>The first step to implementing the panel data stationarity test with structural breaks in GAUSS is to load the <code>tspdlib</code> library. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">library tspdlib;</code></pre>
<p>This statement provides access to all the procedures in the <code>tspdlib</code> library. After loading the library, the <code>pd_kpss</code> procedure can be called directly from the command line or within a program file. </p>
<p>The <code>pd_kpss</code> procedure takes 2 required inputs and 5 optional arguments:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">{ test_hom, test_het, kpss_test, brks } = pd_kpss(y, model, 
                                                      nbreaks,
                                                      bwl,
                                                      varm, 
                                                      pmax, 
                                                      b_ctl);</code></pre>
<hr>
<dl>
<dt>y</dt>
<dd>$T \times N$ Wide form panel data to be tested.</dd>
<dt>model</dt>
<dd>Scalar, specification of the deterministic components and structural breaks:
<table>
<tbody>
<tr><td>1</td><td>Constant (Hadri test)</td></tr>
<tr><td>2</td><td>Constant + trend (Hadri test)</td></tr>
<tr><td>3</td><td>Constant + shift (in mean)</td></tr>
<tr><td>4</td><td>Constant + trend + shift (in mean and trend)</td></tr>
</tbody>
</table></dd>
<dt>nbreaks</dt>
<dd>Scalar, Optional input, number of breaks to consider (up to 5). Default = 5.</dd>
<dt>bwl</dt>
<dd>Scalar, Optional input, bandwidth for the spectral window. Default = round(4 * (T/100)^(2/9)).</dd>
<dt>varm</dt>
<dd>Scalar, Optional input, kernel used for long-run variance computation. Default = 1:
<table>
<tbody>
<tr><td>1</td><td>iid</td></tr>
<tr><td>2</td><td>Bartlett.</td></tr>
<tr><td>3</td><td>Quadratic spectral (QS).</td></tr>
<tr><td>4</td><td>Sul, Phillips, and Choi (2003) with the Bartlett kernel.</td></tr>
<tr><td>5</td><td>Sul, Phillips, and Choi (2003) with quadratic spectral kernel.</td></tr>
<tr><td>6</td><td>Kurozumi with the Bartlett kernel.</td></tr>
<tr><td>7</td><td>Kurozumi with quadratic spectral kernel.</td></tr>
</tbody>
</table></dd>
<dt>pmax</dt>
<dd>Scalar, Optional input, maximum number of lags used in the estimation of the AR(p) model for the long-run variance. The final number of lags is chosen using the BIC criterion. Default = 8.</dd>
<dt>b_ctl</dt>
<dd>Optional input, An instance of the <code>breakControl</code> structure controlling the setting for the Bai and Perron structural break estimation.</dd>
</dl>
<hr>
<p>The <code>pd_kpss</code> procedure provides 4 returns:</p>
<hr>
<dl>
<dt>test_hom</dt>
<dd>Scalar, stationarity test statistic with structural breaks and homogeneous variance.</dd>
<dt>test_het</dt>
<dd>Scalar, stationarity test statistic with structural breaks and heterogeneous variance.</dd>
<dt>kpss_test</dt>
<dd>Matrix, individual test results. The first column contains the test statistics, the second column the number of breaks, the third column the BIC-chosen optimal lags, and the fourth column the LWZ-chosen optimal lags.</dd>
<dt>brks</dt>
<dd>Matrix of estimated breaks. Breaks for each individual group are contained in separate rows.</dd>
</dl>
<hr>
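<p>As a quick illustration of the <code>bwl</code> default, the formula above can be evaluated directly. This is a Python sketch (it assumes Python's rounding matches GAUSS's for these values):</p>

```python
def default_bandwidth(T):
    """Default spectral-window bandwidth: round(4 * (T/100)^(2/9))."""
    return round(4 * (T / 100) ** (2 / 9))

# For a panel with 25 time observations per individual:
print(default_bandwidth(25))
# For 100 time observations:
print(default_bandwidth(100))
```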
<h2 id="empirical-example">Empirical Example</h2>
<p>Let’s look further into testing for panel data stationarity with structural breaks using an empirical example.</p>
<h3 id="data-description">Data Description</h3>
<p>The dataset contains government deficit as a percentage of GDP for nine OECD countries. The time span ranges from 1995 to 2019. This gives us a balanced panel of 9 individuals and 25 time observations each. </p>
<h3 id="loading-our-data-into-gauss">Loading our data into GAUSS</h3>
<p>Our first step is to load the data from <code>govt-deficit-oecd.csv</code> using <a href="https://docs.aptech.com/gauss/loadd.html" target="_blank" rel="noopener"><code>loadd</code></a>. This <code>.csv</code> file contains three variables, <code>Country</code>, <code>Year</code>, and <code>Gov_deficit</code>. </p>
<p>We will load all three variables into a <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">GAUSS dataframe</a>. Note that <code>loadd</code> automatically detects that <code>Country</code> is a categorical variable, and assigns the <code>category</code> type. However, we will need to convert <code>Year</code> to a <a href="https://www.aptech.com/blog/dates-and-times-made-easy/" target="_blank" rel="noopener">date variable</a>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load all variables and convert country to numeric categories
data = loadd("govt-deficit-oecd.csv");

// Convert "Year" to a date variable
data = asDate(data, "%Y", "Year");</code></pre>
<p>This loads our data in long format (a 225 x 3 dataframe). Our next step is to convert this to wide format using the <a href="https://docs.aptech.com/gauss/dfwider.html" target="_blank" rel="noopener"><code>dfWider</code></a> procedure. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify names_from column 
names_from = "Country";

// Specify values_from column
values_from = "Gov_deficit";

// Convert from long to wide format
wide_data = dfWider(data, values_from, names_from);</code></pre>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Delete first column which contains the year variable
govt_def = delcols(wide_data, 1);</code></pre>
<h3 id="setting-up-our-model-parameters">Setting up our Model Parameters</h3>
<p>With our loading and transformations complete, we are ready to set up our testing parameters. For this test, we will allow for a constant and trend, with shifts in both the mean and trend.
All other parameters will be kept at their default values. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify which model to use.
// Allow for constant and trend with shifts in both.
model = 4;</code></pre>
<h3 id="calling-the-pd_kpss-procedure">Calling the <code>pd_kpss</code> Procedure</h3>
<p>Finally, we call the <code>pd_kpss</code> procedure:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">{ test_hom, test_het, kpss_test, brks } = pd_kpss(govt_def, model);</code></pre>
<h2 id="empirical-results">Empirical Results</h2>
<p>The <code>pd_kpss</code> output includes:</p>
<ul>
<li>A header describing the testing settings. </li>
<li>The <code>test_hom</code> and <code>test_het</code> test statistics along with associated p-values.</li>
<li>The critical values for both test statistics.  </li>
<li>The testing conclusions based on a comparison of the test statistics to the associated critical values. </li>
</ul>
<pre>Test:                                                PD KPSS
Ho:                                             Stationarity
Number of breaks:                                       None
LR variance:                                             iid
Model:                                Break in level &amp; trend
==============================================================
                                      PD KPSS          P-val

Homogenous                             14.352          0.000
Heterogenous                           10.425          0.000

Critical Values:
                            1%             5%            10%

Homogenous               2.326          1.645          1.282
Heterogenous             2.326          1.645          1.282
==============================================================

Homogenous var:
Reject the null hypothesis of stationarity at the 1% level.

Heterogenous var:
Reject the null hypothesis of stationarity at the 1% level.</pre>
<p>These results tell us that we can reject the null hypothesis of stationarity at the 1% level for both cases, homogeneous and heterogeneous variance.</p>
<p>The test results also include a table of individual test results and conclusions:</p>
<pre>==============================================================
Individual panel results
==============================================================
                                         KPSS    Num. Breaks

AUT                                     0.165          2.000
DEU                                     0.079          0.000
ESP                                     0.249          4.000
FRA                                     0.210          2.000
GBR                                     0.298          2.000
IRL                                     0.235          2.000
ITA                                     0.130          3.000
LUX                                     0.127          3.000
NOR                                     0.414          1.000

Critical Values:
                            1%             5%            10%

AUT                      0.059          0.048          0.043
DEU                      0.207          0.150          0.122
ESP                      0.035          0.031          0.028
FRA                      0.056          0.045          0.040
GBR                      0.058          0.046          0.041
IRL                      0.074          0.059          0.051
ITA                      0.055          0.045          0.041
LUX                      0.058          0.045          0.039
NOR                      0.083          0.066          0.058
==============================================================

AUT                                     Reject Ho ( 1% level)
DEU                                          Cannot reject Ho
ESP                                     Reject Ho ( 1% level)
FRA                                     Reject Ho ( 1% level)
GBR                                     Reject Ho ( 1% level)
IRL                                     Reject Ho ( 1% level)
ITA                                     Reject Ho ( 1% level)
LUX                                     Reject Ho ( 1% level)
NOR                                     Reject Ho ( 1% level)
==============================================================</pre>
<p>Finally, the <code>pd_kpss</code> procedure prints the estimated breakpoints for each individual in the panel.</p>
<pre>Group        Break 1      Break 2      Break 3      Break 4      Break 5<br />
AUT          2003         2008         .            .            .<br />
DEU          .            .            .            .            .<br />
ESP          1999         2006         2009         2012         .<br />
FRA          2001         2008         .            .            .<br />
GBR          2000         2008         .            .            .<br />
IRL          2007         2010         .            .            .<br />
ITA          1997         2006         2009         .            .<br />
LUX          1999         2004         2008         .            .<br />
NOR          2008         .            .            .            .            </pre>
<div class="alert alert-info" role="alert">For more information on how to view the matrices returned by <code>pd_kpss</code> see our <a href="https://www.aptech.com/resources/tutorials/introduction-to-gauss-viewing-data-in-gauss/" target="_blank" rel="noopener">data viewing tutorial</a>.</div>
<h2 id="interpreting-the-results">Interpreting the Results</h2>
<p>When interpreting the results from the <code>pd_kpss</code> test, it helps to remember a few key things:</p>
<ul>
<li>The test considers the null hypothesis of <a href="https://www.aptech.com/blog/how-to-conduct-unit-root-tests-in-gauss/#what-is-a-stationary-time-series" target="_blank" rel="noopener">stationarity</a> against the alternative of non-stationarity.</li>
<li>We reject the null hypothesis of stationarity when we observe:
<ul>
<li>Large values of the test statistic. </li>
<li>Small p-values. </li>
</ul></li>
</ul>
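<p>The decision rule itself is mechanical: compare the right-tailed statistic to each critical value, starting from the most stringent significance level. A Python sketch, using the homogeneous-variance critical values printed above:</p>

```python
def kpss_conclusion(stat, crit):
    """Compare a right-tailed test statistic to critical values
    given as {significance_level_in_percent: critical_value}."""
    for level in sorted(crit):          # try 1% first, then 5%, 10%
        if stat > crit[level]:
            return f"Reject Ho at the {level}% level"
    return "Cannot reject Ho"

# Critical values for the panel statistics from the output above.
crit = {1: 2.326, 5: 1.645, 10: 1.282}
print(kpss_conclusion(14.352, crit))   # Reject Ho at the 1% level
print(kpss_conclusion(0.9, crit))      # Cannot reject Ho
```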
<p>Notice that the <code>tspdlib</code> library conveniently provides interpretations for the <code>pd_kpss</code> tests. </p>
<h3 id="panel-data-test-statistic">Panel Data Test Statistic</h3>
<p>The test statistic for our panel, assuming homogeneous variances:</p>
<ul>
<li>Is equal to 14.352 with a p-value of 0.000.</li>
<li>Suggests that we reject the null hypothesis of stationarity at the 1% level. </li>
</ul>
<p>The test statistic for our panel, assuming heterogeneous variances:</p>
<ul>
<li>Is equal to 10.425 with a p-value of 0.000. </li>
<li>Suggests that we reject the null hypothesis of stationarity at the 1% level.</li>
</ul>
<p>These results tell us that regardless of whether we assume heterogeneous or homogeneous variances, we can reject the null hypothesis of stationarity for the panel. Given this, we must make proper adjustments to account for non-stationarity when modeling our data. </p>
<h3 id="individual-test-results">Individual Test Results</h3>
<p><a href="https://www.aptech.com/wp-content/uploads/2020/09/pankpss-graph-spanish-1.jpeg"><img src="https://www.aptech.com/wp-content/uploads/2020/09/pankpss-graph-spanish-1.jpeg" alt="Panel data stationarity test with structural breaks. " width="75%" height="75%" class="aligncenter size-full wp-image-11580083" /></a></p>
<table>
<thead>
<tr><th>Country</th><th>Statistic</th><th>Breaks</th><th>Conclusion</th></tr>
</thead>
<tbody>
<tr><td>Austria</td><td>0.165</td><td>2003;2008</td><td>Reject null at 1%.</td></tr> 
<tr><td>France</td><td>0.210</td><td>2001;2008</td><td>Reject null at 1%.</td></tr>
<tr><td>Germany</td><td>0.079</td><td>None</td><td>Cannot reject null.</td></tr>
<tr><td>Ireland</td><td>0.235</td><td>2007;2010</td><td>Reject null at 1%.</td></tr>
<tr><td>Italy</td><td>0.130</td><td>1997;2006;2009</td><td>Reject null at 1%.</td></tr>
<tr><td>Luxembourg</td><td>0.127</td><td>1999;2004;2008</td><td>Reject null at 1%.</td></tr>
<tr><td>Norway</td><td>0.414</td><td>2008</td><td>Reject null at 1%.</td></tr>
<tr><td>Spain</td><td>0.249</td><td>1999;2006;2009;2012</td><td>Reject null at 1%.</td></tr>
<tr><td>United Kingdom</td><td>0.298</td><td>2000;2008</td><td>Reject null at 1%.</td></tr>

</tbody>
</table>
<h2 id="conclusion">Conclusion</h2>
<p>Today's blog considers the panel data stationarity test proposed by Carrion-i-Silvestre, et al. (2005). This test is built upon two crucial aspects of unit root testing:</p>
<ul>
<li>Panel data specific tests should be used with panel data.</li>
<li>Structural breaks should be accounted for.</li>
</ul>
<p>Ignoring these two facts can result in unreliable results. </p>
<p>After today, you should have a stronger understanding of how to implement the panel data stationarity test with structural breaks in GAUSS and how to interpret the results. </p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Getting Started With Panel Data in GAUSS </a></li>
</ol>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Introduction to the Fundamentals of Panel Data</title>
		<link>https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/</link>
					<comments>https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Fri, 29 Nov 2019 18:52:07 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=20968</guid>

					<description><![CDATA[Panel data, sometimes referred to as longitudinal data, is data that contains observations about different cross sections across time. Panel data exhibits characteristics of both cross-sectional data and time-series data. This blend of characteristics has given rise to a unique branch of time series modeling made up of methodologies specific to panel data structure. This blog offers a complete guide to those methodologies including the nature of panel data series, types of panel data, and panel data models.]]></description>
										<content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Panel data, sometimes referred to as longitudinal data, is data that contains observations about different cross sections across time. Examples of groups that may make up panel data series include countries, firms, individuals, or demographic groups. </p>
<p>Like time series data, panel data contains observations collected at a regular frequency, chronologically. Like cross-sectional data, panel data contains observations across a collection of individuals.</p>
<p>There are a number of advantages of panel data:</p>
<ul>
<li>Panel data can model both the common and individual behaviors of groups.</li>
<li>Panel data contains more information, more variability, and more efficiency than pure <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/" target="_blank" rel="noopener">time series data</a> or cross-sectional data.</li>
<li>Panel data can detect and measure statistical effects that pure time series or cross-sectional data can't. </li>
<li>Panel data can minimize estimation biases that may arise from aggregating groups into a single time series. </li>
</ul>
<p>Panel data examples can be found in <a href="https://www.aptech.com/industry-solutions/econometrics/" target="_blank" rel="noopener">economics</a>, <a href="https://www.aptech.com/industry-solutions/social-science/" target="_blank" rel="noopener">social sciences</a>, <a href="https://www.aptech.com/industry-solutions/epidemiology/" target="_blank" rel="noopener">medicine and epidemiology</a>, <a href="https://www.aptech.com/industry-solutions/finance/" target="_blank" rel="noopener">finance</a>, and the <a href="https://www.aptech.com/industry-solutions/engineeringphysics/" target="_blank" rel="noopener">physical sciences</a>.</p>
<table>
 <thead>
 <tr>
      <th colspan="3">
         <h3 id="what-is-an-example-of-panel-data"><br>What Is an Example of Panel Data? </h3>
      </th>
   </tr>

<tr><th>Field</th><th>Example topics</th><th>Example dataset</th></tr>
</thead>
<tbody>
<tr><td>Microeconomics</td><td>
GDP across multiple countries, unemployment across different states, income dynamics studies, international current account balances. 
</td><td>
Panel Study of Income Dynamics (PSID)
</td></tr>
<tr><td>Macroeconomics</td><td>
International trade tables, world socioeconomic tables, currency exchange rate tables. 
</td><td><a href="https://www.rug.nl/ggdc/productivity/pwt/" target="_blank" rel="noopener">Penn World Tables</a></td></tr>
<tr><td>Epidemiology and Health Statistics</td><td>
Public health insurance data, disease survival rate data, child development and well-being data. 
</td><td><a href="https://meps.ahrq.gov/mepsweb/" target="_blank" rel="noopener">Medical Expenditure Panel Survey</a></td></tr>
<tr><td>Finance</td><td>
Stock prices by firm, market volatilities by country or firm.
</td><td>
<a href="https://finance.yahoo.com/world-indices/" target="_blank" rel="noopener">Global Market Indices</a>
</td></tr>
</tbody>
</table>
<h2 id="what-is-panel-data">What Is Panel Data?</h2>
<p>Panel data is a collection of quantities obtained for multiple individuals, assembled at regular intervals in time and ordered chronologically. Examples of individual groups include individual people, countries, and companies.</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/11/two-groups-from-a-panel.jpg"><img src="https://www.aptech.com/wp-content/uploads/2019/11/two-groups-from-a-panel.jpg" alt="Two groups from a panel dataset." width="600" height="300" class="aligncenter size-full wp-image-21225" /></a></p>
<p>In order to denote both individuals and time observations, panel data often refers to groups with the subscript <em>i</em> and time with the subscript <em>t</em>. For example, a panel data observation $Y_{it}$ is observed for all individuals $i = {1, ..., N}$ across all time periods $t = {1, ..., T}$.</p>
<p>More specifically:</p>
<table>
<thead>
<tr><th>Group</th><th>Time Period</th><th>Notation</th></tr></thead>
<tbody>
<tr><td>1</td><td>1</td><td>$Y_{11}$</td></tr>
<tr><td>1</td><td>2</td><td>$Y_{12}$</td></tr>
<tr><td>1</td><td>T</td><td>$Y_{1T}$</td></tr>
<tr><td>⁞</td><td>⁞</td><td>⁞</td></tr>
<tr><td>N</td><td>1</td><td>$Y_{N1}$</td></tr>
<tr><td>N</td><td>2</td><td>$Y_{N2}$</td></tr>
<tr><td>N</td><td>T</td><td>$Y_{NT}$</td></tr>
</tbody>
</table>
<h3 id="wide-and-long-panel-datasets">Wide and Long Panel Datasets</h3>
<p>Panel datasets may come in different formats. The format in the table above is sometimes called long format. Long format datasets stack the observations of each variable from all groups, across all time periods, into one column. </p>
<p>When panel data is stored with the observations for a single variable from separate groups stored in separate columns this is sometimes referred to as wide data format.</p>
<table>
<thead>
<tr><th>Time</th><th>$Y_1$</th><th>$Y_2$</th><th>$Y_N$</th></tr></thead>
<tbody>
<tr><td>1</td><td>$Y_{11}$</td><td>$Y_{21}$</td><td>$Y_{N1}$</td></tr>
<tr><td>2</td><td>$Y_{12}$</td><td>$Y_{22}$</td><td>$Y_{N2}$</td></tr>
<tr><td>3</td><td>$Y_{13}$</td><td>$Y_{23}$</td><td>$Y_{N3}$</td></tr>
<tr><td>4</td><td>$Y_{14}$</td><td>$Y_{24}$</td><td>$Y_{N4}$</td></tr>
<tr><td>⁞</td><td>⁞</td><td>⁞</td><td>⁞</td></tr>
<tr><td>T-1</td><td>$Y_{1,T-1}$</td><td>$Y_{2,T-1}$</td><td>$Y_{N,T-1}$</td></tr>
<tr><td>T</td><td>$Y_{1T}$</td><td>$Y_{2T}$</td><td>$Y_{NT}$</td></tr>
</tbody>
</table>
<h3 id="balanced-panel-data-versus-unbalanced-panel-data">Balanced Panel Data Versus Unbalanced Panel Data</h3>
<p>Panel data can also be characterized as unbalanced panel data or balanced panel data:</p>
<ul>
<li>Balanced panel datasets have the same number of observations for all groups.  </li>
<li>Unbalanced panel datasets have missing values at some time observations for some of the groups.  </li>
</ul>
<p>Certain panel data models are only valid for balanced datasets. If the panel datasets are unbalanced they may need to be condensed to include only the consecutive periods for which there are observations for all individuals in the cross section.</p>
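<p>A balanced-panel check is straightforward to express: every group must be observed at the same set of time periods. The following is a Python sketch with invented observation keys:</p>

```python
def is_balanced(rows):
    """rows: iterable of (group, time) observation keys.
    A panel is balanced when every group is observed at
    the same set of time periods."""
    periods = {}
    for group, t in rows:
        periods.setdefault(group, set()).add(t)
    groups = list(periods.values())
    return all(p == groups[0] for p in groups)

balanced = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
unbalanced = [("A", 1), ("A", 2), ("B", 1)]   # B is missing period 2
print(is_balanced(balanced))     # True
print(is_balanced(unbalanced))   # False
```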
<h2 id="panel-data-and-heterogeneity">Panel Data and Heterogeneity</h2>
<p>Panel data modeling centers around addressing the likely dependence across observations within the same group. In fact, the primary difference between panel data models and time series models is that panel data models allow for heterogeneity across groups and introduce individual-specific effects.</p>
<p>As an example, consider a panel data series which includes gross domestic product (GDP) data for a panel of 5 different countries, the United States, France, Canada, Greece, and Australia:</p>
<ul>
<li>A worldwide economic recession is likely to impact all 5 countries, causing changes in GDP across the entire panel.</li>
<li>An election in Australia is likely to impact the GDP of Australia but may not affect the other countries in the panel.</li>
<li>A change in North American trade policy may only regionally impact the US and Canada.</li>
<li>A change in the Euro exchange rate will most directly affect only France and Greece.</li>
</ul>
<p>Panel data models include techniques that can address these heterogeneities across individuals. Furthermore, pure cross-sectional methods and pure time series models may not be valid in the presence of this heterogeneity. </p>
<h2 id="modeling-panel-data">Modeling Panel Data</h2>
<p>Researchers commonly analyze datasets with multiple observations of a set of cross-sectional units (e.g., people, firms, countries) over time. For example, one may have data covering the production of multiple
firms or the gross product of multiple countries across a number of years. </p>
<p>Modeling these panel data series is a unique branch of time series modeling made up of methodologies specific to their structure. </p>
<p>This section looks more closely at panel data analysis and the associated panel data models. </p>
<h3 id="homogeneous-versus-heterogeneous-panel-data-models">Homogeneous Versus Heterogeneous Panel Data Models</h3>
<p>Panel data methods can be split into two broad categories:</p>
<ul>
<li>Homogeneous (or pooled) panel data models assume that the model parameters are common across individuals.</li>
<li>Heterogeneous models allow for any or all of the model parameters to vary across individuals. Fixed effects and random effects models are both examples of heterogeneous panel data models.</li>
</ul>
<p>Within these groups, the assumptions made about the variation of the model across individuals are the primary drivers for which model to use. </p>
<p>Let’s consider a simple linear model</p>
<p>$$y_{it} = \alpha + \beta x_{it} + \epsilon_{it}$$</p>
<p>The representation above is a homogeneous model: </p>
<ul>
<li>The constant, $ \alpha $, is the same across groups and time.</li>
<li>The coefficient, $ \beta $, is constant across groups and time. </li>
<li>Any differences across groups enter the model only through the error term, $ \epsilon_{it} $.</li>
</ul>
<p>Alternatively, we could believe that groups share common coefficients on regressors but there are group-specific intercepts, as is captured in the fixed effects or <a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">least squares dummy variable (LSDV)</a> model</p>
<p>$$y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it}$$</p>
<p>The representation above is a heterogeneous model, because the constants, $ \alpha_i $, are group-specific.</p>
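<p>As a hedged sketch of how these group-specific intercepts can be estimated, the GAUSS <code>design</code> function expands a vector of group identifiers into a matrix of 0/1 dummy variables. The layout of the <code>data</code> matrix below (group id in column 1, assumed to run 1 through N, then $y$ and $x$) is a hypothetical assumption:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical stacked panel: column 1 = group id (1, 2, ..., N),
// column 2 = y, column 3 = x
grp = data[., 1];
y = data[., 2];
x = data[., 3];

// Expand the group ids into a matrix of 0/1 dummy variables
d = design(grp);

// Suppress the automatic constant so the dummies act as the intercepts
__con = 0;
call ols("", y, d ~ x);</code></pre>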
<h3 id="individual-specific-effects-in-panel-data">Individual-Specific Effects in Panel Data</h3>
<p>This section considers four popular panel data models:</p>
<ul>
<li>Pooled ordinary least squares.</li>
<li>One-way fixed effects.</li>
<li>One-way random effects.</li>
<li>Random coefficients. </li>
</ul>
<p>We will examine these models using an assumed data generation process given by </p>
<p>$$ y_{it} = \beta x_{it} + \delta z_i + \epsilon_{it}$$</p>
<p>In this model, $x_{it}$ represents observed characteristics, such as age, firm size, or expenditures, and $z_i$ represents unobserved characteristics, such as management quality, growth opportunities, or skill. </p>
<table>
<thead>
<tr><th>Component</th><th>Description</th><th>Example</th></tr></thead>
<tbody>
<tr><td>$x_{it}$</td><td>Observable characteristics. These may be constant for an individual across time, such as race, or may vary over time, such as age.</td><td>Age, race, company size, expenditure, population, GDP</td></tr>
<tr><td>$z_i$</td><td>Unobservable characteristics, responsible for model heterogeneity.</td><td>Skill, company potential, lack of basic infrastructure in the community, political unrest.</td></tr>
<tr><td>$\epsilon_{it}$</td><td>Stochastic error term.</td><td>N/A</td></tr>
</tbody>
</table>
<p><b>What Is Pooled Ordinary Least Squares?</b><br />
In some cases, there are no unobservable individual-specific effects, and $\delta z_i $ is constant across individuals. This is a strong assumption and implies that all the observations within groups are independent of one another. </p>
<p>In these cases, the model becomes</p>
<p>$$ y_{it} = \beta x_{it} + \alpha + \epsilon_{it}$$</p>
<p>This implies that when there is no dependence within individual groups, the panel data can be treated as one large, pooled dataset. The model parameters, $\beta$, and, $\alpha$, can be directly estimated using pooled ordinary least squares.</p>
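<p>As a minimal sketch, pooled OLS can be run directly on the stacked data in GAUSS. The simulated data and coefficient values below are purely illustrative:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Simulate a small pooled panel: 3 groups x 6 periods, no group effects
rndseed 97;
n = 18;
x = rndn(n, 1);
y = 1.5 + 0.5 .* x + 0.1 .* rndn(n, 1);

// With no within-group dependence, 'ols' estimates alpha and beta directly
call ols("", y, x);</code></pre>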
<p>Independence within the groups of a panel is unlikely, so pooled OLS is rarely acceptable for panel data. </p>
<p><b>What Is The One-Way Fixed Effects Model?</b><br />
The <a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">one-way fixed effects panel data model</a>:</p>
<ul>
<li>Includes unobservable time-specific or individual-specific effects. These effects capture omitted variables. </li>
<li>Assumes that the individual-specific effects are correlated with the observed characteristics, $x_{it}$.</li>
<li>Pooled OLS estimates for data generated by this process will be inconsistent.</li>
</ul><div id="attachment_21244" style="width: 610px" class="wp-caption aligncenter"><a href="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-fixed-effects.jpg"><img aria-describedby="caption-attachment-21244" decoding="async" fetchpriority="high" src="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-fixed-effects.jpg" alt="Plot of fixed effects panel data." width="600" height="300" class="size-full wp-image-21244" /></a><p id="caption-attachment-21244" class="wp-caption-text">Fixed effects data with group-specific intercepts and one shared slope.</p></div> <p>As an example, let’s consider the one-way fixed effects model with individual-specific effects where the unobservable component, $\delta z_i$ , acts like an individual-specific intercept:</p>
<p>$$y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$$</p>
<p>The intercept term, $\alpha_i$, varies across individuals but is constant across time. It can be decomposed as $\alpha_i = \mu + \gamma_i$, where $\mu$ is a constant intercept shared by all groups and $\gamma_i$ is the individual-specific component.</p>
<p>The key feature of the fixed effects model is that $\gamma_i$ has a true, but unobservable, effect that must be estimated. More importantly, if we estimate $\beta$ using pooled OLS and fail to appropriately account for $\gamma_i$, the estimates will be inconsistent and biased.</p>
<p>The fixed effects model requires estimating the model parameter $\beta$ and an individual intercept $\alpha_i$ for each of the N groups in the panel. This is generally achieved using one of three estimation techniques:</p>
<ul>
<li><a href="https://www.aptech.com/examples/tsmt/tscsfit-grunfeld/" target="_blank" rel="noopener">Within-group estimation</a>.</li>
<li>First differences estimation.</li>
<li><a href="https://www.aptech.com/examples/tsmt/lsdvfit-simulated/" target="_blank" rel="noopener">Least squares dummy variable (LSDV)</a> estimation.</li>
</ul>
<p>The first two of these techniques focus on eliminating the individual effects before estimation. The LSDV method instead incorporates these effects directly using dummy variables.</p>
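<p>As one hedged sketch of within-group estimation, the GAUSS <code>aggregate</code> function can compute the group means, which are then subtracted from each observation. The layout of the hypothetical <code>data</code> matrix (group id in column 1, assumed to run 1 through N, then $y$ and $x$) is an assumption:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Group means of y and x, one row per group: id, mean(y), mean(x)
grp_means = aggregate(data, "mean");

// The within transformation: subtract each group's means from its rows
ydm = data[., 2] - grp_means[data[., 1], 2];
xdm = data[., 3] - grp_means[data[., 1], 3];

// OLS on the demeaned data estimates beta; the alpha_i have been swept out
__con = 0;
call ols("", ydm, xdm);</code></pre>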
<p><b>What Is the One-Way Random Effects Model?</b><br />
The <a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">one-way random effects panel data model</a>:</p>
<ul>
<li>Includes unobservable time-specific or individual-specific effects, $\delta z_i$, which act like individual-specific stochastic error terms.</li>
<li>Assumes that these effects are uncorrelated with the observed characteristics, $x_{it}$. </li>
<li>Does not bias OLS coefficient estimates, but does make them inefficient and renders standard inference tools invalid.</li>
</ul><div id="attachment_21257" style="width: 610px" class="wp-caption aligncenter"><a href="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-random-variations.jpg"><img aria-describedby="caption-attachment-21257" decoding="async" src="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-random-variations.jpg" alt="Plot of random effects panel data showing stochastic differences across groups." width="600" height="300" class="size-full wp-image-21257" /></a><p id="caption-attachment-21257" class="wp-caption-text">Plot of random effects panel data showing stochastic differences across groups.</p></div><p>The distinguishing feature of the random effects model is that $\delta z_i$ does not have a true value but rather follows a random distribution with parameters that we must estimate. </p>
<p>The random effects term, $\delta z_i$:</p>
<ul>
<li>Is uncorrelated with $x_{it}$ and pooled OLS estimates of the model parameters will not be biased. </li>
<li>Alters the covariance structure of the error term, which implies that pooled OLS estimates of the model parameters will be inefficient and standard inference tools, like the t-statistic, will not be correct.</li>
</ul>
<p>The random effects model should be estimated using feasible generalized least squares (FGLS). Using FGLS, the appropriate error structure, one which accounts for the individual-specific error terms, can be incorporated into the model. </p>
<p><b>What Is the Random Coefficients Model?</b></p><div id="attachment_21253" style="width: 610px" class="wp-caption aligncenter"><a href="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-random-coefficients-data-w-heteroscedasticity.jpg"><img aria-describedby="caption-attachment-21253" decoding="async" src="https://www.aptech.com/wp-content/uploads/2019/11/panel-blog-random-coefficients-data-w-heteroscedasticity.jpg" alt="Plot of random coefficients panel data, showing differing intercepts, slopes, and variances." width="600" height="300" class="size-full wp-image-21253" /></a><p id="caption-attachment-21253" class="wp-caption-text">Plot of random coefficients panel data, showing differing intercepts, slopes, and variances.</p></div><p>The <a href="https://github.com/aptech/gauss-panel-library" target="_blank" rel="noopener">panel data regressions</a> we’ve looked at so far have all assumed that the coefficients on regressors are the same across all individuals. The random coefficients model relaxes this assumption and introduces individual-specific effects through the coefficients, such that</p>
<p>$$y_{it} = \beta_i x_{it} + \alpha_i + \epsilon_{it}$$
$$y_{it} = (b_i + \beta)x_{it} + (a_i+\alpha) + \epsilon_{it}$$
$$b_i \sim N(0, \tau_{i1}^2)$$
$$a_i \sim N(0, \tau_{i2}^2)$$</p>
<p>This model introduces individual-specific slope effects and allows for heteroscedasticity through the individual-specific variances $\tau_{i1}^2$ and $\tau_{i2}^2$.</p>
<p>This model can be estimated using <a href="https://www.aptech.com/blog/using-feasible-generalized-least-squares-to-improve-estimates/" target="_blank" rel="noopener">feasible generalized least squares (FGLS)</a> or <a href="https://www.aptech.com/blog/beginners-guide-to-maximum-likelihood-estimation-in-gauss/" target="_blank" rel="noopener">maximum likelihood estimation (MLE)</a>. </p>
<h3 id="two-way-individual-effects-models">Two-Way Individual Effects Models</h3>
<p>The two-way individual effects model allows the presence of both time-specific effects and individual-specific effects. </p>
<p>Starting from a simple linear model given by, </p>
<p>$$y_{it} = \alpha + \beta x_{it} + \epsilon_{it}$$</p>
<p>the two-way individual effects model can be represented by </p>
<p>$$y_{it} = \alpha + \beta x_{it} + \mu_i + \lambda_t + \epsilon_{it}$$</p>
<p>In this model, $\mu_i$, captures any unobservable individual-specific effects and $\lambda_t$ captures any unobservable time-specific effects. Note that the individual-specific effects, $\mu_i$, do not vary with time, while the time-specific effects, $\lambda_t$, do not vary across individuals.</p>
<p>In the special case that there are only two time periods and two groups, this model is equivalent to the <a href="https://www.aptech.com/blog/introduction-to-difference-in-differences-estimation/" target="_blank" rel="noopener">difference-in-differences model</a>. However, if there are more than two time periods and/or groups, alternative panel data models must be considered. </p>
<p><b>What Is the Two-Way Fixed Effects Model?</b><br />
The two-way fixed effects model:</p>
<ul>
<li>Assumes that both $\mu_i$ and $\lambda_t$ are unobservable, fixed effects that must be estimated.</li>
</ul>
<p>For data generated by this model:</p>
<ul>
<li>Pooled OLS estimates, which ignore $\mu_i$ and $\lambda_t$, will be biased and inconsistent. </li>
<li>One-way fixed effects estimates, which ignore $\lambda_t$, will be biased.</li>
</ul>
<p>Like the one-way fixed effects model, this model could be estimated by including dummy variables. However, in the two-way fixed effects model dummy variables must be included for both the time periods and the groups. </p>
<p>Under most circumstances, the number of dummy variables required by the two-way fixed effects model makes standard ordinary least squares estimation computationally impractical. Instead, the two-way fixed effects model is estimated using a within-group estimator that removes both the group means and the time-period means from the data.</p>
<p><b>What Is the Two-Way Random Effects Model?</b><br />
The two-way random effects model:</p>
<ul>
<li>Occurs when both $\mu_i$ and $\lambda_t$ are unobservable, stochastic effects. </li>
<li>Assumes that $\mu_i$ and $\lambda_t$ are independently distributed and are uncorrelated with $x_{it}$.</li>
</ul>
<p>For data generated by this process:</p>
<ul>
<li>Pooled OLS estimates will be unbiased. However, the estimates will be inefficient, and the associated standard errors and t-statistics will be invalid. </li>
</ul>
<p>Like the one-way random effects model, the two-way random effects model can be estimated using feasible generalized least squares (FGLS) or maximum likelihood estimation (MLE). </p>
<p><b>Dynamic Panel Data Model</b><br />
A key component of pure time series models is the modeling of dynamics using lagged dependent variables. These lagged variables capture the autocorrelation between observations of the same dataset at different points in time. </p>
<p>Because panel datasets include a time series component, it is also important to address the possibility of autocorrelation in panel data. The <a href="https://ifs.org.uk/publications/dpd-gauss" target="_blank" rel="noopener">dynamic panel data model</a> adds dynamics to the panel data individual effects framework. </p>
<p>Consider an individual effects model which includes an AR(1) term</p>
<p>$$y_{it} = \delta y_{i,t-1} + \beta x_{it} + \epsilon_{it}$$</p>
<p>where the error component includes one-way individual effects such that </p>
<p>$$\epsilon_{it} = \mu_i + \nu_{it}$$</p>
<p>where $\mu_i$ captures the individual effects and $\nu_{it}$ is the remaining idiosyncratic error.</p>
<p>Introducing lagged dependent variables in the individual effects framework:</p>
<ul>
<li>Both $y_{it}$ and $y_{i,t-1}$ are functions of $\mu_i$, because $\mu_i$ is time-invariant. This implies that as a regressor, $y_{i,t-1}$ is correlated with the error term.</li>
<li><a href="https://www.aptech.com/resources/tutorials/econometrics/" target="_blank" rel="noopener">Ordinary least squares (OLS)</a> will yield biased estimates because the regressor $y_{i,t-1}$ is correlated with the error term. </li>
</ul>
<p>Dynamic panel data models are most commonly estimated using a <a href="https://www.aptech.com/resources/tutorials/gmm/introduction/" target="_blank" rel="noopener">generalized method of moments (GMM)</a> framework proposed by Arellano and Bond (1991).</p>
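<p>The bias from ignoring $\mu_i$ is easy to see in a small simulation. The code below is only a sketch with made-up parameters, demonstrating the problem rather than the Arellano and Bond estimator itself:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Simulate y_it = 0.5*y_(i,t-1) + mu_i + nu_it
rndseed 2300;
n = 200;
t = 10;
delta = 0.5;
mu = rndn(n, 1);

y = zeros(n, t+1);
for s(2, t+1, 1);
    y[., s] = delta .* y[., s-1] + mu + rndn(n, 1);
endfor;

// Stack current and lagged observations
ycur = vecr(y[., 2:t+1]);
ylag = vecr(y[., 1:t]);

// Pooled OLS ignores mu_i; since y_(i,t-1) is correlated with mu_i,
// the estimate of delta will be biased away from 0.5
b = ycur / (ones(rows(ylag), 1) ~ ylag);
print b;</code></pre>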
<h2 id="panel-data-and-stationarity">Panel Data and Stationarity</h2>
<p>In panel data that covers short time frames, there is little need to worry about stationarity. However, when panel data covers longer time frames, as is the case in many macroeconomic panel data series, the data must be <a href="https://www.aptech.com/why-gauss-for-unit-root-testing/" target="_blank" rel="noopener">tested for stationarity</a>.</p>
<p>Weak stationarity, required for many panel data modeling techniques, requires only that:</p>
<ul>
<li>The series has the same finite unconditional mean and finite unconditional variance at all time periods.</li>
<li>The series' autocovariances depend only on the lag between observations, not on time.</li>
</ul>
<p>Nonstationary panel data series are any panel series that do not meet the conditions of a weakly stationary time series.</p>
<p>In part because of these considerations, a large field of research and literature surrounding panel data unit root tests has developed.</p>
<p>Testing for <a href="https://www.aptech.com/why-gauss-for-unit-root-testing/" target="_blank" rel="noopener">unit roots</a> in panel data requires more than just testing the individual cross sections for the presence of unit roots. <a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data unit root tests</a> must:</p>
<ul>
<li>Allow for both the shared movements across groups and the individual-specific movements within groups.   </li>
<li>Use an appropriate asymptotic distribution based on how quickly the number of panels (N) and the number of time periods (T) grow relative to one another.  </li>
<li>Determine whether to assume cross-sectional independence or to allow for cross-sectional dependence. </li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>After today's blog, you should have an understanding of the fundamentals of panel data. We covered the basics of panel data including:</p>
<ul>
<li>The structure of panel data series.</li>
<li>Wide versus long panel data series.</li>
<li>One-way individual effects panel data models.</li>
<li>Two-way individual effects panel data models.</li>
<li>Dynamic panel data models.</li>
<li>Panel data series and stationarity.</li>
</ul>
<h3 id="further-suggested-reading">Further suggested reading:</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Getting Started With Panel Data in GAUSS </a></li>
</ol>
<p>Ready to get more from your panel data with GAUSS? <a href="https://www.aptech.com/contact-us/" target="_blank" rel="noopener">Contact us today</a> to claim your free GAUSS demo copy.</p>
    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // calculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>How to Aggregate Panel Data in GAUSS</title>
		<link>https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/</link>
					<comments>https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Sat, 23 Nov 2019 00:34:45 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=21062</guid>

					<description><![CDATA[The aggregate function, first available in <a href="https://www.aptech.com/blog/gauss-20-initial-release/">GAUSS version 20</a>, computes statistics within data groups. This is particularly useful for <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/">panel data</a>. In today's blog, we take a closer look at aggregate.]]></description>
										<content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>The <code>aggregate</code> function, first available in <a href="https://www.aptech.com/blog/gauss-20-initial-release/" target="_blank" rel="noopener">GAUSS version 20</a>, computes statistics within data groups. This is particularly useful for <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a>.</p>
<p>In today's blog, we take a closer look at <a href="https://docs.aptech.com/gauss/aggregate.html" target="_blank" rel="noopener"><code>aggregate</code></a>. We will:</p>
<ol>
<li>Introduce the basics of the <code>aggregate</code> function.</li>
<li>Explain how to use the <code>aggregate</code> function.</li>
<li>Demonstrate a real-world application of the <code>aggregate</code> function using current account data from the International Monetary Fund.</li>
</ol>
<h2 id="the-gauss-aggregate-function">The GAUSS Aggregate Function</h2>
<p>The GAUSS <code>aggregate</code> function computes statistics within a group based upon a specified group identifier. The function supports a variety of GAUSS statistics including:</p>
<ul>
<li>mean</li>
<li>median</li>
<li>mode</li>
<li>min</li>
<li>max</li>
<li>sample standard deviation</li>
<li>sum</li>
<li>sample variance</li>
</ul>
<p>For example, consider a panel dataset which includes observed weights for three individuals across a 6-month time span:</p>
<table>
<thead>
<tr><th>Name</th><th>Jan. Weight</th><th>Feb. Weight</th><th>Mar. Weight</th><th>Apr. Weight</th><th>May Weight</th><th>June Weight</th></tr></thead>
<tbody>
<tr><td>Sarah</td><td>135</td><td>134</td><td>138</td><td>142</td><td>144</td><td>145</td></tr>
<tr><td>Tom</td><td>196</td><td>192</td><td>182</td><td>183</td><td>184</td><td>181</td></tr>
<tr><td>Nikki</td><td>143</td><td>144</td><td>146</td><td>147</td><td>145</td><td>143</td></tr>
</tbody>
</table>
<p>We can use the <code>aggregate</code> function to find the 6-month mean weights for Sarah, Tom and Nikki:</p>
<table>
<thead>
<tr><th>Name</th><th>Jan. Weight</th><th>Feb. Weight</th><th>Mar. Weight</th><th>Apr. Weight</th><th>May Weight</th><th>June Weight</th><th>Mean Weight</th></tr></thead>
<tbody>
<tr><td>Sarah</td><td>135</td><td>134</td><td>138</td><td>142</td><td>144</td><td>145</td><td style="text-align:center; background-color: #fde5d2">139.7</td></tr>
<tr><td>Tom</td><td>196</td><td>192</td><td>182</td><td>183</td><td>184</td><td>181</td><td style="text-align:center; background-color: #fde5d2">186.3</td></tr>
<tr><td>Nikki</td><td>143</td><td>144</td><td>146</td><td>147</td><td>145</td><td>143</td><td style="text-align:center; background-color: #fde5d2">144.7</td></tr>
</tbody>
</table>
<p>Alternatively, we could find the monthly standard deviation of the weights across Sarah, Tom and Nikki:</p>
<table>
<thead>
<tr><th>Name</th><th>Jan. Weight</th><th>Feb. Weight</th><th>Mar. Weight</th><th>Apr. Weight</th><th>May Weight</th><th>June Weight</th></tr></thead>
<tbody>
<tr><td>Sarah</td><td>135</td><td>134</td><td>138</td><td>142</td><td>144</td><td>145</td></tr>
<tr><td>Tom</td><td>196</td><td>192</td><td>182</td><td>183</td><td>184</td><td>181</td></tr>
<tr><td>Nikki</td><td>143</td><td>144</td><td>146</td><td>147</td><td>145</td><td>143</td></tr>
<tr style="text-align:center; background-color: #fde5d2"><td>Monthly Std. Dev.</td><td>33.2</td><td>31.0</td><td>23.4</td><td>22.4</td><td>22.8</td><td>21.4</td></tr>
</tbody>
</table>
<h2 id="how-to-use-the-aggregate-function">How to Use The Aggregate Function</h2>
<p>The <code>aggregate</code> function takes two required inputs:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">x_a = aggregate(x, method);</code></pre>
<hr>
<dl>
<dt>x</dt>
<dd>NxK data matrix, must have group identifiers in the first column.</dd>
<dt>method</dt>
<dd>String, method to use. Valid methods include: <code>"mean"</code>, <code>"median"</code>, <code>"mode"</code>, <code>"max"</code>, <code>"min"</code>, <code>"sd"</code>, <code>"sum"</code>, <code>"variance"</code>.</dd>
</dl>
<hr>
<h3 id="the-input-data-matrix">The Input Data Matrix</h3>
<p>The <code>aggregate</code> function requires the data matrix input to:</p>
<ul>
<li>Have numerical group identifiers in the first column.</li>
<li>Be in stacked panel data format.</li>
</ul>
<p>Let's consider our example dataset from above. In order to use this data as an input to the GAUSS <code>aggregate</code> function we need to: </p>
<ul>
<li>Recode our group identifiers from names to numbers.</li>
<li>Stack our data into a pooled dataset.</li>
</ul>
<table>
<thead>
<tr><th>Name</th><th>Jan. Weight</th><th>Feb. Weight</th><th>Mar. Weight</th><th>Apr. Weight</th><th>May Weight</th><th>June Weight</th></tr></thead>
<tbody>
<tr><td>Sarah</td><td>135</td><td>134</td><td>138</td><td>142</td><td>144</td><td>145</td></tr>
<tr><td>Tom</td><td>196</td><td>192</td><td>182</td><td>183</td><td>184</td><td>181</td></tr>
<tr><td>Nikki</td><td>143</td><td>144</td><td>146</td><td>147</td><td>145</td><td>143</td></tr>
</tbody>
</table>
<p>$$\text{Sarah} \rightarrow 1$$
$$\text{Tom} \rightarrow 2$$
$$\text{Nikki} \rightarrow 3$$</p>
<p>$$\Downarrow$$</p>
<table>
<thead>
<tr><th>Group</th><th>Jan. Weight</th><th>Feb. Weight</th><th>Mar. Weight</th><th>Apr. Weight</th><th>May Weight</th><th>June Weight</th></tr></thead>
<tbody>
<tr><td>1</td><td>135</td><td>134</td><td>138</td><td>142</td><td>144</td><td>145</td></tr>
<tr><td>2</td><td>196</td><td>192</td><td>182</td><td>183</td><td>184</td><td>181</td></tr>
<tr><td>3</td><td>143</td><td>144</td><td>146</td><td>147</td><td>145</td><td>143</td></tr>
</tbody>
</table>
<p>$$\Downarrow$$</p>
<table>
<thead>
<tr><th>Group</th><th>Month</th><th>Weight</th></tr></thead>
<tbody>
<tr><td>1</td><td>1</td><td>135</td></tr>
<tr><td>1</td><td>2</td><td>134</td></tr>
<tr><td>1</td><td>3</td><td>138</td></tr>
<tr><td>1</td><td>4</td><td>142</td></tr>
<tr><td>1</td><td>5</td><td>144</td></tr>
<tr><td>1</td><td>6</td><td>145</td></tr>
<tr><td>2</td><td>1</td><td>196</td></tr>
<tr><td>2</td><td>2</td><td>192</td></tr>
<tr><td>⁞</td><td>⁞</td><td>⁞</td></tr>
<tr><td>3</td><td>6</td><td>143</td></tr>
</tbody>
</table>
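<p>The recode-and-stack steps above can be sketched with basic matrix operations. The <code>wide</code> matrix below is simply the recoded example table:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Recoded wide data: group id, then six monthly weights
wide = { 1 135 134 138 142 144 145,
         2 196 192 182 183 184 181,
         3 143 144 146 147 145 143 };

// Repeat each id 6 times, tile the month numbers, and stack the weights
grp = vecr(wide[., 1] .* ones(1, 6));
month = vecr(ones(rows(wide), 1) .* seqa(1, 1, 6)');
w = vecr(wide[., 2:7]);

// 18 x 3 long-form matrix: group, month, weight
long = grp ~ month ~ w;</code></pre>
<p>Recent GAUSS versions also provide the <code>dfLonger</code> procedure for reshaping dataframes from wide to long form.</p>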
<h3 id="the-method-input">The Method Input</h3>
<p>The <em>method</em> input into the <code>aggregate</code> function should always be a string indicating which statistic you wish to compute. </p>
<p>Each method works on groups within the panel the same way the analogous pooled data function would work, including its handling of missing values.</p>
<table>
<thead>
<tr><th>Method</th><th>Pooled Function</th></tr></thead>
<tbody>
<tr><td>mean</td><td><code>meanc</code></td></tr>
<tr><td>median</td><td><code>median</code></td></tr>
<tr><td>mode</td><td><code>modec</code></td></tr>
<tr><td>max</td><td><code>maxc</code></td></tr>
<tr><td>min</td><td><code>minc</code></td></tr>
<tr><td>sum</td><td><code>sumc</code></td></tr>
<tr><td>sd</td><td><code>stdc</code></td></tr>
<tr><td>variance</td><td><code>varCovXS</code></td></tr>
</tbody>
</table>
<h3 id="example-of-how-to-use-aggregate">Example of How to Use Aggregate</h3>
<p>Let's use <code>aggregate</code> to find the means by group for weight data:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">weights = { 1 1 135,
            1 2 134,
            1 3 138,
            1 4 142,
            1 5 144,
            1 6 145,
            2 1 196,
            2 2 192,
            2 3 182,
            2 4 183,
            2 5 184,
            2 6 181,
            3 1 143,
            3 2 144,
            3 3 146,
            3 4 147,
            3 5 145,
            3 6 143 };

/*
** Find the mean by person.
** We will use the first column
** as the group indicator and will find
** the mean of the weights.
*/
print aggregate(weights[., 1 3], "mean");</code></pre>
<p>This prints the group means to the output window:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">1   139.67 
2   186.33 
3   144.67</code></pre>
<p>We can also use the month identifiers to find the sample standard deviation by month:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Find the standard deviation by month.
** We will use the second column of weights
** as the group indicator and will find
** the standard deviation of the weights.
*/
print aggregate(weights[., 2 3], "sd");</code></pre>
<p>Now the standard deviations, along with their associated months will be printed to the output window:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">1   33.151 
2   31.005 
3   23.438 
4   22.368 
5   22.811 
6   21.385</code></pre>
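<p>These sample standard deviations can likewise be verified by hand. The plain-Python sketch below (illustrative; not the GAUSS workflow) groups the same data by month and applies the sample standard deviation:</p>

```python
# Cross-check of the "sd" aggregation by month in plain Python (illustrative).
from statistics import stdev  # sample standard deviation (n - 1 denominator)

# month -> the three persons' weights in that month, read from the matrix above
months = {
    1: [135, 196, 143], 2: [134, 192, 144], 3: [138, 182, 146],
    4: [142, 183, 147], 5: [144, 184, 145], 6: [145, 181, 143],
}

month_sd = {m: round(stdev(obs), 3) for m, obs in months.items()}
for m, s in month_sd.items():
    print(m, s)  # matches the six values in the GAUSS output above
```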
<h2 id="using-aggregate-to-examine-trends-in-current-account-balances">Using Aggregate to Examine Trends in Current Account Balances</h2>
<p>Our simple example dataset is useful for demonstrating the basics of the <code>aggregate</code> function. However, a real-world panel dataset better demonstrates its true power. In this section, we will use <code>aggregate</code> to examine some of the trends in international current account balances.</p>
<h3 id="the-data">The Data</h3>
<p>We will use the current account balance measured as a percentage of GDP. This unbalanced panel dataset is a modified version of a dataset from the International Monetary Fund. It spans 1953-Q3 to 2019-Q4 and covers 46 countries across 5 world regions.</p>
<p>It contains the following variables:</p>
<table>
<thead>
<tr>
<th>Variable name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Country</td>
<td>String, name of the country.</td>
</tr>
<tr>
<td>Country ID</td>
<td>Integer country identifier.</td>
</tr>
<tr>
<td>World Region</td>
<td>String, name of the corresponding world region.</td>
</tr>
<tr>
<td>Region ID</td>
<td>Integer world region identifier.</td>
</tr>
<tr>
<td>Time</td>
<td>String, the date of the observation.</td>
</tr>
<tr>
<td>CAB</td>
<td>Decimal numeric, the Current Account Balance.</td>
</tr>
</tbody>
</table>
<h3 id="mean-and-median-current-account-balances-by-country">Mean and Median Current Account Balances By Country</h3>
<p>We first examine how the mean and median current account balances vary across countries. Using <code>aggregate</code>, we compute both statistics for each country in the panel across all of its observations.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Load the 'Country ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Country Id + CAB");

// Compute mean and median current
// account balances by Country ID
mean_cab_cid = aggregate(X, "mean");
median_cab_cid = aggregate(X, "median");</code></pre>
<p>After running the above code, <code>mean_cab_cid</code> and <code>median_cab_cid</code> will both be $46\times2$ matrices. Each element in the first column will be a unique country ID, and the corresponding element in the second column will be that country's mean or median current account balance.</p>
<p>We include a graph of this data below, which shows that Germany has the highest average current account balance, while Finland has the lowest. </p>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/11/current-acct-bal-by-ctry.png"><img src="https://www.aptech.com/wp-content/uploads/2019/11/current-acct-bal-by-ctry.png" alt="" width="600" height="1062" class="aligncenter size-full wp-image-21194" /></a></p>
<h3 id="mean-and-median-current-account-balances-by-region">Mean and Median Current Account Balances By Region</h3>
<p>We can similarly consider the mean and median current account balances across geographical regions with the code below.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Load the 'Region ID' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
*/
X = loadd("imf_cab_mod.xlsx", "Region ID + CAB");

// Compute mean and median current
// account balances by world region
mean_cab_wreg = aggregate(X, "mean");
median_cab_wreg = aggregate(X, "median");</code></pre>
<p><code>mean_cab_wreg</code> and <code>median_cab_wreg</code> will be two column matrices with unique world region IDs in the first column and the corresponding statistics in the second column.</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/11/current-act-bal-by-region.png"><img src="https://www.aptech.com/wp-content/uploads/2019/11/current-act-bal-by-region.png" alt="" width="600" height="300" class="aligncenter size-full wp-image-21195" /></a></p>
<h3 id="mean-and-median-current-account-balances-time-series">Mean and Median Current Account Balances Time Series</h3>
<p>Finally, we consider how the mean and median current account balances vary across time in the <a href="https://www.aptech.com/resources/tutorials/time-series-plots/" target="_blank" rel="noopener">time series plot</a> below.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Load the 'Time' and 'CAB' (Current Account Balance) variables.
** Notice that the grouping variable will be in the first column of 'X'.
** Wrapping 'Time' in 'date($)' tells GAUSS that 'Time' is a string
** variable that we want GAUSS to convert to a date.
*/
X = loadd("imf_cab_mod.xlsx", "date($Time) + CAB");

mean_cab_date = aggregate(X, "mean");
median_cab_date = aggregate(X, "median");</code></pre>
<p>This time the first column of our resulting matrices, <code>mean_cab_date</code> and <code>median_cab_date</code>, will contain each unique date from our dataset. The second column will contain the statistic computed for each unique date.</p>
<hr>
<center>Interested in learning more about loading dates in GAUSS? <br><a href="https://www.aptech.com/blog/reading-dates-and-times-in-gauss/" target="_blank" rel="noopener">Check out this tutorial to learn more</a>.</center>
<hr>
<p>Below is a graph of the Current Account Balance data grouped by quarter.</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/11/current-acct-bal-ts-1.jpg"><img src="https://www.aptech.com/wp-content/uploads/2019/11/current-acct-bal-ts-1.jpg" alt="" width="600" height="300" class="aligncenter size-full wp-image-21198" /></a></p>
<h2 id="conclusion">Conclusion</h2>
<p>In today's blog, we examined the fundamentals of the <code>aggregate</code> procedure. After reading you should have a better understanding of:</p>
<ol>
<li>The basics of the <code>aggregate</code> function.</li>
<li>How to use the <code>aggregate</code> function.</li>
<li>How to examine trends in real-world panel data using <code>aggregate</code>.</li>
</ol>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a></li>
</ol>
    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // caculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Panel Data Basics: One-way Individual Effects</title>
		<link>https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/</link>
					<comments>https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Mon, 15 Apr 2019 02:54:59 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=20046</guid>

					<description><![CDATA[In this blog, we examine one of the fundamentals of <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/">panel data</a> analysis, the one-way error component model. We cover the theoretical background of the one-way error component model, we examine the fixed-effects and random-effects models, and provide an empirical example of both.]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>In this blog, we examine one of the fundamentals of <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a> analysis, the one-way error component model. Today we will:</p>
<ul>
<li>Explain the theoretical one-way error component model.</li>
<li>Consider fixed effects vs. random effects. </li>
<li>Estimate models using an empirical example. </li>
</ul>
<h2 id="the-theoretical-one-way-error-component-model">The theoretical one-way error component model</h2>
<p>The one-way error component model is a panel data model that allows for individual-specific or time-specific error components:</p>
<p>$$ \begin{equation}y_{it} = \alpha + X_{it} \beta + u_{it} \label{OWEM}\end{equation}$$
$$ u_{it} = \mu_{i} + \nu_{it} $$</p>
<p>where the subscript <i>i</i> indicates cross-sections of households, individuals, firms, countries, etc. and the subscript <i>t</i> indicates time periods. </p>
<p>In this model, the individual-specific error component, $\mu_{i}$, captures any unobserved effects that are different across individuals but fixed across time. </p>
<table>
<colgroup>
       <col span="1" style="width: 30%;">
       <col span="1" style="width: 70%;">
    </colgroup>
<tr>
<th colspan="2">The one-way error component model</th>
</tr>
<tr>
<td style="padding-left: 10px"><b>$\alpha$</b></td><td>Parameter measuring the intercept, which is constant across all individuals and time periods.</td>
</tr>
<tr>
<td style="padding-left: 10px"><b>$\beta$</b></td><td>Parameter of interest measuring the effect of <i>x</i> on <i>y</i>. It is constant across all individuals and time periods.</td>
</tr>
<tr>
<td style="padding-left: 10px"><b>$\mu_i$</b></td><td>Individual-specific variation in <i>y</i> which stays constant across time for each individual.<br>In the <b>fixed effects</b> model this is an individual-specific effect to be estimated.<br>In the <b>random effects</b> model this follows a random distribution with parameters that must be estimated.</td>
</tr>
<tr>
<td style="padding-left: 10px"><b>$\nu_{it}$</b></td><td>Usual stochastic regression disturbance which varies across time and individuals.</td>
</tr>
</table>
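<p>To make the error decomposition concrete, the short plain-Python simulation below draws individual effects and idiosyncratic disturbances and forms the composite error $u_{it} = \mu_i + \nu_{it}$. It is an illustration only (the variance choices are arbitrary), not GAUSS code:</p>

```python
# Simulating the one-way error component u_it = mu_i + nu_it (illustrative;
# the standard deviations chosen here are arbitrary).
import random

random.seed(0)
N, T = 4, 3  # individuals, time periods

mu = [random.gauss(0, 1) for _ in range(N)]                        # individual effects
nu = [[random.gauss(0, 0.5) for _ in range(T)] for _ in range(N)]  # idiosyncratic noise
u = [[mu[i] + nu[i][t] for t in range(T)] for i in range(N)]       # composite error

# mu_i is constant across t: subtracting nu_it recovers the same mu_i each period
for i in range(N):
    for t in range(T):
        assert abs((u[i][t] - nu[i][t]) - mu[i]) < 1e-12
```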
<h2 id="fixed-effects-vs-random-effects">Fixed effects vs. random effects</h2>
<p>The two most common approaches to modeling individual-specific error components are the fixed effects model and the random effects model. </p>
<p>The key difference between these two approaches is how we believe the individual error component behaves.</p>
<h3 id="the-fixed-effects-model">The fixed effects model</h3>
<p>In the fixed effects model the individual error component:</p>
<ul>
<li>Can be thought of as an individual-specific intercept term. </li>
<li>Captures any omitted variables that are not included in the regression.</li>
<li>Is correlated with other variables included in the model. </li>
</ul>
<p>Given these assumptions, the fixed effects model can be thought of as a pooled OLS model with individual-specific intercepts:</p>
<p>$$\begin{equation}y_{it} = \delta_{i} + X_{it} \beta  + \nu_{it}\label{FEM}\end{equation}$$</p>
<p>The intercept term, $\delta_i$, varies across individuals but is constant across time for each individual. This term is composed of the constant intercept term, $\alpha$, and the individual-specific error terms, $\mu_i$. </p>
<p>The distinguishing feature of the fixed effects model is that $\delta_i$ has a true, but unobservable, effect which we must estimate. </p>
<h3 id="the-random-effects-model">The random effects model</h3>
<p>In the random effects model the individual-specific error component, $\mu_i$:</p>
<ul>
<li>Is distributed randomly and is independent of $\nu_{it}$.</li>
<li>Occurs in cases where individuals are drawn randomly from a large population, such as household studies (Baltagi, 2021).</li>
<li>Is assumed to be uncorrelated with all other variables in the model.</li>
<li>Impacts the model through the covariance structure of the error term. </li>
</ul>
<p>For example, consider the total error disturbance in the model, $ u_{it} = \mu_{i} + \nu_{it} $.  The covariance of the error at time <i>t</i> and time <i>s</i> depends on the variance of both $\mu_{i}$ and $\nu_{it}:$ </p>
<p>$$\begin{equation}cov(u_{it}, u_{is}) = \left\{ \begin{array}{ll} \sigma_{\mu}^2 & \text{for } t \neq s \\ \sigma_{\mu}^2 + \sigma_{\nu}^2   & \text{for } t = s \\ \end{array} \right. \label{REM}\end{equation} $$</p>
<p>The distinguishing feature of the random effects model is that $\mu_i$ does not have a true value but rather follows a random distribution with parameters that we must estimate. </p>
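<p>The covariance structure above can be made concrete by building $\Omega = E[u_i u_i']$ for a single individual. The plain-Python sketch below is illustrative; the variance values are arbitrary placeholders:</p>

```python
# Building Omega for one individual's T x 1 error vector (illustrative;
# sigma_mu2 and sigma_nu2 are arbitrary assumed values).
T = 4
sigma_mu2 = 0.5   # Var(mu_i), the individual effect
sigma_nu2 = 1.0   # Var(nu_it), the idiosyncratic disturbance

# Off-diagonal entries are sigma_mu2; diagonal entries add sigma_nu2
Omega = [[sigma_mu2 + (sigma_nu2 if t == s else 0.0) for s in range(T)]
         for t in range(T)]
```

Every off-diagonal entry equals $\sigma_{\mu}^2$ and every diagonal entry equals $\sigma_{\mu}^2 + \sigma_{\nu}^2$, matching the covariance expression above.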
<h2 id="estimation">Estimation</h2>
<h3 id="the-fixed-effects-model-1">The fixed effects model</h3>
<p>In the fixed effects model, the individual effects introduce an endogeneity that will result in biased estimates if not properly accounted for. </p>
<p>Fortunately, we can make consistent estimates using one of three estimation techniques:</p>
<ol>
<li>Within-group estimation</li>
<li>First differences estimation</li>
<li>Least squares dummy variable (LSDV) estimation</li>
</ol>
<p>The first two of these techniques focus on eliminating the individual effects before estimation. The LSDV method directly incorporates these effects using dummy variables.</p>
<div style="overflow: scroll">
<table>
<tr>
<th></th><th>Within-group estimator</th><th>LSDV estimator</th><th>First differences estimator</th>
</tr>
<tr>
<td><b>Data<br> transformation</b></td><td>Demean the data.</td><td>Use dummy variables.</td><td>Difference the data.</td>
</tr>
<tr><td><b>Regression equation</b></td><td>$$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i} $$</td><td>$$Y_{it} = X_{it} \beta_{fe} +\\ \alpha D_{i} + \nu_{it}$$</td><td>$$\Delta{Y}_{it} = \Delta{X}_{it} \beta_{fe} + \Delta{\nu}_{it} $$</td></tr>

</table>
</div>
<p><strong>Let's consider an example panel dataset</strong> with three individuals and three time periods shown in the table below. </p>
<table style="width: 100%;">
 <colgroup>
       <col span="1" style="width: 10%;">
       <col span="1" style="width: 10%;">
       <col span="1" style="width: 20%;">
       <col span="1" style="width: 20%;">
<col span="1" style="width: 20%;">
<col span="1" style="width: 20%;">
    </colgroup>
<tr>
<th>Individual</th><th>Time Period</th><th>Y<sub>it</sub></th><th>Within Group Ave. Y<sub>i</sub></th><th>X<sub>it</sub></th><th>Within Group Ave. X<sub>i</sub></th>
</tr>
 <tr>
<td style="text-align:center">1</td><td style="text-align:center">1</td><td style="text-align:center">3.901</td><td style="text-align:center">2.744</td><td style="text-align:center">0.978</td><td style="text-align:center">1.174</td>
</tr>
<tr>
<td style="text-align:center">1</td><td style="text-align:center">2</td><td style="text-align:center">2.345</td><td style="text-align:center">2.744</td><td style="text-align:center">1.798</td><td style="text-align:center">1.174</td>
</tr>
<tr>
<td style="text-align:center">1</td><td style="text-align:center">3</td><td style="text-align:center">1.987</td><td style="text-align:center">2.744</td><td style="text-align:center">0.745</td><td style="text-align:center">1.174</td>
</tr>

 <tr style="background-color: #F5F5F5;">
<td style="text-align:center">2</td><td style="text-align:center">1</td><td style="text-align:center">1.250</td><td style="text-align:center">1.715</td><td style="text-align:center">1.652</td><td style="text-align:center">1.425</td>
</tr>
<tr style="background-color: #F5F5F5;">
<td style="text-align:center">2</td><td style="text-align:center">2</td><td style="text-align:center">0.654</td><td style="text-align:center">1.715</td><td style="text-align:center">0.438</td><td style="text-align:center">1.425</td>
</tr>
<tr style="background-color: #F5F5F5;">
<td style="text-align:center">2</td><td style="text-align:center">3</td><td style="text-align:center">3.240</td><td style="text-align:center">1.715</td><td style="text-align:center">2.185</td><td style="text-align:center">1.425</td>
</tr>
 <tr>
<td style="text-align:center">3</td><td style="text-align:center">1</td><td style="text-align:center">0.901</td><td style="text-align:center">2.077</td><td style="text-align:center">2.119</td><td style="text-align:center">1.653</td>
</tr>
<tr>
<td style="text-align:center">3</td><td style="text-align:center">2</td><td style="text-align:center">1.341</td><td style="text-align:center">2.077</td><td style="text-align:center">1.516</td><td style="text-align:center">1.653</td>
</tr>
<tr>
<td style="text-align:center">3</td><td style="text-align:center">3</td><td style="text-align:center">3.989</td><td style="text-align:center">2.077</td><td style="text-align:center">1.324</td><td style="text-align:center">1.653</td>
</tr>
</table>
<p><strong> Example within-group estimation </strong><br />
We will estimate the fixed effects model using the within-group method. This can be done in three steps:</p>
<ol>
<li>Find the within-subject means. </li>
<li>Demean the dependent and independent variables using the within-subject means.</li>
<li>Run a linear regression using the demeaned variables.  </li>
</ol>
<p><strong> Finding the within-subject means  </strong><br />
To find the within-subject mean of Y for individual one we compute:</p>
<p>$$ \bar{Y_{1}} = \frac{(3.901 + 2.345 + 1.987)}{3} = 2.7443 .$$</p>
<p>We can find the within-subject means using the <code>withinMeans</code> procedure from the <a href="https://github.com/aptech/gauss-panel-library" target="_blank" rel="noopener">pdlib</a> library. The <code>withinMeans</code> procedure requires two inputs:</p>
<hr />
<dl>
<dt>grps</dt>
<dd>(T*N) x 1 matrix,  group identifier.</dd>
<dt>data</dt>
<dd>(T*N) x k,  panel data.</dd>
</dl>
<hr />
<div class="alert alert-info" role="alert">The pdlib library is available for free and can be directly installed using the <a href="https://www.aptech.com/blog/gauss-package-manager-basics/" target="_blank" rel="noopener">GAUSS Package Manager</a>.</div>
<p>Using our sample data stored in the GAUSS data file <a href="https://github.com/aptech/gauss_blog/blob/master/econometrics/individual-effects-4.15.19/simple_data.dat" target="_blank" rel="noopener">simple_data.dat</a>: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
data = loadd("simple_data.dat");

// Assign groups variable
grps = data[., 1];

// Assign y~x matrix
reg_data = data[.,3:4];

// Find group means
grp_means = withinMeans(grps, reg_data);

print "Group means for Y and X:";
grp_means;</code></pre>
<p>Our output reads:</p>
<pre>Group means for Y and X:

 2.7443  1.1737
 1.7147  1.4250
 2.0770  1.6530 </pre>
<p><strong> Demeaning the data </strong><br />
The next step is to demean the data. This removes any time-invariant effects. After finding the within-subject means, the data is demeaned:   </p>
<p>$$ \widetilde{Y_1} = Y_{1t} - \overline{Y}_1 =\\ 3.901 - 2.744 = 1.157,\\ 2.345 - 2.744 = -0.399,\\ 1.987 - 2.744 = -0.757 .$$</p>
<p>In <strong>GAUSS</strong> we can demean data using the <code>demeanData</code> procedure from the pdlib library. The <code>demeanData</code> procedure requires two inputs:</p>
<hr />
<dl>
<dt>grps</dt>
<dd>(T*N) x 1 matrix,  group identifier.</dd>
<dt>data</dt>
<dd>(T*N) x k,  panel data.</dd>
</dl>
<hr />
<p>The <code>demeanData</code> procedure internally computes the within-subject means and requires just the <code>reg_data</code> and <code>grps</code> variables that we created in the first step: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Remove time-invariant group means
data_tilde = demeanData(grps, reg_data);

print "Demeaned data:";
data_tilde;
print;</code></pre>
<p>Our demeaned data is printed in the output:</p>
<pre>Demeaned data:

 1.1567 -0.1957
-0.3993  0.6243
-0.7573 -0.4287
-0.4647  0.2270
-1.0607 -0.9870
 1.5253  0.7600
-1.1760  0.4660
-0.7360 -0.1370
 1.9120 -0.3290 </pre>
<p><strong> Performing the regression </strong><br />
Once we have transformed our <em>x</em> and <em>y</em> data we are ready to estimate the parameters of the fixed effects regression model:</p>
<p>$$\widetilde{Y_i} = \widetilde{X_i} \beta_{fe} + \widetilde{\nu_i} $$</p>
<p>where </p>
<p>$$\widehat{\beta}_{fe} = (\widetilde{X_i}'\widetilde{X_i})^{-1}(\widetilde{X_i}'\widetilde{Y_i}) .$$</p>
<p>Using the data we previously demeaned:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Extract variables
y_tilde = data_tilde[., 1];
x_tilde = data_tilde[., 2];

// Regress independent on dependent variables
coeff = inv(x_tilde'x_tilde)*(x_tilde'y_tilde);

// Print the fixed effects coefficient
print "Fixed effects coefficient:";
coeff;</code></pre>
<p>The result reads:</p>
<pre>Fixed effects coefficient:
 0.3413 </pre>
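<p>All three steps of the within-group method can be cross-checked outside of GAUSS. The plain-Python sketch below (illustrative; the blog's workflow uses the pdlib procedures) reruns the computation on the nine observations from the example table:</p>

```python
# Within-group (fixed effects) estimation from the example table, in plain
# Python (illustrative cross-check of the three steps above).
ids = [1] * 3 + [2] * 3 + [3] * 3
y = [3.901, 2.345, 1.987, 1.250, 0.654, 3.240, 0.901, 1.341, 3.989]
x = [0.978, 1.798, 0.745, 1.652, 0.438, 2.185, 2.119, 1.516, 1.324]

# Step 1: within-subject means
groups = sorted(set(ids))
y_bar = {g: sum(yi for yi, i in zip(y, ids) if i == g) / ids.count(g) for g in groups}
x_bar = {g: sum(xi for xi, i in zip(x, ids) if i == g) / ids.count(g) for g in groups}

# Step 2: demean both variables using the within-subject means
y_t = [yi - y_bar[i] for yi, i in zip(y, ids)]
x_t = [xi - x_bar[i] for xi, i in zip(x, ids)]

# Step 3: OLS slope on the demeaned data
beta_fe = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
print(round(beta_fe, 4))  # 0.3413
```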
<p><strong> Using the fixedEffects procedure </strong><br />
As an alternative to computing these three steps separately, we can use the <code>fixedEffects</code> procedure from the GAUSS panel data library, <code>pdlib</code>. This procedure runs all three steps in a single call. The <code>fixedEffects</code> procedure takes four inputs:</p>
<hr />
<dl>
<dt>y</dt>
<dd>(T*N) x 1 matrix,  the panel of stacked dependent variables.</dd>
<dt>x</dt>
<dd>(T*N) x k matrix,  the panel of stacked independent variables.</dd>
<dt>grps</dt>
<dd>(T*N) x 1 matrix,  group identifier.</dd>
<dt>robust</dt>
<dd>Scalar,  an indicator variable of whether to use robust standard errors.</dd>
</dl>
<hr />
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Use fixedEffects procedure
call fixedEffects(reg_data[.,1], reg_data[.,2], grps, 1);</code></pre>
<p>This prints:</p>
<pre>------------------- FIXED EFFECTS (WITHIN) RESULTS -------------------

Observations          :  9
Number of Groups      :  3
Degrees of freedom    :  2
R-squared             :  0.026
Adj. R-squared        :  -0.558
Residual SS           :  11.021
Std error of est      :  1.485
Total SS (corrected)  :  11.319
F                     =  0.054        with 1,2 degrees of freedom
P-value               =  0.838

Variable            Coef.       Std. Error       t-Stat       P-Value
----------------------------------------------------------------------
X1                0.341276       1.011041       0.337549       0.768</pre>
<h3 id="the-random-effects-model-1">The random effects model</h3>
<p>The covariance structure of the random effects model means that pooled OLS will result in inefficient estimates. Instead, the random effects model is estimated using pooled <a href="https://www.aptech.com/blog/using-feasible-generalized-least-squares-to-improve-estimates/" target="_blank" rel="noopener">feasible generalized least squares</a>. </p>
<p>The pooled FGLS method estimates the model </p>
<p>$$\widetilde{Y_i} = \widetilde{W_i} \delta_{re} + \widetilde{\epsilon_i}$$</p>
<p>where the data is transformed using $\Omega = E[\epsilon_i \epsilon_i']$ </p>
<p>$$\widetilde{Y_i} = \Omega^{-\frac{1}{2}}Y_{i},$$
$$\widetilde{W_i} = \Omega^{-\frac{1}{2}}W_{i},$$
$$\widetilde{\epsilon_i} = \Omega^{-\frac{1}{2}}\epsilon_{i},$$</p>
<p>and</p>
<p>$$W_i = [1, X_i],$$
$$\delta = [\alpha, \beta']',$$
$$\epsilon_i = \mu_i i_T + \nu_i .$$</p>
<p>The most difficult part of estimating this model is estimating $\Omega$, and a number of different methods have been proposed. </p>
<p><strong> Example random effects estimation </strong><br />
One of the most common approaches for estimating the random effects model:</p>
<ol>
<li>Estimates the between-group regression to obtain $\sigma_{\mu}^2$.</li>
<li>Estimates the within-group regression to obtain $\sigma_{\nu}^2$.</li>
<li>Transforms the data using $\sigma_{\mu}^2$ and $\sigma_{\nu}^2$.  </li>
<li>Finds the pooled OLS estimator using the transformed data. </li>
</ol>
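<p>Step 3 is usually carried out by quasi-demeaning, i.e. subtracting a fraction $\theta$ of the within-subject mean from each observation, $y_{it} - \theta \bar{y}_i$. The plain-Python sketch below uses the standard balanced-panel formula for $\theta$; it is illustrative only and not pdlib's internal implementation:</p>

```python
# Quasi-demeaning weight for random effects GLS (standard balanced-panel
# formula; illustrative sketch, not pdlib's internal code).
from math import sqrt

def re_theta(sigma_mu2, sigma_nu2, T):
    """Weight theta in the transform y_it - theta * ybar_i."""
    return 1.0 - sqrt(sigma_nu2 / (sigma_nu2 + T * sigma_mu2))

# With no individual effect, theta = 0 and random effects collapses to pooled OLS
print(re_theta(0.0, 1.0, 5))  # 0.0
# With a large individual effect, theta approaches 1 (close to the within transform)
print(re_theta(100.0, 1.0, 5))
```

The two limiting cases show how random effects sits between pooled OLS ($\theta = 0$) and the fixed effects within transform ($\theta \to 1$).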
<p>We can perform these steps in one procedure call using the <code>randomEffects</code> procedure in the <code>pdlib</code> GAUSS library.</p>
<p><strong> Using the randomEffects procedure  </strong><br />
The <code>randomEffects</code> procedure takes four inputs:</p>
<hr />
<dl>
<dt>y</dt>
<dd>(T*N) x 1 matrix,  the panel of stacked dependent variables.</dd>
<dt>x</dt>
<dd>(T*N) x k matrix,  the panel of stacked independent variables.</dd>
<dt>grps</dt>
<dd>(T*N) x 1 matrix,  group identifier.</dd>
<dt>robust</dt>
<dd>Scalar,  an indicator variable of whether to use robust standard errors.</dd>
</dl>
<hr />
<p>Continuing with our fixed effects example, we will use our sample data stored in the GAUSS data file simple_data.dat.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Use randomEffects procedure
call randomEffects(reg_data[., 1], reg_data[., 2], grps, 1);</code></pre>
<pre>---------------------- GLS RANDOM EFFECTS RESULTS  ----------------------

Observations          :  9
Number of Groups      :  3
Degrees of freedom    :  2
R-squared             :  0.004
Adj. R-squared        :  -2.985
Residual SS           :  12.907
Std error of est      :  1.358
Total SS (corrected)  :  12.956
F                     =  3.314        with 2,2 degrees of freedom
P-value               =  0.232

Variable            Coef.       Std. Error       t-Stat       P-Value
----------------------------------------------------------------------
CONSTANT          1.994513       1.720996       1.158930       0.366
X1                0.129940       1.053423       0.123350       0.913</pre>
<h2 id="conclusion">Conclusion</h2>
<p>In today's blog we have covered the fundamentals of the individual error component models:</p>
<ul>
<li>The theoretical one-way error component model.</li>
<li>Fixed effects vs. random effects. </li>
<li>Estimating fixed effects and random effects. </li>
</ul>
<p>The code and data for this blog can be found at our Aptech Blog Github <a href="https://github.com/aptech/gauss_blog/tree/master/econometrics/individual-effects-4.15.19" target="_blank" rel="noopener">code repository</a>.</p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Getting Started with Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a></li>
</ol>
<h3 id="references">References</h3>
<p><a href="https://link.springer.com/book/10.1007/978-3-030-53953-5" target="_blank" rel="noopener">Baltagi, B.</a> (2021). <i>Econometric analysis of panel data.</i> Springer.</p>
<p>
    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // caculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script></p>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Introduction to Difference-in-Differences Estimation</title>
		<link>https://www.aptech.com/blog/introduction-to-difference-in-differences-estimation/</link>
					<comments>https://www.aptech.com/blog/introduction-to-difference-in-differences-estimation/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Sat, 30 Mar 2019 13:52:33 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=19813</guid>

					<description><![CDATA[When policy changes or treatments are imposed on people, it is common and reasonable to ask how those people have been impacted. This is a more difficult question than it seems at first glance. In today's blog, we examine difference-in-differences (DD) estimation, a common tool for considering the impact of treatments on individuals.]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>When policy changes or treatments are imposed on people, it is common and reasonable to ask how those people have been impacted. This is a more difficult question than it seems at first glance. </p>
<p>In order to truly know how those individuals have been impacted, we need to consider how they would have fared had the policies or treatments not taken place. However, the changes did take place, and we don't get to observe how those individuals would fare without them. </p>
<p>In today's blog, we examine difference-in-differences (DD) estimation, a common tool for considering the impact of treatments on individuals. We will consider:</p>
<ul>
<li>What is difference-in-differences (DD) estimation?</li>
<li>How does DD work? </li>
<li>A simple DD example.</li>
</ul>
<h2 id="what-is-difference-in-differences-estimation">What is difference-in-differences estimation?</h2>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/03/gblog-difference-in-differences-march-2019.png"><img src="https://www.aptech.com/wp-content/uploads/2019/03/gblog-difference-in-differences-march-2019.png" alt="Difference-in-difference method plot." width="600" height="400" class="aligncenter size-full wp-image-19915" /></a></p>
<p>Difference-in-differences estimation attempts to measure the effects of a sudden change in the economic environment, policy, or general treatment on a group of individuals. </p>
<p>The DD model includes several pieces:</p>
<ul>
<li>A sudden <b>exogenous source of variation</b>, which we will refer to as the treatment. Treatment examples include changes in <a href="http://davidcard.berkeley.edu/papers/njmin-aer.pdf">minimum wage</a>, a new <a href="https://scholarship.law.duke.edu/cgi/viewcontent.cgi?article=3007&amp;context=faculty_scholarship">workplace non-discrimination policy</a>, or a new <a href="https://www.sciencedirect.com/science/article/pii/S0301421511004502">CO<sub>2</sub> emissions tax</a>. </li>
<li>A quantifiable and measurable <b>outcome</b> which is either the direct target of the variation or an indirect proxy. </li>
<li>A <b>treatment group</b> which is subjected to the change.</li>
<li>A <b>control group</b> which is similar in characteristics to the treatment group but is not subjected to the change. </li>
</ul>
<p>DD uses the outcome of the control group as a proxy for what would have occurred in the treatment group had there been no treatment. The difference in the average post-treatment outcomes between the treatment and control groups is then used to measure the <b>treatment effects</b>. </p>
<h3 id="example-case">Example case</h3>
<p>Let's consider a simple example. Suppose we have two professors of introductory econometrics classes, one at Transylvania University (TU) and one at The University of Azkaban (UA). Both professors have decided to <a href="https://www.aptech.com/industry-solutions/gauss-in-education/gauss-in-the-classroom/">use GAUSS to teach</a> a year-long series of econometrics courses.  </p>
<p>A quarter through the year, the class at TU takes advantage of a free GAUSS training session while the class at UA does not. </p>
<p>We can compare the grades on the GAUSS homework assignments by the students at each university before and after the training date to measure the benefit of the training session. </p>
<table style="width: 100%">
<tr>
<th>Treatment</th><td>Aptech GAUSS training course</td>
</tr>
<tr>
<th>Control Group</th><td>Students using GAUSS at University of Azkaban</td>
</tr>
<tr>
<th>Treatment Group</th><td>Students using GAUSS at Transylvania University</td>
</tr>
<tr>
<th>Outcome</th><td>Grades on GAUSS homework assignments</td>
</tr>
</table>
<h2 id="how-does-it-work">How does it work?</h2>
<p>The DD estimate uses the between-group cross-sectional differences and within-group time-series differences to measure treatment effects. Used on its own, either the cross-sectional difference or the within-group time-series difference may produce a biased estimate of the treatment effect. </p>
<h3 id="dd-model-outline">DD Model Outline</h3>
<p>Let's look more formally at our example to better understand how DD works. First, we define our two outcomes</p>
<p>$$Y_{1,i,c,t} = \text{homework grades by student } i \text{, in}\\ \text{ class } c \text{, in period } t \text{ with training course} $$
$$Y_{0,i,c,t} = \text{homework grades by student } i \text{, in}\\ \text{ class } c \text{, in period } t \text{ without training course} $$</p>
<p>where <i>i</i> is an individual student, <i>c</i> is the class and <i>t</i> is the time period. </p>
<div class="alert alert-info" role="alert"><strong>Note:</strong> These are just theoretical outcomes -- empirically we only get to observe one or the other. For example, once the TU students take the course we cannot observe their homework grades without the course in the time periods after the course.</div>
<p>We begin by assuming that potential outcomes before training are determined by a time-invariant, university-specific effect, $\gamma_c$, and a university-invariant, time-specific effect, $\lambda_t$:</p>
<p>$$ E(Y _{0,i,c,t}| c, t) = \gamma_c + \lambda_t $$</p>
<p>Note that the time-invariant component, $\gamma_c$, depends only on the university that the student is in and is independent of the time period. Similarly, the university-invariant component, $\lambda_t$, is independent of the university and changes only with the time period. </p>
<p>Assuming that the training has a constant effect, $\beta$, on homework grades:</p>
<p>$$ E(Y _{1,i,c,t}| c, t) = \gamma_c + \lambda_t + \beta .$$</p>
<p>More generally we can express our outcomes as </p>
<p>$$Y _{i,c,t} = \gamma_c + \lambda_t + \beta D_{c,t} + \epsilon_{i,c,t}$$</p>
<p>where $D_{c,t}$ is a dummy variable representing classes that have received training and $E(\epsilon_{i,c,t}| c, t) = 0$.</p>
<p>Using this we compare the differences in outcomes for the individual classes across time:</p>
<p>$$E(Y _{i,c,t} | TU, \text{post-course}) - E(Y _{i,c,t} | TU, \text{pre-course}) =\\ \lambda_{\text{post-course}} + \beta - \lambda_{\text{pre-course}}$$</p>
<p>and </p>
<p>$$E(Y _{i,c,t} | UA, \text{post-course}) - E(Y _{i,c,t} | UA, \text{pre-course}) =\\ \lambda_{\text{post-course}} - \lambda_{\text{pre-course}} .$$</p>
<p>From here we are able to estimate the population difference-in-differences, which measures the treatment effect of interest:</p>
<p>$$ [E(Y _{i,c,t} | TU, \text{post-course}) - E(Y _{i,c,t} | TU, \text{pre-course})] -\\ [E(Y _{i,c,t} | UA, \text{post-course}) - E(Y _{i,c,t} | UA, \text{pre-course})] = \beta .$$</p>
<p>This is the key outcome of the difference-in-differences method.  We have eliminated the common trend between the groups, $\lambda_t$, and the permanent differences between the groups, $\gamma_c$, leaving a very simple estimate of the treatment effect, $\beta$.</p>
<h3 id="dd-assumptions">DD Assumptions</h3>
<p>DD estimation is an appealingly simple way to measure treatment effects. However, it relies on some key assumptions (Angrist and Pischke, 2008):</p>
<ul>
<li>Outcomes in the treatment group and the control group follow the same trend, $\lambda_t$.  </li>
<li>The treatment causes deviation, $\beta$, from the trend.</li>
<li>The differences in the treatment group and control group are captured by the fixed effects variables, $\gamma_c$.</li>
</ul>
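<p>The first two assumptions are what make the method work: any deviation from the shared trend in the treatment group is attributed to the treatment. A minimal numerical sketch of this logic (in Python rather than the post's GAUSS, with made-up values for $\gamma_c$, $\lambda_t$, and $\beta$) shows the group levels and the common trend cancelling out of the double difference:</p>

```python
# Hypothetical values (not from the post): group levels, common trend,
# and a constant treatment effect.
gamma = {"TU": 70.0, "UA": 75.0}   # time-invariant group effects
lam = {"pre": 0.0, "post": 2.0}    # group-invariant time effects
beta = 5.0                         # treatment effect

def mean_outcome(group, period):
    # Expected outcome: gamma_c + lambda_t, plus beta for the treated
    # group (TU) in the post-treatment period.
    treated = (group == "TU" and period == "post")
    return gamma[group] + lam[period] + (beta if treated else 0.0)

dd = (mean_outcome("TU", "post") - mean_outcome("TU", "pre")) \
   - (mean_outcome("UA", "post") - mean_outcome("UA", "pre"))
print(dd)  # 5.0 -- exactly beta; gamma and lambda cancel
```

<p>Note that if the two groups followed different trends, the cancellation above would fail and the DD estimate would absorb the trend difference along with $\beta$.</p>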
<h2 id="example">Example</h2>
<p><a href="https://www.aptech.com/wp-content/uploads/2019/03/gblog-difference-in-differences-class-example-2.png"><img src="https://www.aptech.com/wp-content/uploads/2019/03/gblog-difference-in-differences-class-example-2.png" alt="Plot of difference-in-differences estimation example." width="600" height="360" class="aligncenter size-full wp-image-20003" /></a></p>
<p>Let's look again at our econometrics students from TU and UA. The students earn the following grades on their assignments:</p>
<table style="width: 100%">
    <colgroup>
       <col span="1" style="width: 25%;">
       <col span="1" style="width: 37.5%;">
       <col span="1" style="width: 37.5%;">
    </colgroup>
<tr>
<th></th><th style="text-align:center">Pre-treatment<br> period</th><th style="text-align:center">Post-treatment<br> period</th>
</tr>
 <tr>
<th>Transylvania<br> University</th><td style="text-align:center">77, 82, 65, 68, 90,<br> 84, 67, 73, 84, 61</td><td style="text-align:center">76, 88, 73, 74, 94,<br> 88, 69, 78, 89, 71</td>
</tr>
 <tr>
<th>University<br> of Azkaban</th><td style="text-align:center">74, 63, 82, 70, 92,<br> 67, 66, 68, 87, 95</td><td style="text-align:center">72, 70, 84, 67, 92,<br> 70, 65, 65, 82, 96</td>
</tr>
</table>
<p>Now consider the averages and their differences:</p>
<table style="width: 100%">
    <colgroup>
       <col span="1" style="width: 25%;">
       <col span="1" style="width: 25%;">
       <col span="1" style="width: 25%;">
       <col span="1" style="width: 25%;">
    </colgroup>
<tr>
<th></th><th style="text-align:center">Pre-treatment<br> period</th><th style="text-align:center">Post-treatment<br> period</th><th style="text-align:center">Differences</th>
</tr> 
 <tr>
<th>Transylvania<br> University</th><td style="text-align:center">75.100<br>(9.235)</td><td style="text-align:center">80.000<br>(8.894)</td><td style="text-align:center; background-color: #fde5d2" ">4.900<br>(3.035)</td>
</tr>
 <tr>
<th>University<br> of Azkaban</th><td style="text-align:center">76.400<br>(11.673)</td><td style="text-align:center">76.300<br>(11.383)</td><td style="text-align:center; background-color: #fde5d2" ">-0.100<br>(3.510)</td>
</tr>
 <tr>
<th>Differences</th><td style="text-align:center; background-color: #fde5d2">-1.300<br>(15.384)</td><td style="text-align:center; background-color: #fde5d2" ">3.700<br>(13.166)</td><td style="background-color: #fbcaa5; text-align:center"><b>5.000<br>(3.944)</b></td>
</tr>
</table>
<p>The orange highlighted values represent the difference-in-differences across periods between the TU class, the treatment group, and the UA class, the control group. We see that the training course provides a treatment effect of an average 5.00 point increase in grades. </p>
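<p>The numbers in the table can be reproduced with a few lines of arithmetic. A minimal sketch (in Python; the post's own replication code is in GAUSS):</p>

```python
# Homework grades from the tables above.
tu_pre  = [77, 82, 65, 68, 90, 84, 67, 73, 84, 61]
tu_post = [76, 88, 73, 74, 94, 88, 69, 78, 89, 71]
ua_pre  = [74, 63, 82, 70, 92, 67, 66, 68, 87, 95]
ua_post = [72, 70, 84, 67, 92, 70, 65, 65, 82, 96]

def mean(x):
    return sum(x) / len(x)

# Within-group changes across the treatment date.
d_tu = mean(tu_post) - mean(tu_pre)   # 80.000 - 75.100 =  4.900
d_ua = mean(ua_post) - mean(ua_pre)   # 76.300 - 76.400 = -0.100

# Difference-in-differences: the estimated treatment effect.
dd = d_tu - d_ua
print(round(dd, 3))  # 5.0
```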
<p>The GAUSS code to replicate these results is available on the <a href="https://github.com/aptech/gauss_blog/tree/master/econometrics/did-3.28.2019">Aptech GitHub repository</a>.</p>
<h3 id="conclusion">Conclusion</h3>
<p>The difference-in-differences method provides a simple way to estimate treatment effects. The basic two-period approach outlined here is the foundation for more sophisticated techniques, including DD regression on larger panels. </p>
<p>In today's blog we have covered the fundamentals of the DD method:</p>
<ul>
<li>What is difference-in-differences (DD) estimation</li>
<li>How does DD work? </li>
<li>A simple DD example</li>
</ul>
<p>The code and data for this blog can be found at our Aptech Blog Github <a href="https://github.com/aptech/gauss_blog/tree/master/econometrics/did-3.28.2019">code repository</a>.</p>
<h2 id="references">References</h2>
<p><a href="https://www.researchgate.net/publication/51992844_Mostly_Harmless_Econometrics_An_Empiricist's_Companion">Angrist, J. D., &amp; Pischke, J.-S.</a> (2008). <em>Mostly harmless econometrics: An empiricist's companion</em>. Princeton University Press.</p>
<p>
    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // caculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script></p>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/introduction-to-difference-in-differences-estimation/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Panel data, structural breaks and unit root testing</title>
		<link>https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/</link>
					<comments>https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Sat, 23 Feb 2019 08:35:14 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[panel data]]></category>
		<category><![CDATA[unit root]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=19541</guid>

					<description><![CDATA[In this blog, we extend  our <a href="https://www.aptech.com/blog/unit-root-tests-with-structural-breaks/">analysis of unit root testing</a> with <a href="https://www.aptech.com/structural-breaks/">structural breaks</a> to <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/">panel data</a>. Using panel data unit roots tests found in the GAUSS <a href="https://github.com/aptech/tspdlib">tspdlib</a> we consider if a panel of international current account balances collectively shows unit root behavior.
]]></description>
										<content:encoded><![CDATA[<p><img src="https://www.aptech.com/wp-content/uploads/2019/02/gblog-sb-02202018-1.png" alt="US Current Account Balance" /></p>
<h3 id="introduction">Introduction</h3>
<p>In this blog, we extend <a href="https://www.aptech.com/blog/unit-root-tests-with-structural-breaks/" target="_blank" rel="noopener">last week's</a> analysis of unit root testing with <a href="https://www.aptech.com/structural-breaks/" target="_blank" rel="noopener">structural breaks</a> to <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a>.</p>
<p>We will again use the quarterly current account to GDP ratio but focus on a panel of data from five countries:  United States, United Kingdom, Australia, South Africa, and India. </p>
<p>Using panel data unit root tests found in the <a href="https://docs.aptech.com/gauss/tspdlib/docs/tspdlib-landing.html" target="_blank" rel="noopener">GAUSS tspdlib library</a>, we consider whether the panel collectively shows unit root behavior.</p>
<h2 id="testing-for-unit-roots-in-panel-data">Testing for unit roots in panel data</h2>
<h3 id="why-panel-data">Why panel data</h3>
<p>There are a number of reasons we utilize <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a> in econometrics (Baltagi, 2008). Panel data:</p>
<ul>
<li>Capture the idiosyncratic behaviors of individual groups with models like the fixed effects or random effects models.</li>
<li>Contain more information, more variability, and more efficiency.</li>
<li>Can detect and measure statistical effects that pure time-series or cross-section data can't. </li>
<li>Provide longer time-series for unit-root testing, which in turn leads to standard asymptotic behavior.</li>
</ul>
<h3 id="panel-data-unit-root-testing">Panel data unit root testing</h3>
<p>Today we will test for unit roots using the panel Lagrange Multiplier (LM) unit-root test with structural breaks in the mean (Im, K., Lee, J., Tieslau, M., 2005):</p>
<ul>
<li>The panel LM test statistic averages the individual LM test statistics which are computed using the pooled likelihood function. </li>
<li>The asymptotic distribution of the test is robust to structural breaks. </li>
<li>The test considers the null unit root hypothesis against the alternative that at least one time series in the panel is stationary. </li>
</ul>
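<p>To make the averaging step concrete: the panel statistic begins with the cross-sectional mean of the individual LM statistics, which Im, Lee, and Tieslau (2005) then center and scale using tabulated moments to obtain a standard normal panel statistic. A minimal sketch of the first step (in Python, using the one-break statistics reported later in this post; the centering and scaling is handled internally by the <code>PDLM</code> procedure and is not reproduced here):</p>

```python
# Individual one-break LM statistics for the five countries
# (values from the results table in this post).
lm_stats = [-3.0504, -4.1213, -3.1625, -5.1271, -2.8001]

# Step one of the panel LM statistic: the cross-sectional average.
lm_bar = sum(lm_stats) / len(lm_stats)
print(round(lm_bar, 4))  # -3.6523
```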
<h2 id="testing-our-panel">Testing our panel</h2>
<h3 id="setting-up-the-test">Setting up the test</h3>
<p>The panel LM test can be run using the <strong>GAUSS</strong> <a href="https://docs.aptech.com/gauss/tspdlib/docs/pdlm.html" target="_blank" rel="noopener">PDLM</a> procedure found in the GAUSS <strong>tspdlib</strong> library. The procedure has two required inputs and four optional arguments:</p>
<hr />
<dl>
<dt>y_test</dt>
<dd>T x N matrix,  the panel data to be tested.</dd>
<dt>model</dt>
<dd>Scalar,  indicates the type of model to be tested.<br>    1 = break in level.<br>    2 = break in level and trend.</dd>
<dt>nbreak</dt>
<dd>Scalar,  optional input, the number of breaks to allow. <br>    1 = one break.<br>    2 = two breaks. Default = 0.</dd>
<dt>pmax</dt>
<dd>Scalar,  optional input, maximum number of lags for Dy. 0 = no lags. Default = 8.</dd>
<dt>ic</dt>
<dd>Scalar,  optional input, the information criterion used to select lags. <br>    1 = Akaike. <br>    2 = Schwarz. <br>    3 = t-stat significance. Default = 3.</dd>
<dt>trimm</dt>
<dd>Scalar,  optional input, data trimming rate. Default = 0.10.
<hr /></dd>
</dl>
<p>The <code>PDLM</code> procedure has five returns:</p>
<hr />
<dl>
<dt>Nlm</dt>
<dd>Vector,  the minimum test statistic for each cross-section.</dd>
<dt>Ntb</dt>
<dd>Vector,  location of break(s) for each cross-section.</dd>
<dt>Np</dt>
<dd>Vector,  number of lags selected by the chosen information criterion for each cross-section.</dd>
<dt>PDlm</dt>
<dd>Scalar,  panel LM statistic, asymptotically distributed N(0, 1).</dd>
<dt>pval</dt>
<dd>Scalar,  p-value of <em>PDlm</em>.</dd>
</dl>
<hr />
<h3 id="running-the-test">Running the test</h3>
<p>The test is easy to set up and run in GAUSS. We first load the <strong>tspdlib</strong> library and our data. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">library tspdlib;

// Load data
ca_panel = loadd("panel_ca.dat");
y_test = ca_panel[., 2:cols(ca_panel)];
</code></pre>
<p>Next, we specify that we want to run the model with level breaks and we call the <code>PDLM</code> procedure separately for the one break and two break models. We will keep all other parameters at their default values:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify to run model with 
// level breaks
model = 1;

// Run first with one break
nbreak = 1;

// Call PD LM with one level break
{ Nlm, Ntb, Np, PDlm, pval } = PDLM(y_test, model, nbreak);

// Run next with two breaks
nbreak = 2;

// Call PDLM with two level breaks
{ Nlm, Ntb, Np, PDlm, pval } = PDLM(y_test, model, nbreak);</code></pre>
<h3 id="results">Results</h3>
<table style="border-collapse: collapse">
<tr>
<th>Country</th><th>Cross-section<br> test statistic</th><th>Break<br> location</th><th>Number of<br> lags</th><th>Conclusion</th>
</tr>
<tr><th colspan="5">Two break model</th></tr>
<tr>
<td>United States</td><td>-3.3067</td><td>1993 Q1, 2004 Q3</td><td>12</td><td>Reject the null</td>
</tr>
<tr>
<td>United Kingdom</td><td>-4.6080</td><td>1980 Q4, 1984 Q4</td><td>4</td><td>Reject the null</td>
</tr>
<tr>
<td>Australia</td><td>-3.9522</td><td>1970 Q3, 1977 Q4</td><td>12</td><td>Reject the null</td>
</tr>
<tr>
<td>South Africa</td><td>-5.6735</td><td>1976 Q4, 1983 Q4</td><td>4</td><td>Reject the null</td>
</tr>
<tr style="border-bottom: 2px solid #444;">
<td>India</td><td>-5.6734</td><td>1975 Q4, 2004 Q2</td><td>9</td><td>Reject the null</td>
</tr>
<tr style="border-top: 2px solid #444;"><td>Full Panel</td><td>-6.6340</td><td>N/A</td><td>N/A</td><td>Reject the null</td>
</tr>
<tr><td colspan="5"> </td></tr>
<tr><th colspan="5">One break model</th></tr>

<tr>
<td>United States</td><td>-3.0504</td><td>1993 Q1</td><td>12</td><td>Reject the null</td>
</tr>
<tr>
<td>United Kingdom</td><td>-4.1213</td><td>1984 Q4</td><td>4</td><td>Reject the null</td>
</tr>
<tr>
<td>Australia</td><td>-3.1625</td><td>1980 Q2</td><td>12</td><td>Reject the null</td>
</tr>
<tr>
<td>South Africa</td><td>-5.1271</td><td>1979 Q4</td><td>4</td><td>Reject the null</td>
</tr>
<tr>
<td>India</td><td>-2.8001</td><td>1976 Q2</td><td>9</td><td>Reject the null</td>
</tr>
<tr style="border-top: 2px solid #444;"><td>Full Panel</td><td>-8.9119</td><td>N/A</td><td>N/A</td><td>Reject the null</td>
</tr>
</table>
<p>Research on the presence of unit roots in current account balances has produced mixed results, bringing to the forefront the question of current account balance sustainability (Clower &amp; Ito, 2012). </p>
<p>Our panel tests with structural breaks unanimously reject the null hypothesis of unit roots for all cross-sections, as well as the combined panel. This adds support, at least for our small sample, to the idea that current account balances are sustainable and mean-reverting. </p>
<h2 id="conclusions">Conclusions</h2>
<p>Today we've learned about conducting panel data unit root testing in the presence of structural breaks using the LM test of <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0084.2005.00125.x">Im, Lee, and Tieslau (2005)</a>. After today you should have a better understanding of:</p>
<ol>
<li>Some of the advantages of using panel data.</li>
<li>How to test for unit roots in panel data using the LM test with structural breaks.</li>
<li>How to use the <a href="https://github.com/aptech/tspdlib">GAUSS tspdlib library</a> to test for unit roots with structural breaks.</li>
</ol>
<p>Code and data from this blog can be found <a href="https://github.com/aptech/gauss_blog/tree/master/time_series/panel-unitroot-2.22.19" target="_blank" rel="noopener">here</a>.</p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/" target="_blank" rel="noopener">Transforming Panel Data to Long Form in GAUSS</a></li>
</ol>
<h3 id="references">References</h3>
<p>Baltagi, B. (2008). <em>Econometric analysis of panel data</em>. John Wiley &amp; Sons.</p>
<p>Clower, E., &amp; Ito, H. (2012). The persistence of current account balances and its determinants: the implications for global rebalancing.</p>
<p>Im, K., Lee, J., Tieslau, M. (2005). Panel LM Unit-root Tests with Level Shifts. <em>Oxford Bulletin of Economics and Statistics</em> 67, 393–419.</p>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/feed/</wfw:commentRss>
			<slash:comments>9</slash:comments>
		
		
			</item>
	</channel>
</rss>
