<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Programming &#8211; Aptech</title>
	<atom:link href="https://www.aptech.com/blog/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aptech.com</link>
	<description>GAUSS Software - Fastest Platform for Data Analytics</description>
	<lastBuildDate>Wed, 08 Apr 2026 17:56:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>MLE with Bounded Parameters: A Cleaner Approach</title>
		<link>https://www.aptech.com/blog/mle-with-bounded-parameters-a-cleaner-approach/</link>
					<comments>https://www.aptech.com/blog/mle-with-bounded-parameters-a-cleaner-approach/#respond</comments>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 08 Apr 2026 17:56:17 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11585713</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>It's natural in data analysis applications for parameters to have bounds; variances can't be negative, GARCH coefficients must sum to less than one for stationarity, and mixing proportions live between zero and one. </p>
<p>When you estimate these models by maximum likelihood, the optimizer needs to respect those bounds, not just at the solution, but throughout the search. If the search wanders into invalid territory, convergence and the reliability of your results can suffer. For example, you may get complex numbers from negative variances, explosive forecasts from non-stationary GARCH, or likelihoods that make no sense.</p>
<p><a href="https://www.aptech.com/blog/gauss26/" target="_blank" rel="noopener">GAUSS 26.0.1</a> introduces <a href="https://docs.aptech.com/gauss/minimize.html#minimize" target="_blank" rel="noopener"><code>minimize</code></a>, the first new GAUSS optimizer in over 10 years, to handle this cleanly. </p>
<p>The <code>minimize</code> optimizer lets you specify bounds directly, and GAUSS internally keeps parameters feasible at every iteration. No more log-transforms, no penalty functions, and no double-checking.</p>
<p>In today's blog, we'll see the new <code>minimize</code> function in action, as we walk through two examples: </p>
<ul>
<li>A GARCH estimation where variance parameters must be positive</li>
<li>A stochastic frontier model where both variance components must be positive.</li>
</ul>
<p>In both cases, bounded optimization makes estimation easier and aligns results with theory.</p>
<h2 id="why-bounds-matter">Why Bounds Matter</h2>
<p>To see why this matters in practice, let’s look at a familiar example. Consider a GARCH(1,1) model:</p>
<p>$\sigma^2_t = \omega + \alpha \varepsilon^2_{t-1} + \beta \sigma^2_{t-1}$</p>
<p>For this model to be well-defined and economically meaningful:</p>
<ul>
<li>The baseline variance must be positive ($\omega \gt 0$)</li>
<li>Shocks and persistence must contribute non-negatively to variance ($\alpha \geq 0$, $\beta \geq 0$)</li>
<li>The model must be stationary ($\alpha + \beta \lt 1$)</li>
</ul>
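<p>The stationarity condition comes from taking unconditional expectations of the recursion. If the process is stationary, $E[\varepsilon^2_t] = E[\sigma^2_t] = \bar\sigma^2$ for all $t$, so</p>
<p>$$\bar\sigma^2 = \omega + \alpha\bar\sigma^2 + \beta\bar\sigma^2 \quad\Longrightarrow\quad \bar\sigma^2 = \frac{\omega}{1 - \alpha - \beta}.$$</p>
<p>This long-run variance is finite and positive only when $\omega \gt 0$ and $\alpha + \beta \lt 1$.</p>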
<p>The traditional workaround is to estimate transformed parameters, $\log(\omega)$ instead of $\omega$, then convert back. This works, but it distorts the optimization surface and complicates standard error calculations. You're not estimating the parameters you care about; you're estimating transforms and hoping the numerics work out.</p>
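<p>For example, suppose you estimate $\phi = \log(\omega)$ and report $\hat\omega = e^{\hat\phi}$. The standard error of $\hat\omega$ then has to be recovered with the delta method:</p>
<p>$$\widehat{\mathrm{se}}(\hat\omega) \approx \left.\frac{d\,e^{\phi}}{d\phi}\right|_{\phi = \hat\phi} \widehat{\mathrm{se}}(\hat\phi) = \hat\omega\,\widehat{\mathrm{se}}(\hat\phi).$$</p>
<p>With several transformed parameters, this becomes a full Jacobian of the transformation applied to the covariance matrix. The resulting standard errors are only first-order approximations.</p>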
<p>With bounded optimization, you estimate $\omega$, $\alpha$, and $\beta$ directly, with the optimizer respecting the constraints throughout.</p>
<h2 id="example-1-garch11-on-commodity-returns">Example 1: GARCH(1,1) on Commodity Returns</h2>
<p>Let's estimate a GARCH(1,1) model on a dataset of 248 observations of commodity price returns (this data is included in the GAUSS 26 examples directory). </p>
<h3 id="step-one-data-and-likelihood">Step One: Data and Likelihood</h3>
<p>First, we load the data and specify our log-likelihood objective function. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load returns data (ships with GAUSS)
fname = getGAUSShome("examples/df_returns.gdat");
returns = loadd(fname, "rcpi");

// GARCH(1,1) negative log-likelihood
proc (1) = garch_negll(theta, y);
    local omega, alpha, beta_, sigma2, ll, t;

    omega = theta[1];
    alpha = theta[2];
    beta_ = theta[3];

    sigma2 = zeros(rows(y), 1);

    // Initialize with sample variance
    sigma2[1] = stdc(y)^2;

    // Variance recursion
    for t (2, rows(y), 1);
        sigma2[t] = omega + alpha * y[t-1]^2 + beta_ * sigma2[t-1];
    endfor;

    // Gaussian log-likelihood
    ll = -0.5 * sumc(ln(2*pi) + ln(sigma2) + (y.^2) ./ sigma2);

    retp(-ll);  // Return negative for minimization
endp;</code></pre>
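<p>The <code>ll</code> line in <code>garch_negll</code> is the Gaussian log-likelihood for a zero-mean return series. There is no mean equation, so the returns themselves play the role of the shocks, $\varepsilon_t = y_t$:</p>
<p>$$\ell(\omega, \alpha, \beta) = -\frac{1}{2}\sum_{t=1}^{T}\left[\ln(2\pi) + \ln\sigma^2_t + \frac{y_t^2}{\sigma^2_t}\right].$$</p>
<p>Every term contains $\ln\sigma^2_t$, so a single non-positive $\sigma^2_t$ makes the objective invalid. With $\omega \gt 0$, $\alpha \geq 0$, $\beta \geq 0$, and a positive starting variance, the recursion can never produce one.</p>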
<h3 id="step-two-setting-up-optimization">Step Two: Setting Up Optimization</h3>
<p>Now we set up the bounded optimization with:</p>
<ul>
<li>$\omega \gt 0$ (small positive lower bound to avoid numerical issues)</li>
<li>$\alpha \geq 0$</li>
<li>$\beta \geq 0$</li>
</ul>
<p>Because <code>minimize</code> handles simple box constraints (a lower and an upper bound on each parameter), it cannot impose the joint condition $\alpha + \beta \lt 1$ directly. Instead, we give $\alpha$ and $\beta$ individual upper bounds to keep the optimizer in a reasonable region, and we verify the stationarity condition after estimation.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Starting values
theta0 = { 0.00001,   // omega (small, let data speak)
           0.05,      // alpha
           0.90 };    // beta

// Set up minimize
struct minimizeControl ctl;
ctl = minimizeControlCreate();

// Bounds: one row per parameter, [lower, upper]
// (alpha + beta &lt; 1 is a joint condition; we check it after estimation)
ctl.bounds = { 1e-10      1,      // omega in [1e-10, 1]
               0          1,      // alpha in [0, 1]
               0     0.9999 };    // beta in [0, 0.9999]</code></pre>
<div class="alert alert-info" role="alert">We cap $\beta$ slightly below 1 to avoid numerical issues near the boundary, where the likelihood surface can become flat and unstable.</div>
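<p>Each row of <code>ctl.bounds</code> holds the lower and upper bound for one parameter, in the same order as <code>theta0</code>. Before estimating, it can be worth confirming that the starting values sit inside that box. The <code>inBounds</code> procedure below is a hypothetical helper, not a GAUSS built-in. It is a minimal sketch that assumes <code>bounds</code> is a K x 2 matrix of [lower, upper] pairs:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical helper (not part of GAUSS): returns 1 if every
// element x[i] lies between bounds[i, 1] and bounds[i, 2]
// (inclusive), and 0 otherwise
proc (1) = inBounds(x, bounds);
    local ok;

    // Element-wise check of both the lower and the upper bound
    ok = (x .>= bounds[., 1]) .and (bounds[., 2] .>= x);

    retp(sumc(ok) == rows(x));
endp;

// Starting values and bounds from the GARCH example
theta0 = { 0.00001, 0.05, 0.90 };
bnds = { 1e-10      1,
         0          1,
         0     0.9999 };

print "Starting values inside bounds: " inBounds(theta0, bnds);</code></pre>
<p>The same check can be applied to <code>out.x</code> after estimation.</p>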
<h3 id="step-three-running-the-model">Step Three: Running the Model</h3>
<p>Finally, we call <code>minimize</code> to run our model.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Estimate
struct minimizeOut out;
out = minimize(&amp;garch_negll, theta0, returns, ctl);</code></pre>
<h3 id="results-and-visualization">Results and Visualization</h3>
<p>After estimation, we extract the parameter estimates and confirm the stationarity condition:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Extract estimates
omega_hat = out.x[1];
alpha_hat = out.x[2];
beta_hat = out.x[3];

print "omega = " omega_hat;
print "alpha = " alpha_hat;
print "beta  = " beta_hat;
print "alpha + beta = " alpha_hat + beta_hat;
print "Iterations: " out.iterations;</code></pre>
<p>Output:</p>
<pre>omega = 0.0000070
alpha = 0.380
beta  = 0.588

alpha + beta = 0.968
Iterations: 39</pre>
<p>There are a few noteworthy results:</p>
<ol>
<li>The high persistence ($\alpha + \beta \approx 0.97$) means volatility shocks decay slowly. </li>
<li>The relatively high $\alpha$ (0.38) indicates that recent shocks have substantial immediate impact on variance. </li>
<li>The optimization converged in 39 iterations with all parameters staying inside their bounds throughout. No invalid variance evaluations, no numerical exceptions.</li>
</ol>
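<p>The persistence figure can be made more concrete. Under stationarity, the long-run variance is $\omega / (1 - \alpha - \beta)$. The effect of a shock on the expected conditional variance shrinks by a factor of $\alpha + \beta$ each period, so its half-life is $\ln(0.5) / \ln(\alpha + \beta)$. The sketch below plugs in the rounded estimates printed above, so its results are approximate. In a full program you would use <code>out.x</code> directly.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Rounded GARCH(1,1) estimates from the printed output
omega_hat = 0.0000070;
alpha_hat = 0.380;
beta_hat = 0.588;

// Persistence of the variance process
persist = alpha_hat + beta_hat;

// Long-run (unconditional) variance, finite only if persist &lt; 1
sigma2_bar = omega_hat / (1 - persist);

// Periods until a shock's effect on the expected
// conditional variance has fallen by half
half_life = ln(0.5) / ln(persist);

print "Persistence:          " persist;
print "Long-run variance:    " sigma2_bar;
print "Long-run std. dev.:   " sqrt(sigma2_bar);
print "Half-life (periods):  " half_life;</code></pre>
<p>With these estimates, the half-life is roughly 21 periods, which quantifies the statement that volatility shocks decay slowly.</p>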
<p>Visualizing the conditional variance alongside the original series provides further insight:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Compute conditional variance series for plotting
T = rows(returns);
sigma2_hat = zeros(T, 1);
sigma2_hat[1] = stdc(returns)^2;

for t (2, T, 1);
    sigma2_hat[t] = omega_hat + alpha_hat * returns[t-1]^2 + beta_hat * sigma2_hat[t-1];
endfor;

// Plot returns and conditional volatility
struct plotControl plt;
plt = plotGetDefaults("xy");
plotSetTitle(&amp;plt, "GARCH(1,1): Returns and Conditional Volatility");
plotSetYLabel(&amp;plt, "Returns / Volatility");

plotLayout(2, 1, 1);
plotXY(plt, seqa(1, 1, T), returns);

plotLayout(2, 1, 2);
plotSetTitle(&amp;plt, "Conditional Standard Deviation");
plotXY(plt, seqa(1, 1, T), sqrt(sigma2_hat));</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2026/04/garch-plot-var.png"><img src="https://www.aptech.com/wp-content/uploads/2026/04/garch-plot-var.png" alt="" width="640" height="480" class="aligncenter size-full wp-image-11585776" /></a></p>
<p>The plot shows volatility clustering: periods of high volatility tend to persist, consistent with what we observe in commodity markets.</p>
<h2 id="example-2-stochastic-frontier-model">Example 2: Stochastic Frontier Model</h2>
<p>Stochastic frontier analysis separates random noise from systematic inefficiency. It's widely used in productivity analysis to measure how far firms operate below their production frontier.</p>
<p>The model:</p>
<p>$y = X\beta + v - u$</p>
<p>where:</p>
<ul>
<li>$v \sim N(0, \sigma^2_v)$ — symmetric noise (measurement error, luck)</li>
<li>$u \sim N^+(0, \sigma^2_u)$ — one-sided inefficiency (always reduces output)</li>
</ul>
<p>Both variance components must be positive. If the optimizer tries $\sigma^2_v \lt 0$ or $\sigma^2_u \lt 0$, the likelihood involves square roots of negative numbers.</p>
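<p>To see where those square roots enter, write the composed error as $\varepsilon_i = v_i - u_i = y_i - x_i'\beta$. The half-normal log-likelihood, as coded in <code>sf_negll</code> below, is built from</p>
<p>$$\sigma = \sqrt{\sigma^2_v + \sigma^2_u}, \qquad \lambda = \sqrt{\sigma^2_u / \sigma^2_v},$$</p>
<p>$$\ln L = \sum_{i=1}^{n}\left[\ln 2 - \frac{1}{2}\ln(2\pi) - \ln\sigma - \frac{1}{2}\left(\frac{\varepsilon_i}{\sigma}\right)^2 + \ln\Phi\!\left(-\frac{\varepsilon_i\lambda}{\sigma}\right)\right],$$</p>
<p>where $\Phi$ is the standard normal CDF. A negative variance puts a negative number under at least one of these square roots, and $\sigma^2_v = 0$ leaves $\lambda$ undefined.</p>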
<h3 id="step-one-data-and-likelihood-1">Step One: Data and Likelihood</h3>
<p>For this example, we'll simulate data from a Cobb-Douglas production function with inefficiency. This keeps the example self-contained and lets you see exactly what's being estimated.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Simulate production data
rndseed 8675309;
n = 500;

// Inputs (labor, capital, materials)
labor = exp(2 + 0.5*rndn(n, 1));
capital = exp(3 + 0.7*rndn(n, 1));
materials = exp(2.5 + 0.4*rndn(n, 1));

// True parameters
beta_true = { 1.5,    // constant
              0.4,    // labor elasticity
              0.3,    // capital elasticity
              0.25 }; // materials elasticity
sig2_v_true = 0.02;   // noise variance
sig2_u_true = 0.08;   // inefficiency scale (variance of the normal before taking abs)

// Generate output with noise (v) and inefficiency (u)
v = sqrt(sig2_v_true) * rndn(n, 1);
u = sqrt(sig2_u_true) * abs(rndn(n, 1));  // half-normal

X = ones(n, 1) ~ ln(labor) ~ ln(capital) ~ ln(materials);
y = X * beta_true + v - u;  // inefficiency reduces output</code></pre>
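<p>One detail in this simulation is worth flagging. <code>sig2_u_true</code> is the variance of the normal draw before the absolute value is taken, not the variance of <code>u</code> itself. For a half-normal, $E[u] = \sigma_u\sqrt{2/\pi}$ and $\mathrm{Var}(u) = \sigma^2_u(1 - 2/\pi)$, so the actual variance of <code>u</code> here is about 0.029. The standalone check below compares simulated and theoretical moments; its variable names are illustrative:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Standalone check of half-normal moments (illustrative sketch)
rndseed 8675309;
n_sim = 100000;
sig2_u = 0.08;

// Same construction as the simulation: scale times |N(0,1)|
u_sim = sqrt(sig2_u) * abs(rndn(n_sim, 1));

// Theoretical half-normal mean and variance
mean_theory = sqrt(sig2_u) * sqrt(2/pi);
var_theory = sig2_u * (1 - 2/pi);

print "Mean of u, simulated:       " meanc(u_sim);
print "Mean of u, theoretical:     " mean_theory;
print "Variance of u, simulated:   " stdc(u_sim)^2;
print "Variance of u, theoretical: " var_theory;</code></pre>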
<p>After simulating our data, we specify the log-likelihood function for minimization:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Stochastic frontier log-likelihood (half-normal inefficiency)
proc (1) = sf_negll(theta, y, X);
    local k, beta_, sig2_v, sig2_u, sigma, lambda;
    local eps, z, ll;

    k = cols(X);
    beta_ = theta[1:k];
    sig2_v = theta[k+1];
    sig2_u = theta[k+2];

    sigma = sqrt(sig2_v + sig2_u);
    lambda = sqrt(sig2_u / sig2_v);

    eps = y - X * beta_;
    z = -eps * lambda / sigma;

    ll = -0.5*ln(2*pi) + ln(2) - ln(sigma)
         - 0.5*(eps./sigma).^2 + ln(cdfn(z));

    retp(-sumc(ll));
endp;</code></pre>
<h3 id="step-two-setting-up-optimization-1">Step Two: Setting Up Optimization</h3>
<p>As in the previous example, we begin with starting values. For this model, we run OLS and use its coefficient estimates and residual variance as starting values:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// OLS for starting values
beta_ols = invpd(X'X) * X'y;
resid = y - X * beta_ols;
sig2_ols = meanc(resid.^2);

// Starting values: Split residual variance 
// between noise and inefficiency
theta0 = beta_ols | (0.5 * sig2_ols) | (0.5 * sig2_ols);</code></pre>
<p>We leave our coefficients unbounded but constrain the variances to be positive:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Bounds: coefficients unbounded, variances positive
k = cols(X);
struct minimizeControl ctl;
ctl = minimizeControlCreate();
ctl.bounds = (-1e300 * ones(k, 1) | 0.001 | 0.001) ~ (1e300 * ones(k+2, 1));</code></pre>
<h3 id="step-three-running-the-model-1">Step Three: Running the Model</h3>
<p>Finally, we call <code>minimize</code> to estimate our model: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Estimate
struct minimizeOut out;
out = minimize(&amp;sf_negll, theta0, y, X, ctl);</code></pre>
<h3 id="results-and-visualization-1">Results and Visualization</h3>
<p>Now that we've estimated our model, let's examine our results. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Extract estimates
k = cols(X);
beta_hat = out.x[1:k];
sig2_v_hat = out.x[k+1];
sig2_u_hat = out.x[k+2];

print "Coefficients:";
print "  constant     = " beta_hat[1];
print "  ln(labor)    = " beta_hat[2];
print "  ln(capital)  = " beta_hat[3];
print "  ln(materials)= " beta_hat[4];
print "";
print "Variance components:";
print "  sig2_v (noise)       = " sig2_v_hat;
print "  sig2_u (inefficiency)= " sig2_u_hat;
print "  ratio sig2_u/total   = " sig2_u_hat / (sig2_v_hat + sig2_u_hat);
print "";
print "Iterations: " out.iterations;</code></pre>
<p>This prints out coefficients and variance components:</p>
<pre>Coefficients:
  constant     = 1.51
  ln(labor)    = 0.39
  ln(capital)  = 0.31
  ln(materials)= 0.24

Variance components:
  sig2_v (noise)       = 0.022
  sig2_u (inefficiency)= 0.087
  ratio sig2_u/total   = 0.80

Iterations: 38</pre>
<p>The estimates recover the true parameters reasonably well. The variance ratio $\sigma^2_u / (\sigma^2_u + \sigma^2_v) \approx 0.80$ matches its true value of 0.80 and tells us that systematic inefficiency, not measurement error, dominates the composed error. In an applied study, that would be an important finding for policy.</p>
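<p>Because $u_i$ itself is unobserved, firm-level scores come from the Jondrow et al. (1982) estimator. It takes the conditional mean of $u_i$ given the estimated composed error $\varepsilon_i = y_i - x_i'\hat\beta$. In the notation of the code that follows (<code>mu_star</code>, <code>sig_star</code>):</p>
<p>$$\mu^*_i = -\frac{\varepsilon_i\,\sigma^2_u}{\sigma^2_v + \sigma^2_u}, \qquad \sigma^* = \sqrt{\frac{\sigma^2_v\,\sigma^2_u}{\sigma^2_v + \sigma^2_u}},$$</p>
<p>$$E[u_i \mid \varepsilon_i] = \mu^*_i + \sigma^*\,\frac{\phi(\mu^*_i/\sigma^*)}{\Phi(\mu^*_i/\sigma^*)},$$</p>
<p>where $\phi$ and $\Phi$ are the standard normal PDF and CDF. Technical efficiency is then estimated as $\exp(-E[u_i \mid \varepsilon_i])$.</p>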
<p>We can also compute and plot firm-level efficiency scores:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Compute efficiency estimates (Jondrow et al. 1982)
eps = y - X * beta_hat;
sigma = sqrt(sig2_v_hat + sig2_u_hat);
lambda = sqrt(sig2_u_hat / sig2_v_hat);

mu_star = -eps * sig2_u_hat / (sig2_v_hat + sig2_u_hat);
sig_star = sqrt(sig2_v_hat * sig2_u_hat / (sig2_v_hat + sig2_u_hat));

// E[u|eps] - conditional mean of inefficiency
u_hat = mu_star + sig_star * (pdfn(mu_star/sig_star) ./ cdfn(mu_star/sig_star));

// Technical efficiency: TE = exp(-u)
TE = exp(-u_hat);

// Plot efficiency distribution
struct plotControl plt;
plt = plotGetDefaults("hist");
plotSetTitle(&amp;plt, "Distribution of Technical Efficiency");
plotSetXLabel(&amp;plt, "Technical Efficiency (1 = frontier)");
plotSetYLabel(&amp;plt, "Frequency");
plotHist(plt, TE, 20);

print "Mean efficiency: " meanc(TE);
print "Min efficiency:  " minc(TE);
print "Max efficiency:  " maxc(TE);</code></pre>
<pre>Mean efficiency: 0.80
Min efficiency:  0.41
Max efficiency:  0.95</pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2026/04/stochastic-frontier-histogram.png"><img src="https://www.aptech.com/wp-content/uploads/2026/04/stochastic-frontier-histogram.png" alt="" width="640" height="480" class="aligncenter size-full wp-image-11585777" /></a></p>
<p>The histogram shows substantial variation in efficiency: some firms operate near the frontier (TE $\approx$ 0.95), while the least efficient produce less than half of their potential output (TE $\approx$ 0.41). This is the kind of insight that drives productivity research.</p>
<p>Both variance estimates stayed positive throughout optimization. No log-transforms needed, and the estimates apply directly to the parameters we care about.</p>
<h2 id="when-to-use-minimize">When to Use minimize</h2>
<p>The <code>minimize</code> procedure is designed for one thing: optimization with bound constraints. If that's all you need, it's the right tool.</p>
<table>
<thead>
<tr>
<th>Situation</th>
<th>Recommendation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Parameters with simple bounds</td>
<td><a href="https://docs.aptech.com/gauss/minimize.html" target="_blank" rel="noopener"><code>minimize</code></a></td>
</tr>
<tr>
<td>Nonlinear constraints ($g(x) \leq 0$)</td>
<td><a href="https://docs.aptech.com/gauss/sqpsolvemt.html" target="_blank" rel="noopener"><code>sqpSolveMT</code></a></td>
</tr>
<tr>
<td>Equality constraints</td>
<td><code>sqpSolveMT</code></td>
</tr>
<tr>
<td>Algorithm switching, complex problems</td>
<td><a href="https://docs.aptech.com/gauss/optmt/index.html" target="_blank" rel="noopener">OPTMT</a></td>
</tr>
</tbody>
</table>
<p>For the GARCH and stochastic frontier examples above — and most MLE problems where parameters have natural bounds — <code>minimize</code> handles it directly.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Bounded parameters show up constantly in econometric models: variances, volatilities, probabilities, shares. GAUSS 26.0.1 gives you a clean way to handle them with <code>minimize</code>. As we saw today, with <code>minimize</code>:</p>
<ul>
<li>You set bounds in the control structure.</li>
<li>The optimizer respects the bounds throughout the search, not just at the solution.</li>
<li>You need no log-transforms or penalty functions.</li>
<li>Everything you need is included in base GAUSS.</li>
</ul>
<p>If you've been working around parameter bounds with transforms or checking for invalid values inside your likelihood function, this is the cleaner path.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://www.aptech.com/blog/garch-estimation/">GARCH estimation in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/stochastic-frontier-analysis/">Introduction to stochastic frontier models</a></li>
</ul>
<p>    <!-- MathJax configuration -->
    <style>
        .mjx-svg-href {
            fill: "inherit" !important;
            stroke: "inherit" !important;
        }
    </style>
    <script type="text/x-mathjax-config">
        MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "AMS"} } });
    </script>
    <script type="text/javascript">
window.MathJax = {
  tex2jax: {
    inlineMath: [ ['$','$'] ],
    displayMath: [ ['$$','$$'] ],
    processEscapes: true,
    processEnvironments: true
  },
  // Center justify equations in code and markdown cells. Elsewhere
  // we use CSS to left justify single line equations in code cells.
  displayAlign: 'center',
  "HTML-CSS": {
    styles: {'.MathJax_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  "SVG": {
    styles: {'.MathJax_SVG_Display': {"margin": 0}},
    linebreaks: { automatic: false }
  },
  showProcessingMessages: false,
  messageStyle: "none",
  menuSettings: { zoom: "Click" },
  AuthorInit: function() {
    MathJax.Hub.Register.StartupHook("End", function() {
            var timeout = false, // holder for timeout id
            delay = 250; // delay after event is "complete" to run callback
            var shrinkMath = function() {
              //var dispFormulas = document.getElementsByClassName("formula");
              var dispFormulas = document.getElementsByClassName("MathJax_SVG_Display");
              if (dispFormulas){
                // caculate relative size of indentation
                var contentTest = document.getElementsByTagName("body")[0];
                var nodesWidth = contentTest.offsetWidth;
                // if you have indentation
                var mathIndent = MathJax.Hub.config.displayIndent; //assuming px's
                var mathIndentValue = mathIndent.substring(0,mathIndent.length - 2);
                for (var i=0; i<dispFormulas.length; i++){
                  var dispFormula = dispFormulas[i];
                  var wrapper = dispFormula;
                  //var wrapper = dispFormula.getElementsByClassName("MathJax_Preview")[0].nextSibling;
                  var child = wrapper.firstChild;
                  wrapper.style.transformOrigin = "center"; //or top-left if you left-align your equations
                  var oldScale = child.style.transform;
                  //var newValue = Math.min(0.80*dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newValue = Math.min(dispFormula.offsetWidth / child.offsetWidth,1.0).toFixed(2);
                  var newScale = "scale(" + newValue + ")";
                  if(newValue != "NaN" && !(newScale === oldScale)){
                    wrapper.style.transform = newScale;
                    wrapper.style["margin-left"]= Math.pow(newValue,4)*mathIndentValue + "px";
                    var wrapperStyle = window.getComputedStyle(wrapper);
                    var wrapperHeight = parseFloat(wrapperStyle.height);
                    wrapper.style.height = "" + (wrapperHeight * newValue) + "px";
                    if(newValue === "1.00"){
                      wrapper.style.cursor = "";
                      wrapper.style.height = "";
                    }
                    else {
                      wrapper.style.cursor = "zoom-in";
                    }
                  }

                }
            }
            };
            shrinkMath();
            window.addEventListener('resize', function() {
              clearTimeout(timeout);
              timeout = setTimeout(shrinkMath, delay);
            });
          });
  }
}
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-AMS_SVG"></script></p>

]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/mle-with-bounded-parameters-a-cleaner-approach/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why You Should Consider Constrained Maximum Likelihood MT (CMLMT)</title>
		<link>https://www.aptech.com/blog/why-you-should-consider-constrained-maximum-likelihood-mt-cmlmt/</link>
					<comments>https://www.aptech.com/blog/why-you-should-consider-constrained-maximum-likelihood-mt-cmlmt/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Wed, 09 Apr 2025 13:49:48 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11585183</guid>

					<description><![CDATA[The <b>Constrained Maximum Likelihood (CML)</b> library was one of the original constrained optimization tools in GAUSS. Like many GAUSS libraries, it was later updated to an "MT" version.

The "MT" version libraries, named for their use of multi-threading, provide significant performance improvements, greater flexibility, and a more intuitive parameter-handling system.

This blog post explores:
<ul><li> The key features, differences, and benefits of upgrading from <b>CML</b> to <b>CMLMT</b>.</li>
<li>A practical example to help you transition code from <b>CML</b> to <b>CMLMT</b>.</li>
</ul>]]></description>
										<content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>The <a href="https://docs.aptech.com/gauss/cmlmt/" target="_blank" rel="noopener"><strong>Constrained Maximum Likelihood (CML)</strong></a> library was one of the original constrained optimization tools in GAUSS. Like many GAUSS libraries, it was later updated to an &quot;MT&quot; version.</p>
<p>The &quot;MT&quot; version libraries, named for their use of multi-threading, provide significant performance improvements, greater flexibility, and a more intuitive parameter-handling system.</p>
<p>This blog post explores:</p>
<ul>
<li>The key features, differences, and benefits of upgrading from <strong>CML</strong> to <strong>CMLMT</strong>.</li>
<li>A practical example to help you transition code from <strong>CML</strong> to <strong>CMLMT</strong>.</li>
</ul>
<h2 id="key-features-comparison">Key Features Comparison</h2>
<p>Before diving into the details of transitioning from <strong>CML</strong> to <strong>CMLMT</strong>, it’s useful to understand how these two libraries compare. The table below highlights key differences, from optimization algorithms to constraint handling.</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>CML (2.0)</th>
<th>CMLMT (3.0)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Optimization Algorithm</strong></td>
<td>Sequential Quadratic Programming (SQP) with BFGS, DFP, and Newton-Raphson methods.</td>
<td>SQP with improved secant algorithms and Cholesky updates for Hessian approximation.</td>
</tr>
<tr>
<td><a href="https://www.aptech.com/resources/tutorials/gauss-threading-tutorial-part-1/" target="_blank" rel="noopener"><strong>Parallel Computing Support</strong></a></td>
<td>No multi-threading support.</td>
<td>Multi-threading enabled for numerical derivatives and bootstrapping.</td>
</tr>
<tr>
<td><strong>Log-Likelihood Computation</strong></td>
<td>Function and derivatives computed separately, requiring redundant calculations.</td>
<td>Unified procedure for computing log-likelihood, first derivatives, and second derivatives, reducing redundant computations.</td>
</tr>
<tr>
<td><strong>Parameter Handling</strong></td>
<td>Supports only a simple parameter vector.</td>
<td>Supports both a simple parameter vector and a <code>PV</code> structure (for advanced parameter management). Additionally, allows an unlimited number of data arguments in the log-likelihood function, simplifying the function and improving computation time.</td>
</tr>
<tr>
<td><strong>Constraints Handling</strong></td>
<td>Supports linear and nonlinear equality/inequality constraints.</td>
<td>Improved constraint handling with an explicit control structure for optimization.</td>
</tr>
<tr>
<td><strong>Line Search Methods</strong></td>
<td>STEPBT (quadratic/cubic fitting), BRENT, HALF, and BHHHSTEP.</td>
<td>Introduces the <strong>Augmented Lagrangian Penalty</strong> method for constrained models. Also includes STEPBT (quadratic/cubic fitting), BRENT, HALF, and BHHHSTEP.</td>
</tr>
<tr>
<td><strong>Statistical Inference</strong></td>
<td>Basic hypothesis testing.</td>
<td>Enhanced hypothesis testing for constrained models, including profile likelihoods, bootstrapping, and Lagrange multipliers.</td>
</tr>
<tr>
<td><strong>Handling of Fixed Parameters</strong></td>
<td>Global variables used to fix parameters.</td>
<td>Uses the <code>cmlmtControl</code> structure for setting fixed parameters.</td>
</tr>
<tr>
<td><strong>Run-Time Adjustments</strong></td>
<td>Uses global variables to modify settings.</td>
<td>The <code>cmlmtControl</code> structure enables flexible tuning of optimization settings.</td>
</tr>
</tbody>
</table>
<h2 id="advantages-of-cmlmt">Advantages of <strong>CMLMT</strong></h2>
<p>Beyond performance, <strong>CMLMT</strong> introduces several advantages that make it a more powerful and user-friendly tool for constrained maximum likelihood estimation. These improvements go beyond multi-threading: they provide greater flexibility, efficiency, and accuracy in model estimation.</p>
<p>Some of the most notable advantages include:</p>
<ol>
<li><strong>Threading &amp; Multi-Core Support</strong>: <strong>CMLMT</strong> enables multi-threading, significantly speeding up numerical derivatives and bootstrapping, whereas <strong>CML</strong> is single-threaded.  </li>
<li><strong>Simplified Parameter Handling</strong>: Only <strong>CMLMT</strong> supports both a simple parameter vector and the <code>PV</code> structure for advanced models. Additionally, <strong>CMLMT</strong> allows <a href="https://www.aptech.com/blog/the-basics-of-optional-arguments-in-gauss-procedures/" target="_blank" rel="noopener">dynamic arguments</a>, making it easier to pass data to the log-likelihood function.  </li>
<li><strong>More Efficient Log-Likelihood Computation</strong>: <strong>CMLMT</strong> integrates the analytic computation of log-likelihood, first derivatives, and second derivatives into a user-specified log-likelihood procedure, reducing redundancy.  </li>
<li><strong>Augmented Lagrangian Method</strong>: <strong>CMLMT</strong> introduces an <strong>Augmented Lagrangian Penalty Line Search</strong> for handling constrained optimization.  </li>
<li><strong>Enhanced Statistical Inference</strong>: <strong>CMLMT</strong> includes <a href="https://docs.aptech.com/gauss/cmlmt/cmlmtboot.html" target="_blank" rel="noopener">bootstrapping</a>, <a href="https://docs.aptech.com/gauss/cmlmt/cmlmtprofile.html" target="_blank" rel="noopener">profile likelihoods</a>, and hypothesis testing improvements, which are limited in <strong>CML</strong>.  </li>
</ol>
<h2 id="converting-a-cml-model-to-cmlmt">Converting a <strong>CML</strong> Model to <strong>CMLMT</strong></h2>
<p>Let's use a simple example to walk through the transition from <strong>CML</strong> to <strong>CMLMT</strong> step by step. In this example, we estimate a Poisson model by constrained maximum likelihood.</p>
<p>The dataset is included in the <strong>CMLMT</strong> library.  </p>
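<p>The <code>logl</code> procedure in the code below returns <code>y .* m - exp(m)</code> for each observation, where <code>m = x * b</code>. That is the Poisson log-likelihood with mean <code>exp(m)</code>, minus the <code>ln(y!)</code> term. Because that term does not depend on the parameters, leaving it out shifts the reported log-likelihood but does not change the estimates. As a sketch of the complete expression, the hypothetical <code>poisson_ll_full</code> procedure adds the term back using GAUSS's <code>lnfact</code> function:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical helper: full Poisson log-likelihood per observation,
// including the ln(y!) term that logl leaves out
proc (1) = poisson_ll_full(b, y, x);
    local m;

    // Linear index; the Poisson mean is exp(m)
    m = x * b;

    // lnfact(y) = ln(y!), constant with respect to b
    retp(y .* m - exp(m) - lnfact(y));
endp;

// Toy check: one regressor equal to 1 and b = 0, so the mean is 1
y_toy = { 0, 1, 2 };
x_toy = { 1, 1, 1 };
b_toy = 0;

print poisson_ll_full(b_toy, y_toy, x_toy);</code></pre>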
<h3 id="original-cml-code">Original <strong>CML</strong> Code</h3>
<p>We will start by estimating the model using <strong>CML</strong>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">new;
library cml;
#include cml.ext;
cmlset;

// Load data
data = loadd(getGAUSSHome("pkgs/cmlmt/examples/cmlmtpsn.dat"));

// Set constraints for first two coefficients
// to be equal
_cml_A = { 1 -1 0 };   
_cml_B = { 0 };  

// Specify starting parameters
beta0 = .5|.5|.5;

// Run optimization
{ _beta, f0, g, cov, retcode } = CMLprt(cml(data, 0, &amp;logl, beta0));

// Specify log-likelihood function
proc logl(b, data);
   local m, x, y;

   // Extract x and y
   y = data[., 1];
   x = data[., 2:4];

   m = x * b;

   retp(y .* m - exp(m));
endp;</code></pre>
<p>This code prints the following output:</p>
<pre>Mean log-likelihood       -0.670058
Number of cases     100

Covariance of the parameters computed by the following method:
Inverse of computed Hessian

Parameters    Estimates     Std. err.    Gradient
------------------------------------------------------------------
P01              0.1199        0.1010      0.0670
P02              0.1199        0.1010     -0.0670
P03              0.8343        0.2648      0.0000

Number of iterations    5
Minutes to convergence     0.00007</pre>
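<p>The matrices <code>_cml_A</code> and <code>_cml_B</code> define a linear equality constraint of the form A * b = B. The single row <code>{ 1 -1 0 }</code> therefore encodes "first coefficient minus second coefficient equals zero." Plugging the reported estimates into that expression shows why P01 and P02 are identical:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Reported CML estimates (rounded to four decimals)
b_hat = { 0.1199, 0.1199, 0.8343 };

// Linear equality constraint A * b = B: b1 - b2 = 0
A = { 1 -1 0 };
B = { 0 };

// Constraint residual; zero means the restriction holds
print "A*b - B = " A*b_hat - B;</code></pre>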
<h3 id="step-one-switch-to-cmlmt-library">Step One: Switch to <strong>CMLMT</strong> Library</h3>
<p>The first step in updating our program file is to load the <strong>CMLMT</strong> library instead of the <strong>CML</strong> library.  </p>
<table>
<thead>
<tr>
<th>Original CML Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Clear workspace and load library
new;
library cml;</code></pre>
<table>
<thead>
<tr>
<th>New CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Clear workspace and load library
new;
library cmlmt;</code></pre>
<h3 id="step-two-load-data">Step Two: Load Data</h3>
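<p>Data loading is handled by GAUSS base procedures, so the loading call itself does not change. Here we also split the data into <code>y</code> and <code>x</code> up front, rather than inside the log-likelihood procedure, because <strong>CMLMT</strong> lets us pass multiple data arguments to the log-likelihood function.</p>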
<table>
<thead>
<tr>
<th>Original CML and CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
x = loadd(getGAUSSHome("pkgs/cmlmt/examples/cmlmtpsn.dat"));

// Extract x and y
y = x[., 1];
x = x[., 2:4];</code></pre>
<h3 id="step-three-setting-constraints">Step Three: Setting Constraints</h3>
<p>The next step is to convert the global variables used to control optimization in <strong>CML</strong> into members of the <code>cmlmtControl</code> structure. To do this, we need to:  </p>
<ol>
<li>Declare an instance of the <code>cmlmtControl</code> structure.  </li>
<li>Initialize the <code>cmlmtControl</code> structure with default values using <a href="https://docs.aptech.com/gauss/cmlmt/cmlmtcontrolcreate.html" target="_blank" rel="noopener"><code>cmlmtControlCreate</code></a>.  </li>
<li>Assign the constraint vectors to the corresponding <code>cmlmtControl</code> structure members.  </li>
</ol>
<table>
<thead>
<tr>
<th>Original CML Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set constraints for first two coefficients
// to be equal
_cml_A = { 1 -1 0 };   
_cml_B = { 0 };  </code></pre>
<table>
<thead>
<tr>
<th>New CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare and initialize control structure
struct cmlmtControl ctl;
ctl = cmlmtControlCreate();

// Set constraints for first two coefficients
// to be equal
ctl.A = { 1 -1 0 };   
ctl.B = { 0 };       </code></pre>
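<p>The same structure can also hold linear inequality constraints and simple parameter bounds. The sketch below is illustrative only. The constraints are hypothetical and not part of this Poisson example. It assumes the <code>C</code>, <code>D</code>, and <code>Bounds</code> members of <code>cmlmtControl</code>, which mirror the <strong>CML</strong> globals <code>_cml_C</code>, <code>_cml_D</code>, and <code>_cml_Bounds</code>. Check the <code>cmlmtControlCreate</code> documentation before relying on them.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Continues from the 'ctl' structure created above.
// Hypothetical constraints, NOT part of the Poisson example.

// Linear inequality constraints, C * b &gt;= D:
// here, the third coefficient must be non-negative
ctl.C = { 0 0 1 };
ctl.D = { 0 };

// Parameter bounds: one row per parameter,
// [lower bound, upper bound]
ctl.Bounds = { -10 10,
               -10 10,
               -10 10 };</code></pre>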
<h3 id="step-four-specify-starting-values">Step Four: Specify Starting Values</h3>
<p>In our original <strong>CML</strong> code, we specified the starting parameters using a vector of values. In the <strong>CMLMT</strong> library, we can specify the starting values using either a parameter vector or a <code>PV</code> structure.  </p>
<p>The advantage of the <code>PV</code> structure is that it allows parameters to be stored in different formats, such as symmetric matrices or matrices with fixed parameters. This, in turn, can simplify calculations inside the log-likelihood function.  </p>
<p>If we use the parameter vector option, we don't need to make any changes to our original code:  </p>
<table>
<thead>
<tr>
<th>Original CML and CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify starting parameters
beta0 = .5|.5|.5;</code></pre>
<p>Using the <code>PV</code> structure option requires additional steps:  </p>
<ol>
<li>Declare an instance of the <code>PV</code> structure.  </li>
<li>Initialize the <code>PV</code> structure using the <code>pvCreate</code> procedure.  </li>
<li>Use the <code>pvPack</code> functions to create and define specific parameter types within the <code>PV</code> structure.  </li>
</ol>
<table>
<thead>
<tr>
<th>New CMLMT Code to use PV</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare instance of 'PV' struct
struct PV p0;

// Initialize p0
p0 = pvCreate();

// Create parameter vector
beta0 = .5|.5|.5;

// Load parameters into p0
p0 = pvPack(p0, beta0, "beta");</code></pre>
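<p>When the starting values are passed as a <code>PV</code> structure, the log-likelihood function receives that structure rather than a plain vector, so it must retrieve the parameters itself. The sketch below assumes the parameters were packed under the name <code>"beta"</code> as above and uses <code>pvUnpack</code> to retrieve them. It would replace the log-likelihood function shown in Step Five, and it reuses the <code>ctl</code> structure from Step Three.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Sketch: log-likelihood that accepts a PV structure
proc logl(struct PV p, y, x, ind);
   local b, m;

   // Declare modelResults structure
   struct modelResults mm;

   // Retrieve the parameter vector packed as "beta"
   b = pvUnpack(p, "beta");

   m = x * b;

   // Compute the function value when requested
   if ind[1];
      mm.function = y .* m - exp(m);
   endif;

   retp(mm);
endp;

// Pass the PV structure instead of the plain vector
struct cmlmtResults out;
out = cmlmt(&amp;logl, p0, y, x, ctl);</code></pre>
<p>Only the first argument and the unpacking step change. The data arguments and <code>ind</code> work exactly as they do in the vector-based version.</p>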
<h3 id="step-five-the-likelihood-function">Step Five: The Likelihood Function</h3>
<p>In <strong>CML</strong>, the likelihood function takes only two arguments:  </p>
<ol>
<li>A parameter vector.  </li>
<li>A data matrix.  </li>
</ol>
<table>
<thead>
<tr>
<th>Original CML Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify log-likelihood function
proc logl(b, data);
   local m, x, y;

   // Extract x and y
   y = data[., 1];
   x = data[., 2:4];

   m = x * b;

  retp(y .* m - exp(m));
endp;</code></pre>
<p>The likelihood function in <strong>CMLMT</strong> is enhanced in several ways:  </p>
<ol>
<li>We can pass as many arguments as needed to the likelihood function. This allows us to simplify the function, which, in turn, can speed up optimization.  </li>
<li>We return output from the likelihood function in the form of the <code>modelResults</code> structure. This makes computations thread-safe and allows us to specify both gradients and Hessians inside the likelihood function:  
<ul>
<li>The likelihood function values are stored in the <code>mm.function</code> member.  </li>
<li>The gradients are stored in the <code>mm.gradient</code> member.  </li>
<li>The Hessians are stored in the <code>mm.hessian</code> member.  </li>
</ul></li>
<li>The last input to the likelihood function must be <code>ind</code>. <strong>CMLMT</strong> passes <code>ind</code> to your log-likelihood function each time it calls it. It tells your function whether <strong>CMLMT</strong> needs the gradient and Hessian computed, or just the function value <a href="https://docs.aptech.com/gauss/cmlmt/cmlmt-examples.html" target="_blank" rel="noopener">(see online examples)</a>. NOTE: You are never required to compute the gradient or Hessian, even if <code>ind</code> requests them. If you do not compute them, <strong>CMLMT</strong> will compute numerical derivatives.</li>
</ol>
<table>
<thead>
<tr>
<th>New CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Specify log-likelihood function
// Allows separate arguments for y &amp; x
// Also has 'ind' as last argument
proc logl(b, y, x, ind);
   local m;

   // Declare modelResults structure
   struct modelResults mm;

   // Likelihood computation
   m = x * b;

   // If the first element of 'ind' is not zero,
   // CMLMT wants us to compute the function value
   // which we assign to mm.function
   if ind[1];
      mm.function = y .* m - exp(m);
   endif;

   retp(mm);
endp;</code></pre>
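<p>Because <code>ind</code> also signals when <strong>CMLMT</strong> wants derivatives, you can optionally supply analytic ones. For this Poisson model, each observation contributes <em>y<sub>i</sub>m<sub>i</sub> - exp(m<sub>i</sub>)</em> with <em>m<sub>i</sub> = x<sub>i</sub>b</em>. Its gradient is therefore <em>(y<sub>i</sub> - exp(m<sub>i</sub>))x<sub>i</sub></em>, and the Hessian is the sum over observations of <em>-exp(m<sub>i</sub>)x<sub>i</sub>'x<sub>i</sub></em>. The sketch below rests on two assumptions. First, <code>ind[2]</code> and <code>ind[3]</code> request the gradient and Hessian. Second, an NxK gradient (one row per observation) is expected when <code>mm.function</code> is an Nx1 vector. Verify both conventions against the CMLMT documentation before use.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Sketch: Poisson log-likelihood with analytic derivatives
proc logl(b, y, x, ind);
   local m, mu;
   struct modelResults mm;

   m = x * b;
   mu = exp(m);

   // Function value: Nx1 vector of per-observation log-likelihoods
   if ind[1];
      mm.function = y .* m - mu;
   endif;

   // Gradient: NxK matrix, row i is (y[i] - mu[i]) * x[i, .]
   if ind[2];
      mm.gradient = (y - mu) .* x;
   endif;

   // Hessian: KxK matrix, -x'diag(mu)x
   if ind[3];
      mm.hessian = -(x' * (mu .* x));
   endif;

   retp(mm);
endp;</code></pre>
<p>Any derivative you leave out is still computed numerically, so you can add the gradient first and the Hessian later.</p>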
<h3 id="step-six-run-optimization">Step Six: Run Optimization</h3>
<p>In <strong>CML</strong>, we estimate the maximum likelihood parameters using the <code>cml</code> procedure. The <code>cml</code> procedure returns five outputs (the parameter estimates, function value, gradient, covariance matrix, and return code), and a results table is printed using the <code>cmlPrt</code> procedure.  </p>
<table>
<thead>
<tr>
<th>Original CML Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Run optimization
*/
// Run optimization, passing the data matrix expected
// by logl: y in column 1, x in columns 2-4
{ _beta, f0, g, cov, retcode } = cml(y~x, 0, &amp;logl, beta0);

// Print results
CMLprt(_beta, f0, g, cov, retcode);</code></pre>
<p>In <strong>CMLMT</strong>, estimation is performed using the <a href="https://docs.aptech.com/gauss/cmlmt/cmlmt.html" target="_blank" rel="noopener"><code>cmlmt</code></a> procedure. The <code>cmlmt</code> procedure returns a <code>cmlmtResults</code> structure, and a results table is printed using the <code>cmlmtPrt</code> procedure.  </p>
<p>To convert to <code>cmlmt</code>, we take the following steps:  </p>
<ol>
<li>Declare an instance of the <code>cmlmtResults</code> structure.  </li>
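<li>Call the <code>cmlmt</code> procedure. After the pointer to the log-likelihood function, pass the starting parameters and data inputs in the same order in which they appear in the log-likelihood function's argument list, followed by the <code>cmlmtControl</code> structure. The final <code>ind</code> argument is supplied by <strong>CMLMT</strong>, so it is not passed here.  </li>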
<li>The output from <code>cmlmt</code> is stored in the <code>cmlmtResults</code> structure, <code>out</code>.  </li>
</ol>
<table>
<thead>
<tr>
<th>New CMLMT Code</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Run optimization
*/
// Declare output structure
struct cmlmtResults out;

// Run estimation
out = cmlmt(&amp;logl, beta0, y, x, ctl);

// Print output
cmlmtPrt(out);</code></pre>
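<p>Beyond the printed table, you can pull the estimates out of <code>out</code> for further computation. The sketch below assumes the <code>cmlmtResults</code> members <code>par</code> (a <code>PV</code> structure), <code>covPar</code>, <code>fct</code>, and <code>retcode</code>. Confirm these member names in the <code>cmlmtResults</code> documentation.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Estimated coefficients as a plain column vector
b_hat = pvGetParVector(out.par);

// Standard errors from the parameter covariance matrix
se = sqrt(diag(out.covPar));

// Function value at the solution and the return code
print "Function value: " out.fct;
print "Return code: " out.retcode;

// Estimates next to their standard errors
print b_hat~se;</code></pre>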
<h2 id="conclusion">Conclusion</h2>
<p>Upgrading from <strong>CML</strong> to <strong>CMLMT</strong> provides faster performance, improved numerical stability, and easier parameter management. The addition of multi-threading, better constraint handling, and enhanced statistical inference makes <strong>CMLMT</strong> a powerful upgrade for GAUSS users.  </p>
<p>If you're still using <strong>CML</strong>, consider transitioning to <strong>CMLMT</strong> for a more efficient and flexible modeling experience!  </p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/anchoring-vignettes-and-the-compound-hierarchical-ordered-probit-chopit-model/" target="_blank" rel="noopener">Beginner's Guide To Maximum Likelihood Estimation</a></li>
<li><a href="https://www.aptech.com/blog/maximum-likelihood-estimation-in-gauss/" target="_blank" rel="noopener">Maximum Likelihood Estimation in GAUSS</a></li>
<li><a href="https://www.aptech.com/examples/cmlmt/ordered-probit-estimation-with-constrained-maximum-likelihood/" target="_blank" rel="noopener">Ordered Probit Estimation with Constrained Maximum Likelihood</a></li>
</ol>
<h2 id="try-out-the-gauss-constrained-maximum-likelihood-mt-library">Try Out the GAUSS Constrained Maximum Likelihood MT Library</h2>


]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/why-you-should-consider-constrained-maximum-likelihood-mt-cmlmt/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Introducing the GAUSS Data Management Guide</title>
		<link>https://www.aptech.com/blog/introducing-the-gauss-data-management-guide/</link>
					<comments>https://www.aptech.com/blog/introducing-the-gauss-data-management-guide/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 20 Feb 2024 18:50:08 +0000</pubDate>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11584443</guid>

					<description><![CDATA[If you've worked with real-world data, you know that data cleaning and management can eat up your time. Efficiently tackling tedious data cleaning, organization, and management tasks can have a huge impact on productivity. 
<br>
We created the <b>GAUSS Data Management Guide</b> with that exact goal in mind. It aims to help you save time and make the most of your data.
<br>
Today's blog looks at what the GAUSS Data Management Guide offers and how to best use the guide.]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>If you've worked with real-world data, you know that data cleaning and management can eat up your time. Efficiently tackling tedious data cleaning, organization, and management tasks can have a huge impact on productivity. </p>
<p>We created the <strong>GAUSS Data Management Guide</strong> with that exact goal in mind. It aims to help you save time and make the most of your data.</p>
<p>Today's blog looks at what the <strong>GAUSS Data Management Guide</strong> offers and how to best use the guide.</p>
<h2 id="what-is-the-gauss-data-management-guide">What is the GAUSS Data Management Guide?</h2>
<p>The <strong><a href="https://docs.aptech.com/gauss/data-management.html" target="_blank" rel="noopener">GAUSS Data Management Guide</a></strong> is a comprehensive reference tool for accomplishing data-related tasks in GAUSS. It provides a detailed roadmap for working with data in GAUSS, from basic data import and manipulation to advanced data cleaning and visualization. </p>
<p>The guide is designed for GAUSS users of all levels, with:</p>
<ul>
<li>Extensive coverage.</li>
<li>Step-by-step instructions.</li>
<li>Annotated examples. </li>
</ul>
<h2 id="what-does-the-gauss-data-management-guide-cover">What does the GAUSS Data Management Guide cover?</h2>
<p>The <strong>GAUSS Data Management Guide</strong> includes sections for:</p>
<ul>
<li><a href="https://docs.aptech.com/gauss/data-management/interactive-import.html" target="_blank" rel="noopener">Interactive Data Import</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/programmatic-import.html" target="_blank" rel="noopener">Programmatic Data Import</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/programmatic-export.html" target="_blank" rel="noopener">Programmatic Export</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/data-cleaning.html" target="_blank" rel="noopener">Data Cleaning</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/data-exploration.html" target="_blank" rel="noopener">Data Exploration</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/data-sampling.html" target="_blank" rel="noopener">Data Sampling</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/data-smoothing.html" target="_blank" rel="noopener">Data Smoothing</a></li>
<li><a href="https://docs.aptech.com/gauss/data-management/data-transformations.html" target="_blank" rel="noopener">Data Transformations</a></li>
</ul>
<h2 id="how-should-i-use-the-gauss-data-management-guide">How should I use the <strong>GAUSS Data Management Guide</strong>?</h2>
<ul>
<li>
<p>Use page outlines, located on the right-hand side of each page, to identify and navigate to specific tasks.
<a href="https://www.aptech.com/wp-content/uploads/2024/02/page-outline-2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2024/02/page-outline-2.jpg" alt="" width="145" height="292" class="aligncenter size-full wp-image-11584458" /></a></p>
</li>
<li>
<p>Copy the examples in the guide and paste them into GAUSS program files to use as templates.
<a href="https://www.aptech.com/wp-content/uploads/2024/02/example-template2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2024/02/example-template2.jpg" alt="" width="525" height="254" class="aligncenter size-full wp-image-11584459" /></a></p>
</li>
<li>Follow the links to the complete function reference pages for additional support.
<a href="https://www.aptech.com/wp-content/uploads/2024/02/command-reference2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2024/02/command-reference2.jpg" alt="" width="422" height="316" class="aligncenter size-full wp-image-11584460" /></a></li>
</ul>
<h3 id="conclusion">Conclusion</h3>
<p>The <strong>GAUSS Data Management Guide</strong> provides practical examples, detailed instructions, and comprehensive coverage that can help you work productively and efficiently with your data.</p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/gauss-basics-interactive-commands/" target="_blank" rel="noopener">GAUSS Basics: Interactive commands</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-2-running-a-program/" target="_blank" rel="noopener">GAUSS Basics 2: Running a program</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-3-introduction-to-matrices/" target="_blank" rel="noopener">GAUSS Basics 3: Introduction to matrices</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-4-matrix-operations/" target="_blank" rel="noopener">GAUSS Basics 4: Matrix operations</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-5-element-by-element-conformability/" target="_blank" rel="noopener">GAUSS Basics 5: Element-by-element conformability</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-6-logical-and-relational-operators/" target="_blank" rel="noopener">GAUSS Basics 6: Logical and relational operators</a></li>
<li><a href="https://www.aptech.com/blog/gauss-basics-7-conditional-statements/" target="_blank" rel="noopener">GAUSS Basics 7: Conditional statements</a></li>
<li><a href="https://www.aptech.com/blog/basics-of-gauss-procedures/" target="_blank" rel="noopener">Basics of GAUSS Procedures</a></li>
<li><a href="https://www.aptech.com/blog/understanding-errors-g0025-undefined-symbol/" target="_blank" rel="noopener">Understanding Errors | G0025 : Undefined symbol</a></li>
<li><a href="https://www.aptech.com/blog/understanding-errors-g0064-operand-missing/" target="_blank" rel="noopener">Understanding Errors: G0064 Operand Missing</a></li>
<li><a href="https://www.aptech.com/blog/understanding-errors-g0058-index-out-of-range/" target="_blank" rel="noopener">Understanding Errors: G0058 Index out-of-Range</a></li>
</ol>


]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/introducing-the-gauss-data-management-guide/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Transforming Panel Data to Long Form in GAUSS</title>
		<link>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/</link>
					<comments>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 12 Dec 2023 21:24:59 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Panel data]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11584134</guid>

					<description><![CDATA[Anyone who works with panel data knows that pivoting between long and wide form, though commonly necessary, can still be painstakingly tedious, at best. It can lead to frustrating errors, unexpected results, and lengthy troubleshooting, at worst.

<br>The new dfLonger and dfWider procedures introduced in GAUSS 24 make great strides towards fixing that. Extensive planning has gone into each procedure, resulting in comprehensive but intuitive functions.

<br>In today's blog, we will walk through all you need to know about the dfLonger procedure to tackle even the most complex cases of transforming wide form panel data to long form.]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>Anyone who works with <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">panel data</a> knows that pivoting between long and wide form, though commonly necessary, can still be painstakingly tedious, at best. It can lead to frustrating errors, unexpected results, and lengthy troubleshooting, at worst.</p>
<p>The new <a href="https://docs.aptech.com/gauss/dflonger.html" target="_blank" rel="noopener">dfLonger</a> and <a href="https://docs.aptech.com/gauss/dfwider.html" target="_blank" rel="noopener">dfWider</a> procedures introduced in <a href="https://www.aptech.com/blog/introducing-gauss-24/" target="_blank" rel="noopener">GAUSS 24</a> make great strides towards fixing that. Extensive planning has gone into each procedure, resulting in comprehensive but intuitive functions.</p>
<p>In today's blog, we will walk through all you need to know about the <code>dfLonger</code> <a href="https://www.aptech.com/blog/basics-of-gauss-procedures/" target="_blank" rel="noopener">procedure</a> to tackle even the most complex cases of transforming wide form panel data to long form. </p>
<h2 id="the-rules-of-tidy-data">The Rules of Tidy Data</h2>
<p>Before we get started, it will be useful to consider what makes data tidy (and why tidy data is important). </p>
<p>We can think of our data as being made up of three components (these will come in handy later when working with <code>dfLonger</code>): </p>
<ul>
<li>Values.</li>
<li>Observations.</li>
<li>Variables.</li>
</ul>
<p><a href="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-2.jpg" alt="Components of data." width="757" height="887" class="aligncenter size-full wp-image-11584144" /></a></p>
<p>We can use these components to define some basic rules for tidy data:</p>
<ol>
<li>Each variable has its own column.</li>
<li>Each observation has its own row.</li>
<li>Each value has its own cell.</li>
</ol>
<h3 id="example-one-wide-form-state-population-table">Example One: Wide Form State Population Table</h3>
<table>
 <thead>
<tr><th>State</th><th>2020</th><th>2021</th><th>2022</th></tr>
</thead>
<tbody>
<tr><td>Alabama</td><td>5,031,362</td><td>5,049,846</td><td>5,074,296</td></tr>
<tr><td>Alaska</td><td>732,923</td><td>734,182</td><td>733,583</td></tr>
<tr><td>Arizona</td><td>7,179,943</td><td>7,264,877</td><td>7,359,197</td></tr>
<tr><td>Arkansas</td><td>3,014,195</td><td>3,028,122</td><td>3,045,637</td></tr>
<tr><td>California</td><td>39,501,653</td><td>39,142,991</td><td>39,029,342</td></tr>
</tbody>
</table>
<p>Though not clearly labeled, we can deduce that this data presents values for three different variables: <em>State</em>, <em>Year</em>, and <em>Population</em>. </p>
<p>Looking more closely we see:</p>
<ul>
<li><em>State</em> is stored in a unique column. </li>
<li>The values of <em>Year</em> are stored as column names. </li>
<li>The values of <em>Population</em> are stored in separate columns for each year. </li>
</ul>
<p>Our variables do not each have a unique column, violating the rules of tidy data.</p>
<h3 id="example-two-long-form-state-population-table">Example Two: Long Form State Population Table</h3>
<table>
 <thead>
<tr><th>State</th><th>Year</th><th>Population </th></tr>
</thead>
<tbody>
<tr><td>Alabama</td><td>2020</td><td>5,031,362</td></tr>
<tr><td>Alabama</td><td>2021</td><td>5,049,846</td></tr>
<tr><td>Alabama</td><td>2022</td><td>5,074,296</td></tr>
<tr><td>Alaska</td><td>2020</td><td>732,923</td></tr>
<tr><td>Alaska</td><td>2021</td><td>734,182</td></tr>
<tr><td>Alaska</td><td>2022</td><td>733,583</td></tr>
<tr><td>Arizona</td><td>2020</td><td>7,179,943</td></tr>
<tr><td>Arizona</td><td>2021</td><td>7,264,877</td></tr>
<tr><td>Arizona</td><td>2022</td><td>7,359,197</td></tr>
</tbody>
</table>
<p>The transformed data above now has three columns, one for each variable <em>State</em>, <em>Year</em>, and <em>Population</em>. We can also confirm that each observation has a single row and each value has a single cell. </p>
<p>Transforming the data to long form has resulted in a tidy data table. </p>
<h3 id="why-do-we-care-about-tidy-data">Why Do We Care About Tidy Data?</h3>
<p>Working with tidy data offers a number of advantages:</p>
<ul>
<li>Tidy data storage offers consistency when trying to compare, explore, and analyze data whether it be panel data, <a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/" target="_blank" rel="noopener">time series data</a> or cross-sectional data. </li>
<li>Using columns for variables is aligned with vectorization and <a href="https://www.aptech.com/blog/gauss-basics-3-introduction-to-matrices/" target="_blank" rel="noopener">matrix notation</a>, both of which are fundamental to efficient computations. </li>
<li>Many software tools expect tidy data and will only work reliably with tidy data. </li>
</ul>
<hr>
<div style="text-align:center">Ready to elevate your research? <a href="https://www.aptech.com/request-demo/" target="_blank" rel="noopener">Try GAUSS today.</a></div>
<hr>
<h2 id="transforming-from-wide-to-long-panel-data">Transforming From Wide to Long Panel Data</h2>
<p>In this section, we will look at how to use the GAUSS procedure <code>dfLonger</code> to transform panel data from wide to long form. This section will cover:</p>
<ul>
<li>The fundamentals of the <code>dfLonger</code> procedure.</li>
<li>A standard process for setting up panel data transformations.</li>
</ul>
<h3 id="the-dflonger-procedure">The <code>dfLonger</code> Procedure</h3>
<p>The <code>dfLonger</code> procedure transforms wide form GAUSS <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">dataframes</a> to long form GAUSS dataframes. It has four required inputs and one <a href="https://www.aptech.com/blog/the-basics-of-optional-arguments-in-gauss-procedures/" target="_blank" rel="noopener">optional input</a>: </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">df_long = dfLonger(df_wide, columns, names_to, values_to [, pctl]);</code></pre>
<hr>
<dl>
<dt>df_wide</dt>
<dd>A GAUSS dataframe in wide panel format.</dd>
<dt>columns</dt>
<dd>String array, the columns that should be used in the conversion.</dd>
<dt>names_to</dt>
<dd>String array, specifies the variable name(s) for the new column(s) created to store the wide variable names.</dd>
<dt>values_to</dt>
<dd>String, the name of the new column containing the values.</dd>
<dt>pctl</dt>
<dd>Optional, an instance of the <code>pivotControl</code> structure used for advanced pivoting options.
<hr></dd>
</dl>
<h3 id="setting-up-panel-data-transformations">Setting Up Panel Data Transformations</h3>
<p>Having a systematic process for transforming wide panel data to long panel data will:</p>
<ul>
<li>Save time.</li>
<li>Eliminate frustration.</li>
<li>Prevent errors. </li>
</ul>
<p>Let's use our wide form state population data to work through the steps.</p>
<h3 id="step-1-identify-variables">Step 1: Identify variables.</h3>
<p>In our wide form population table, there are three variables: <em>State</em>, <em>Year</em>, and <em>Population</em>. </p>
<div class="alert alert-info" role="alert">Variables are not always clearly labeled in wide form data. You will often need background information to identify them. Pay attention to references, titles, and other sources to make sure you clearly understand the variables. </div>
<h3 id="step-2-identify-columns-to-convert">Step 2: Identify columns to convert.</h3>
<p>The easiest way to determine what columns need to be converted is to identify the &quot;problem&quot; columns in your wide form data.  </p>
<p>For example, in our original state population table, the column names <em>2020</em>, <em>2021</em>, and <em>2022</em> are values of our <em>Year</em> variable, while the columns themselves store the values of the <em>Population</em> variable. </p>
<p><a href="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-Page-1-1.jpg"><img src="https://www.aptech.com/wp-content/uploads/2023/12/Blank-diagram-Page-1-1.jpg" alt="" width="731" height="289" class="aligncenter size-full wp-image-11584149" /></a></p>
<p>These are the columns we will need to address in order to make our data tidy.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">columns = "2020"$|"2021"$|"2022";</code></pre>
<p>We only have three columns to transform, so it is easy to type out our column names in a string array. This won't always be the case, though. Fortunately, GAUSS has many convenience functions to help with creating your column lists.</p>
<p>My favorites include:</p>
<table>
 <thead>
<tr><th>Function</th><th>Description</th><th>Example</th></tr>
</thead>
<tbody>
<tr><td><a href="https://docs.aptech.com/gauss/getcolnames.html" target="_blank" rel="noopener">getColNames</a></td><td>Returns the column variable names.</td><td><code>
varnames = getColNames(df_wide)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/startswith.html" target="_blank" rel="noopener">startsWith</a></td><td>Returns a 1 if a string starts with a specified pattern.</td><td><code>
mask = startsWith(colNames, pattern)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/trimr.html" target="_blank" rel="noopener">trimr</a></td><td>Trims rows from the top and/or bottom of a matrix.</td><td><code>
names = trimr(full_list, top, bottom)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/rowcontains.html" target="_blank" rel="noopener">rowcontains</a></td><td>Returns a 1 if the row contains the data specified by the <code>needle</code> variable, otherwise it returns a 0.</td><td><code>
mask = rowcontains(haystack, needle)</code></td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/selif.html" target="_blank" rel="noopener">selif</a></td><td>Selects rows from a matrix, dataframe or string array, based upon a vector of 1’s and 0’s.</td><td><code>
names = selif(full_list, mask)</code></td></tr>
</tbody>
</table>
<p>For more complex cases, it is useful to approach creating column lists as a two-step process:</p>
<ol>
<li>Get all column names using <code>getColNames</code>.</li>
<li>Select a subset of column names using one of the selection convenience functions. </li>
</ol>
<p>As an example, suppose our state population dataset contains a <em>State</em> column as the first column and the remaining columns contain the populations for 1950-2022. It would be tedious to write out the column list for all 73 years. </p>
<p>Instead we could:</p>
<ol>
<li>Get a list of all the column names using <code>getColNames</code>.</li>
<li>Trim the first name off the list. </li>
</ol>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names
colNames = getColNames(pop_wide);

// Trim the first name, 'State',
// from the top of the name list
colNames = trimr(colNames, 1, 0);</code></pre>
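<p>The same two-step process can also be written with <code>startsWith</code> and <code>selif</code> from the table above. This minimal sketch assumes, as in the hypothetical 1950-2022 dataset, that every year column is named with a four-digit year. Names beginning with "19" or "20" are flagged and kept. Applied to the three-year <em>state_pop.gdat</em> file, it should select <em>2020</em>, <em>2021</em>, and <em>2022</em>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Step 1: get all column names
colNames = getColNames(pop_wide);

// Step 2: flag names that begin with "19" or "20"
mask = startsWith(colNames, "19") .or startsWith(colNames, "20");

// Keep only the flagged names
columns = selif(colNames, mask);

// Pivot the selected year columns to long form
pop_long = dfLonger(pop_wide, columns, "Year", "Population");</code></pre>
<p>Selecting by pattern rather than by position means the list stays correct even if the non-year columns are reordered.</p>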
<h3 id="step-3-name-the-new-columns-for-storing-names">Step 3: Name the new columns for storing names.</h3>
<p>The names of the columns being transformed from our wide form data will be stored in a variable specified by the input <em>names_to</em>. </p>
<p>In this case, we want to store the names from the wide data in one new variable called <code>"Year"</code>. In later examples, we will look at how to split names into multiple variables using prefixes, separators, or patterns.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">names_to = "Year";</code></pre>
<h3 id="step-4-name-the-new-columns-for-storing-values">Step 4: Name the new columns for storing values.</h3>
<p>The values stored in the columns being transformed will be stored in a variable specified by the input <em>values_to</em>.</p>
<p>For our population table, we will store the values in a variable named <code>"Population"</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">values_to = "Population";</code></pre>
<h2 id="basic-pivoting">Basic Pivoting</h2>
<p>Now it's time to put all these steps together into a working example. Let's continue with our state population example. </p>
<p>We'll start by loading the <a href="https://github.com/aptech/gauss_blog/blob/master/econometrics/pivoting-to-long-form-12-6-23/state_pop.gdat" target="_blank" rel="noopener">complete state population dataset</a> from the <em>state_pop.gdat</em> file:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data 
pop_wide = loadd("state_pop.gdat");

// Preview data
head(pop_wide);</code></pre>
<pre>           State             2020             2021             2022
         Alabama        5031362.0        5049846.0        5074296.0
          Alaska        732923.00        734182.00        733583.00
         Arizona        7179943.0        7264877.0        7359197.0
        Arkansas        3014195.0        3028122.0        3045637.0
      California        39501653.        39142991.        39029342. </pre>
<p>Now, let's set up our information for transforming our data:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Identify columns
columns = "2020"$|"2021"$|"2022";

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we'll transform our data using <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Convert data using dfLonger
pop_long = dfLonger(pop_wide, columns, names_to, values_to);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00 </pre>
<h2 id="advanced-pivoting">Advanced Pivoting</h2>
<p>One of the most appealing things about <code>dfLonger</code> is that while simple to use, it offers tools for tackling the most complex cases. In this section, we'll cover everything you need to know for moving beyond basic pivoting.</p>
<h3 id="the-pivotcontrol-structure">The <code>pivotControl</code> Structure</h3>
<p>The <code>pivotControl</code> structure allows you to control pivoting specifications using
the following members:</p>
<table>
 <thead>
<tr><th>Member</th><th>Purpose</th></tr>
</thead>
<tbody>
<tr><td>names_prefix</td><td>A string input which specifies which characters, if any, should be stripped from the front of the wide variable names before they are assigned to a long column.</td></tr>
<tr><td>names_sep_split</td><td>A string input which specifies which characters, if any, mark where the wide variable names should be split into the <i>names_to</i> variables.</td></tr>
<tr><td>names_pattern_split</td><td>A string input containing a regular expression whose group(s) specify how the wide variable names should be split into the <i>names_to</i> variables.</td></tr>
<tr><td>names_types</td><td>A string input specifying the data types of the <i>names_to</i> variable(s).</td></tr>
<tr><td>values_drop_missing</td><td>Scalar; if set to 1, all rows with missing values will be removed.</td></tr>
</tbody>
</table>
<div class="alert alert-info" role="alert">We will demonstrate how to use the <code>pivotControl</code> structure in the examples below. If you are unfamiliar with structures, you may find it useful to review our tutorial, <a href="https://www.aptech.com/resources/tutorials/a-gentle-introduction-to-using-structures/" target="_blank" rel="noopener">&quot;A Gentle Introduction to Using Structures.&quot;</a></div>
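<p>The examples in this post do not use the <em>values_drop_missing</em> member, so here is a minimal sketch of how it might be set. It builds a small, hypothetical dataframe with <code>asDF</code> (the column names <code>"id"</code>, <code>"2020"</code>, and <code>"2021"</code> and the values are made up for illustration). Based on the description in the table above, we assume that <code>dfLonger</code> will then drop the one long-form row whose value is missing:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical wide data: an ID column and two year columns
x = { 1 10 20,
      2 30 40 };

// Replace one value with a missing value
x[2, 3] = miss(0, 0);

// Convert the matrix to a dataframe with column names
df_wide = asDF(x, "id", "2020", "2021");

// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Remove long-form rows that contain missing values
pctl.values_drop_missing = 1;

// Pivot the two year columns into 'Year' and 'Value'
df_long = dfLonger(df_wide, "2020"$|"2021", "Year", "Value", pctl);

// Print the long-form data
print df_long;</code></pre>
<p>Comparing the result with and without <code>pctl.values_drop_missing = 1</code> is a quick way to confirm how many rows were removed.</p>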
<h3 id="changing-variable-types">Changing Variable Types</h3>
<p>By default, the variables created from the pieces of the variable names will be <a href="https://www.aptech.com/blog/easy-management-of-categorical-variables/" target="_blank" rel="noopener">categorical variables</a>.</p>
<p>If we examine the variable type of <em>pop_long</em> from our previous example, </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check the type of the 'Year' variable
getColTypes(pop_long[., "Year"]);</code></pre>
<p>we can see that the <em>Year</em> variable is a categorical variable:</p>
<pre>            type
        category </pre>
<p>This isn't ideal; we'd prefer our <em>Year</em> variable to be a <a href="https://www.aptech.com/blog/dates-and-times-made-easy/" target="_blank" rel="noopener">date</a>.
We can control the assigned type using the <em>names_types</em> member of the <code>pivotControl</code> structure. The <em>names_types</em> member can be specified in one of two ways:</p>
<ol>
<li>As a column vector of types for each of the <em>names_to</em> variables.</li>
<li>As an <em>n x 2</em> string array where the first column contains the name(s) of the variable(s) and the second column contains the type(s) to be assigned.</li>
</ol>
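<p>The example that follows uses the second option. As a sketch of the first option, the code below uses a hypothetical dataframe whose column names combine a location and a year, split at an underscore as in the separator example later in this post. Here <em>names_types</em> lists one type per <em>names_to</em> variable, in the same order. It also assumes that <code>"category"</code> is accepted as a type name, matching the output of <code>getColTypes</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical wide data with 'location_year' column names
x = { 1 10 20 30 40 };
df_wide = asDF(x, "id", "urban_2020", "urban_2021", "rural_2020", "rural_2021");

// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Split the wide names at the underscore
pctl.names_sep_split = "_";

// Option 1: one type per 'names_to' variable, in order:
// 'Location' stays a category, 'Year' becomes a date
pctl.names_types = "category" $| "date";

// Pivot the four value columns
columns = "urban_2020" $| "urban_2021" $| "rural_2020" $| "rural_2021";
df_long = dfLonger(df_wide, columns, "Location" $| "Year", "Value", pctl);

// Check the types of all columns
getColTypes(df_long);</code></pre>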
<p>For our example, we only need to change the type of the <em>Year</em> variable, so we will use the second option:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify that 'Year' should be
// converted to a date variable
pctl.names_types = {"Year" "date"};</code></pre>
<p>Next, we complete the steps for pivoting:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide);
columns = trimr(columns, 1, 0);

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we call <code>dfLonger</code> including the <code>pivotControl</code> structure, <em>pctl</em>, as the final input:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00</pre>
<p>Now if we check the type of our <em>Year</em> variable:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check the type of 'Year'
getColTypes(pop_long[., "Year"]);</code></pre>
<p>It is a date variable:</p>
<pre>  type
  date</pre>
<h3 id="stripping-prefixes">Stripping Prefixes</h3>
<p>In our previous example, the wide data names only contained the year. However, the column names of a wide dataset often have common prefixes. The <em>names_prefix</em> member of the <code>pivotControl</code> structure offers a convenient way to strip unwanted prefixes. </p>
<p>Suppose that our wide form state population columns were labeled <code>"yr_2020"</code>, <code>"yr_2021"</code>, <code>"yr_2022"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
pop_wide2 = loadd("state_pop2.gdat");

// Preview data
head(pop_wide2);</code></pre>
<pre>           State          yr_2020          yr_2021          yr_2022
         Alabama        5031362.0        5049846.0        5074296.0
          Alaska        732923.00        734182.00        733583.00
         Arizona        7179943.0        7264877.0        7359197.0
        Arkansas        3014195.0        3028122.0        3045637.0
      California        39501653.        39142991.        39029342.</pre>
<p>We need to strip these prefixes when transforming our data to long form. </p>
<p>To accomplish this, we first need to specify that our name columns have the common prefix <code>"yr_"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify prefix
pctl.names_prefix = "yr_";</code></pre>
<p>Next, we complete the steps for pivoting:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide2);
columns = trimr(columns, 1, 0);

// Variable for storing names
names_to = "Year";

// Variable for storing values
values_to = "Population";</code></pre>
<p>Finally, we call <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide2, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State             Year       Population
         Alabama             2020        5031362.0
         Alabama             2021        5049846.0
         Alabama             2022        5074296.0
          Alaska             2020        732923.00
          Alaska             2021        734182.00</pre>
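<p>Because each <code>pivotControl</code> member controls a separate step, the members can be combined in a single call. As a minimal sketch on a small hypothetical dataframe, the code below strips the <code>"yr_"</code> prefix and converts <em>Year</em> to a date at the same time. It assumes that the prefix is removed before the type conversion is applied; otherwise, a value such as <code>"yr_2020"</code> could not be read as a date:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical wide data with 'yr_' prefixed column names
x = { 1 10 20 };
df_wide = asDF(x, "id", "yr_2020", "yr_2021");

// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Strip the prefix and convert 'Year' to a date in one call
pctl.names_prefix = "yr_";
pctl.names_types = { "Year" "date" };

// Pivot the two prefixed columns
df_long = dfLonger(df_wide, "yr_2020" $| "yr_2021", "Year", "Value", pctl);

// Check the type of 'Year'
getColTypes(df_long[., "Year"]);</code></pre>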
<h3 id="splitting-names">Splitting Names</h3>
<p>In our basic example, the only information contained in the column names was the year, and we created one variable, <code>"Year"</code>, to store it. However, wide form column names may contain more than one piece of information. </p>
<p>In these cases, there are two important steps to take:</p>
<ol>
<li>Name the variables that will store the information contained in the wide data column names using the <em>names_to</em> input.</li>
<li>Indicate to GAUSS how to split the wide data column names into the <em>names_to</em> variables. </li>
</ol>
<h4 id="names-include-a-separator">Names Include a Separator</h4>
<p>One way that names in wide data can contain multiple pieces of information is through the use of separators. </p>
<p>For example, suppose our data looks like this:</p>
<pre>           State       urban_2020       urban_2021       urban_2022       rural_2020       rural_2021       rural_2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>Now our names specify:</p>
<ul>
<li>Whether the population is the urban or rural population. </li>
<li>The year of the observation.</li>
</ul>
<p>In this case, we:</p>
<ul>
<li>Use the <em>names_sep_split</em> member of the <code>pivotControl</code> structure to indicate how to split the names. </li>
<li>Specify a <em>names_to</em> variable for each group created by the separator.</li>
</ul>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
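pop_wide3 = loadd("state_pop3.gdat");

// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide3);
columns = trimr(columns, 1, 0);
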
// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify how to separate names
pctl.names_sep_split = "_";

// Specify two variables for holding
// names information:
//    'Location' for the information before the separator
//    'Year' for the information after the separator
names_to = "Location"$|"Year";

// Variable for storing values
values_to = "Population";

// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide3, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State         Location             Year       Population
         Alabama            urban             2020        6558153.0
         Alabama            urban             2021        4972982.0
         Alabama            urban             2022        12375977.
         Alabama            rural             2020        1526791.0
         Alabama            rural             2021        76863.000</pre>
<p>Now, the <em>pop_long</em> dataframe contains:</p>
<ul>
<li>The <em>Location</em> variable, which holds the information found before the separator, <code>"_"</code>, in the wide form names (urban or rural).</li>
<li>The <em>Year</em> variable, which holds the information found after the separator, <code>"_"</code>, in the wide form names.</li>
</ul>
<h4 id="variable-names-with-regular-expressions">Variable Names With Regular Expressions</h4>
<p>In our example above, the variables contained in the names were clearly separated by a <code>"_"</code>. However, this isn't always the case. Sometimes names follow a pattern rather than using a separator:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data
pop_wide4 = loadd("state_pop4.gdat");

// Preview data
head(pop_wide4);</code></pre>
<pre>           State        urban2020        urban2021        urban2022        rural2020        rural2021        rural2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>In cases like this, we can use the <em>names_pattern_split</em> member to pass GAUSS a regular expression that specifies how to split the column names. We can't cover the full details of regular expressions here. However, a few fundamentals will help us get started with this example. </p>
<p>In regular expressions:</p>
<ol>
<li>Each statement inside a pair of parentheses is a group. </li>
<li>To match any single upper or lower case letter, we use <code>"[a-zA-Z]"</code>. More specifically, this tells GAUSS to match one character that is either a lowercase letter from a to z or an uppercase letter from A to Z. If we wanted to limit this to lowercase letters from t to z and uppercase letters from B to M, we would use <code>"[t-zB-M]"</code>.</li>
<li>To match any single digit, we use <code>"[0-9]"</code>.</li>
<li>To represent that we want to match <u>one or more</u> instances of a pattern we use <code>"+"</code>.</li>
<li>To represent that we want to match <u>zero or more</u> instances of a pattern we use <code>"*"</code>.</li>
</ol>
<p>In this case, we want to separate our names so that &quot;urban&quot; and &quot;rural&quot; are collected in <em>Location</em> and <em>2020</em>, <em>2021</em>, and <em>2022</em> are collected in the <em>Year</em> variable:</p>
<ol>
<li>We have two groups.</li>
<li>We can capture both <code>urban</code> and <code>rural</code> using <code>"[a-zA-Z]+"</code>.</li>
<li>We can capture the years by matching one or more digits using <code>"[0-9]+"</code>.</li>
</ol>
<p>Let's use a regular expression to specify our <em>names_pattern_split</em> member:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare pivotControl structure and fill with default values
struct pivotControl pctl;
pctl = pivotControlCreate();

// Specify how to separate names 
// using the pivotControl structure
pctl.names_pattern_split = "([a-zA-Z]+)([0-9]+)"; </code></pre>
<p>Next, we can put this together with our other steps to transform our wide data:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Variable for storing names
names_to = "Location"$|"Year";

// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide4);
columns = trimr(columns, 1, 0);

// Variable for storing values
values_to = "Population";

// Call dfLonger with optional control structure
pop_long = dfLonger(pop_wide4, columns, names_to, values_to, pctl);

// Preview data
head(pop_long);</code></pre>
<pre>           State         Location             Year       Population
         Alabama            urban             2020        6558153.0
         Alabama            urban             2021        4972982.0
         Alabama            urban             2022        12375977.
         Alabama            rural             2020        1526791.0
         Alabama            rural             2021        76863.000</pre>
<h3 id="multiple-value-variables">Multiple Value Variables</h3>
<p>In all of our previous examples, the values were stored in a single variable. However, datasets often contain multiple groups of values, and we will need to specify multiple variables to store them. </p>
<p>Let's consider our previous example which used the <em>pop_wide4</em> dataset:</p>
<pre>           State        urban2020        urban2021        urban2022        rural2020        rural2021        rural2022
         Alabama        6558153.0        4972982.0        12375977.        1526791.0        76863.000        7301681.0
          Alaska        21944.000        467051.00        311873.00        710978.00        267130.00        421709.00
         Arizona        1248007.0        6033358.0        1444029.0        8427950.0        1231518.0        5915167.0
        Arkansas        863918.00        913266.00        7000024.0        2150276.0        3941388.0        3954387.0
      California        17255657.        27682794.        63926200.        22245995.        11460196.        24896858. </pre>
<p>Suppose that rather than creating a <em>Location</em> variable, we wish to separate the population information into two variables, <em>urban</em> and <em>rural</em>. To do this, we will:</p>
<ol>
<li>Split the variable names by words (<code>"urban"</code> or <code>"rural"</code>) and integers.</li>
<li>Create a <em>Year</em> column from the integer portions of the names.</li>
<li>Create two value columns, <em>urban</em> and <em>rural</em>, from the word portions. </li>
</ol>
<p>First, we will specify our columns:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get all column names and remove the first column, 'State'
columns = getColNames(pop_wide4);
columns = trimr(columns, 1, 0);</code></pre>
<div class="alert alert-info" role="alert">Since we are using the same data as our previous example, we don't need to load any additional data.</div>
<p>Next, we need to specify our <em>names_to</em> and <em>values_to</em> inputs. However, this time we want our <em>values_to</em> variables to be determined by the information in our names. </p>
<p>We do this using <code>".value"</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Tell GAUSS to use the first group of the split names 
// to set the values variables and 
// store the remaining group in 'Year'
names_to = ".value" $| "Year";

// Tell GAUSS to get 'values_to' variables from 'names_to'
values_to = "";</code></pre>
<p>Setting <code>".value"</code> as the first element in our <em>names_to</em> input tells <code>dfLonger</code> to take the first piece of the wide data names and create a column containing all of the values from the matching columns.</p>
<p>In other words, combine all the values from the variables <em>urban2020</em>, <em>urban2021</em>, <em>urban2022</em> into a single variable named <em>urban</em> and do the same for the <em>rural</em> columns.</p>
<p>Finally, we need to tell GAUSS how to split the variable names. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Declare 'pctl' to be a pivotControl structure
// and fill with default settings
struct pivotControl pctl;
pctl = pivotControlCreate();

// Set the regex to split the variable names
pctl.names_pattern_split = "(urban|rural)([0-9]+)";</code></pre>
<p>This time, we explicitly specify the names <code>"(urban|rural)"</code> rather than using the general specifier <code>"([a-zA-Z]+)"</code>.</p>
<p>Now we call <code>dfLonger</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Convert the dataframe to long format according to our specifications
pop_long = dfLonger(pop_wide4, columns, names_to, values_to, pctl);

// Print the first 5 rows of the long form dataframe
head(pop_long);</code></pre>
<pre>           State             Year            urban            rural
         Alabama             2020        6558153.0        1526791.0
         Alabama             2021        4972982.0        76863.000
         Alabama             2022        12375977.        7301681.0
          Alaska             2020        21944.000        710978.00
          Alaska             2021        467051.00        267130.00</pre>
<p>Now the urban and rural populations are stored in their own columns, named <em>urban</em> and <em>rural</em>. </p>
<div class="alert alert-info" role="alert">These names can easily be changed using the <b>Data Manager</b> or <a href="https://docs.aptech.com/gauss/setcolnames.html" target="_blank" rel="noopener">setColNames</a>.</div>
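<p>As a minimal sketch of the programmatic route, the code below renames the value columns of a small hypothetical dataframe laid out like the output above (an ID, <em>Year</em>, <em>urban</em>, <em>rural</em>). It assumes that the optional third input of <code>setColNames</code> accepts column indices; the new names <code>"Urban"</code> and <code>"Rural"</code> are only illustrative:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical long-form data with the same column layout
x = { 1 2020 10 20,
      1 2021 30 40 };
df = asDF(x, "id", "Year", "urban", "rural");

// Rename the third and fourth columns
df = setColNames(df, "Urban" $| "Rural", 3|4);

// Check the new column names
getColNames(df);</code></pre>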
<h2 id="conclusion">Conclusion</h2>
<p>As we've seen today, pivoting panel data from wide to long form can be complicated. However, a systematic approach and the GAUSS <code>dfLonger</code> procedure help reduce the frustration, time, and errors involved.</p>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/panel-data-structural-breaks-and-unit-root-testing/" target="_blank" rel="noopener">Panel data, structural breaks and unit root testing</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-basics-one-way-individual-effects/" target="_blank" rel="noopener">Panel Data Basics: One-way Individual Effects</a></li>
<li><a href="https://www.aptech.com/blog/how-to-aggregate-panel-data-in-gauss/" target="_blank" rel="noopener">How to Aggregate Panel Data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/" target="_blank" rel="noopener">Introduction to the Fundamentals of Panel Data</a></li>
<li><a href="https://www.aptech.com/blog/panel-data-stationarity-test-with-structural-breaks/" target="_blank" rel="noopener">Panel Data Stationarity Test With Structural Breaks</a></li>
<li><a href="https://www.aptech.com/blog/get-started-with-panel-data-in-gauss-video/" target="_blank" rel="noopener">Getting Started With Panel Data in GAUSS </a></li>
</ol>
<div style="text-align:center;background-color:#455560;color:#FFFFFF">
<hr>
<h3 id="discover-how-gauss-24-can-help-you-reach-your-goals">Discover how GAUSS 24 can help you reach your goals.</h3>
 
<div class="lp-cta">
    <a href="https://www.aptech.com/request-demo" class="btn btn-primary">Request Demo</a>
    <a href="https://www.aptech.com/request-quote/" class="btn btn-primary btn-quote">Request pricing</a>
</div><hr>
</div>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/transforming-panel-data-to-long-form-in-gauss/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Managing String Data with GAUSS Dataframes</title>
		<link>https://www.aptech.com/blog/managing-string-data-with-gauss-dataframes/</link>
					<comments>https://www.aptech.com/blog/managing-string-data-with-gauss-dataframes/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 28 Mar 2023 20:41:19 +0000</pubDate>
				<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11583522</guid>

					<description><![CDATA[Working with strings hasn’t always been easy in GAUSS. In the past, the only option in GAUSS was to store strings separately from numeric data. It made it difficult to work with datasets that contained mixed types. 

With the introduction of GAUSS dataframes in GAUSS 21 and the enhanced string capabilities of <a href="https://www.aptech.com/blog/gauss23/" target="_blank" rel="noopener">GAUSS 23</a>, that has all changed! I would argue that GAUSS now offers one of the best environments for managing and cleaning mixed-type data. 

I recently used GAUSS to perform the very practical task of creating an email list from a string-heavy dataset – something I never would have chosen GAUSS for in the past. In this blog, we walk through this data cleaning task, highlighting several key features for handling strings. ]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>Working with strings hasn’t always been easy in GAUSS. In the past, the only option in GAUSS was to store strings separately from numeric data. It made it difficult to work with datasets that contained mixed types. </p>
<p>With the introduction of GAUSS dataframes in GAUSS 21 and the enhanced string capabilities of <a href="https://www.aptech.com/blog/gauss23/" target="_blank" rel="noopener">GAUSS 23</a>, that has all changed! I would argue that GAUSS now offers one of the best environments for managing and cleaning mixed-type data. </p>
<p>I recently used GAUSS to perform the very practical task of creating an email list from a string-heavy dataset – something I never would have chosen GAUSS for in the past. In this blog, we walk through this data cleaning task, highlighting several key features for handling strings. </p>
<h2 id="quick-overview-of-strings-in-gauss">Quick Overview of Strings in GAUSS</h2>
<p>The <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">GAUSS dataframe</a> revolutionized data storage in GAUSS. It allows you to store mixed data types together including numbers, <a href="https://www.aptech.com/blog/dates-and-times-made-easy/" target="_blank" rel="noopener">dates</a>, <a href="https://www.aptech.com/blog/easy-management-of-categorical-variables/" target="_blank" rel="noopener">categorical data</a>, and strings. </p>
<p>The GAUSS string data type can contain letters, numbers, and other characters. The string data type:</p>
<ul>
<li>Keeps labels with data.</li>
<li>Saves additional loading steps.</li>
<li>Makes data and reports easier to understand.</li>
</ul>
<p>It isn’t difficult to see the usefulness of this for real-world data, which often includes information such as customer names, product names, or locations.</p>
<h3 id="loading-strings-in-gauss">Loading Strings in GAUSS</h3>
<p>Strings can be programmatically loaded from multiple data file types using <a href="https://docs.aptech.com/gauss/loadd.html" target="_blank" rel="noopener"><code>loadd</code></a>. No special steps are required, and GAUSS automatically detects strings in XLSX, CSV, Stata, SAS, and GDAT files.</p>
<p>In addition, the <strong>Data Import</strong> window provides a great tool for <a href="https://docs.aptech.com/gauss/data-management/interactive-import.html" target="_blank" rel="noopener">interactively previewing and managing data</a> of all types at the time of import. </p>
<div class="alert alert-info" role="alert">See our <a href="https://docs.aptech.com/gauss/data-management.html" target="_blank" rel="noopener">Data Management and Cleaning User Guide</a> for an in-depth guide to data handling in GAUSS.</div>
<h2 id="data-exercise-building-an-email-list">Data Exercise: Building an Email List</h2>
<p>To help demonstrate GAUSS's string capabilities, we will build and export an email contact list from a provided Excel dataset. We will break this project into several smaller tasks:</p>
<ul>
<li>Loading our raw data.</li>
<li>Generating email addresses from the provided information.</li>
<li>Combining the desired contact list information into a dataframe.</li>
<li>Exporting the dataframe as a CSV file.</li>
</ul>
<h3 id="provided-data">Provided Data</h3>
<p>We will use a sample dataset containing sales territory information for sales representatives. The original dataset includes a mix of string, numeric, and categorical data, including:</p>
<table>
 <thead>
 <tr>
      <th>Variable</th><th>Description</th><th>Type</th>
   </tr>
</thead>
<tbody>
<tr><td>KAR</td><td>The assigned territory sales representative.</td><td>Category</td></tr>
<tr><td>Store</td><td>The store number.</td><td>Numeric</td></tr>
<tr><td>Store name</td><td>The store name.</td><td>String</td></tr>
<tr><td>Format</td><td>Type of display found in store.</td><td>Category</td></tr>
<tr><td>Vet</td><td>Y/N indicator of in-store vet clinics.</td><td>Category</td></tr>
<tr><td>Nielsen Market</td><td>Assigned Nielsen Market.</td><td>Category</td></tr>
</tbody>
</table>
<p>You can download the original dataset <a href="https://github.com/aptech/gauss_blog/blob/master/programming/working-with-strings-3.06.23/territory-info.xlsx?raw=true" target="_blank" rel="noopener">here</a>.</p>
<h3 id="importing-raw-data-interactively">Importing Raw Data Interactively</h3>
<p>For this exercise, I'm going to use the interactive <strong>Data Window</strong> to load my data. For <a href="https://www.aptech.com/blog/preparing-and-cleaning-data-fred-data-in-gauss/" target="_blank" rel="noopener">data cleaning projects</a> like this, I often find it helpful to have a preview of my raw data. This allows me to make preliminary observations about my raw data such as:</p>
<ul>
<li>The presence of unnecessary variables. </li>
<li>Whether the dataset has a non-standard header.</li>
<li>Data types. </li>
</ul>
<p><a href="https://www.aptech.com/wp-content/uploads/2023/03/import-data.png"><img src="https://www.aptech.com/wp-content/uploads/2023/03/import-data.png" alt="Using the GAUSS Data Window to import data. " width="697" height="495" class="aligncenter size-full wp-image-11583525" /></a></p>
<p>It's useful to note that the GAUSS <strong>Data Window</strong> always generates GAUSS code that can be used for replicating <a href="https://docs.aptech.com/gauss/data-management/programmatic-import.html" target="_blank" rel="noopener">data loading programmatically</a> in the future.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">territory_info = loadd("C:/business/accounts/territory-info.xlsx");</code></pre>
<p>We can confirm that the data loads exactly as it appeared in the preview by printing the first five rows:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Print the first 5 rows
head(territory_info);</code></pre>
<p>The printed data matches our preview:</p>
<pre>             KAR            Store       Store Name           Format       Vet   Nielsen Market
   Larry McGuire        725.00000     NY-MIDDLETWN              RUN         N     New York, NY
   Larry McGuire        728.00000        STRATFORD      PREMIUM RUN         N     New York, NY
   Larry McGuire        752.00000       NORWALK-CT           PANTRY         N     New York, NY
   Larry McGuire        758.00000       SEEKONK-MA           PANTRY         N Providence et al
   Larry McGuire        762.00000   SOUTHINGTON-CT   4 FT. MINI RUN         N Hartford and New</pre>
<h3 id="cleaning-our-data">Cleaning Our Data</h3>
<p>Before generating our email list, we should perform some preliminary data cleaning. </p>
<div class="alert alert-info" role="alert">Though we will conduct our data cleaning programmatically, it is worth noting that the <b>Data Management</b> pane offers an interactive environment for data cleaning. For more information on interactive data cleaning, see our <a href="https://docs.aptech.com/gauss/data-management/data-cleaning.html?highlight=interactive#" target="_blank" rel="noopener">Data Cleaning User Guide</a>.</div>
<p>First, we check for duplicates:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check for duplicates
getduplicates(territory_info);</code></pre>
<p>Since no output is printed, there are no duplicate rows. </p>
<p>Next, let's review the <code>Nielsen Market</code> variable using the <a href="https://docs.aptech.com/gauss/frequency.html" target="_blank" rel="noopener"><code>frequency</code></a> command:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Check Frequencies
frequency(territory_info, "Nielsen Market");</code></pre>
<p>The counts themselves aren't of much interest here. However, the report provides a quick view of the categories:</p>
<div style="height:400px;line-height:3em;overflow:auto;padding:5px;">
<pre>
                      Label      Count   Total %    Cum. % 
              21 iowa-idaho          1   0.09814   0.09814 
     Abilene-Sweetwater, TX          1   0.09814    0.1963 
    Abilene-Sweetwater, TX           1   0.09814    0.2944 
          Albany et al, NY           2    0.1963    0.4907 
  Albuquerque-Santa Fe, NM           3    0.2944    0.7851 
               Atlanta, GA          20     1.963     2.748 
      Augusta-Aiken, GA-SC           1   0.09814     2.846 
                Austin, TX          13     1.276     4.122 
            Bakersfield, CA          2    0.1963     4.318 
             Baltimore, MD          19     1.865     6.183 
  Beaumont-Port Arthur, TX           2    0.1963     6.379 
                  Bend, OR           2    0.1963     6.575 
            Binghamton, NY           1   0.09814     6.673 
       Boston et al, MA-NH          37     3.631      10.3 
               Buffalo, NY           3    0.2944      10.6 
         Butte-Bozeman, MT           2    0.1963     10.79 
            Charleston, SC           1   0.09814     10.89 
             Charlotte, NC           7    0.6869     11.58 
       Charlottesville, VA           1   0.09814     11.68 
      Cheyenne et al, WY-NE          1   0.09814     11.78 
               Chicago, IL          45     4.416     16.19 
         Chico-Redding, CA           3    0.2944     16.49 
            Cincinnati, OH           1   0.09814     16.58 
       Cleveland et al, OH          11     1.079     17.66 
   Colorado Sprgs et al, CO          6    0.5888     18.25 
              Columbia, SC           2    0.1963     18.45 
              Columbus, OH           2    0.1963     18.65 
        Corpus Christi, TX           2    0.1963     18.84 
       Dallas-Ft. Worth, TX         35     3.435     22.28 
      Dallas-Ft. Worth, TX           2    0.1963     22.47 
    Davenport et al, IA-IL           1   0.09814     22.57 
                Dayton, OH           3    0.2944     22.87 
                 Denver, CO         27      2.65     25.52 
       Des Moines-Ames, IA           1   0.09814     25.61 
               Detroit, MI          12     1.178     26.79 
      El Paso et al, TX-NM           3    0.2944     27.09 
          Elmira et al, NY           1   0.09814     27.18 
                Eugene, OR           2    0.1963     27.38 
                Eureka, CA           1   0.09814     27.48 
     Fargo-Valley City, ND           1   0.09814     27.58 
        Fresno-Visalia, CA           5    0.4907     28.07 
      Ft. Myers-Naples, FL           5    0.4907     28.56 
       Ft. Smith et al, AR           2    0.1963     28.75 
           Gainesville, FL           1   0.09814     28.85 
   Grand Junction et al, CO          1   0.09814     28.95 
      Greensboro et al, NC           2    0.1963     29.15 
      Greenville et al, NC           2    0.1963     29.34 
   Greenville et al, SC-NC           6    0.5888     29.93 
       Harlingen et al, TX           2    0.1963     30.13 
      Harrisburg et al, PA           5    0.4907     30.62 
          Harrisonburg, VA           2    0.1963     30.81 
Hartford and New Haven, CT          17     1.668     32.48 
               Houston, TX          38     3.729     36.21 
          Indianapolis, IN           6    0.5888      36.8 
          Jacksonville, FL           5    0.4907     37.29 
       Johnstown et al, PA           4    0.3925     37.68 
        Kansas City, MO-KS           1   0.09814     37.78 
             Knoxville, TN           1   0.09814     37.88 
             Lafayette, LA           2    0.1963     38.08 
          Lake Charles, LA           1   0.09814     38.17 
             Las Vegas, NV           9    0.8832     39.06 
              Lexington, KY          1   0.09814     39.16 
         Lincoln et al, NE           2    0.1963     39.35 
           Los Angeles, CA          92     9.028     48.38 
         Medford et al, OR           2    0.1963     48.58 
  Miami-Ft. Lauderdale, FL          14     1.374     49.95 
             Milwaukee, WI           7    0.6869     50.64 
  Minneapolis-St. Paul, MN          15     1.472     52.11 
           Minot et al, ND           2    0.1963     52.31 
              Missoula, MT           2    0.1963      52.5 
       Mobile et al, AL-FL           1   0.09814      52.6 
    Myrtle Beach et al, SC           3    0.2944     52.89 
             Nashville, TN          10    0.9814     53.88 
           New Orleans, LA           3    0.2944     54.17 
              New York, NY          80     7.851     62.02 
         Norfolk et al, VA           7    0.6869     62.71 
         Odessa-Midland, TX          1   0.09814     62.81 
         Oklahoma City, OK           6    0.5888      63.4 
                 Omaha, NE           1   0.09814     63.49 
         Orlando et al, FL          18     1.766     65.26 
          Palm Springs, CA           4    0.3925     65.65 
    Peoria-Bloomington, IL           3    0.2944     65.95 
          Philadelphia, PA          35     3.435     69.38 
         Phoenix et al, AZ          21     2.061     71.44 
            Pittsburgh, PA          13     1.276     72.72 
              Portland, OR          25     2.453     75.17 
       Portland-Auburn, ME           2    0.1963     75.37 
   Providence et al, RI-MA           9    0.8832     76.25 
    Quincy et al, IL-MO-IA           1   0.09814     76.35 
         Raleigh et al, NC           6    0.5888     76.94 
            Rapid City, SD           1   0.09814     77.04 
                   Reno, NV          3    0.2944     77.33 
   Richmond-Petersburg, VA           5    0.4907     77.82 
     Roanoke-Lynchburg, VA           2    0.1963     78.02 
             Rochester, NY           4    0.3925     78.41 
              Rockford, IL           1   0.09814     78.51 
      Sacramento et al, CA           6    0.5888      79.1 
             Salisbury, MD           2    0.1963     79.29 
        Salt Lake City, UT          19     1.865     81.16 
           San Antonio, TX          13     1.276     82.43 
             San Diego, CA          28     2.748     85.18 
   Santa Barbara et al, CA           5    0.4907     85.67 
              Savannah, GA           1   0.09814     85.77 
         Seattle-Tacoma, WA          1   0.09814     85.87 
        Seattle-Tacoma, WA          38     3.729      89.6 
        Sherman-Ada, TX-OK           2    0.1963     89.79 
     Sioux Falls et al, SD           1   0.09814     89.89 
    South Bend-Elkhart, IN           1   0.09814     89.99 
                Spokane- wa          1   0.09814     90.09 
   Springfield-Holyoke, MA           1   0.09814     90.19 
             St. Louis, MO          11     1.079     91.27 
              Syracuse, NY           2    0.1963     91.46 
  Tallahassee et al, FL-GA           1   0.09814     91.56 
           Tampa et al, FL          16      1.57     93.13 
                Toledo, OH           1   0.09814     93.23 
   Tucson(Sierra Vista), AZ          8    0.7851     94.01 
                 Tulsa, OK           1   0.09814     94.11 
   Tyler-Longview et al, TX          2    0.1963     94.31 
                 Utica, NY           1   0.09814     94.41 
   W. Palm Beach et al, FL           8    0.7851     95.19 
     Waco-Temple-Bryan, TX           4    0.3925     95.58 
   Washington et al, DC-MD          37     3.631     99.21 
             Watertown, NY           1   0.09814     99.31 
  Wichita Fls et al, TX-OK           2    0.1963     99.51 
          Yakima et al, WA           3    0.2944      99.8 
                spokane- wa          2    0.1963       100 
                      Total       1019       100     
</pre>
</div>
<p>From this report, we can identify several issues that need addressing:</p>
<ul>
<li>The Spokane market is entered twice, once as <code>Spokane- wa</code> and once as <code>spokane- wa</code>, because the capitalization differs.</li>
<li>The format of the Spokane market differs from the other entries. It uses a dash rather than a comma to separate the city from the state, and the state abbreviation is lowercase.</li>
<li>The <code>Abilene-Sweetwater, TX</code>, <code>Dallas-Ft. Worth, TX</code>, and <code>Seattle-Tacoma, WA</code> markets each occur twice because of differing white space.</li>
<li>The misalignment of the market names in the report indicates leading and trailing white space that should be removed.</li>
<li>It would be useful to split the Nielsen Market variable into separate <code>Nielsen City</code> and <code>Nielsen State</code> variables.</li>
</ul>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Cleaning data
*/
// Strip leading and trailing white space
territory_info[., "Nielsen Market"] = 
strtrim(territory_info[., "Nielsen Market"]);

// Capitalize the lowercase Spokane entry
territory_info[., "Nielsen Market"] = 
strreplace(territory_info[., "Nielsen Market"], "spokane", "Spokane");

// Replace "Spokane- wa" with "Spokane, WA"
territory_info[., "Nielsen Market"] = 
strreplace(territory_info[., "Nielsen Market"], "Spokane- wa", "Spokane, WA");

// Split Nielsen Market into state and city
nielsen = asDF(strsplit(territory_info[., "Nielsen Market"], ","), 
          "Nielsen City", "Nielsen State");</code></pre>
<p>The code above uses three GAUSS string procedures that are especially useful for data cleaning:</p>
<table>
 <thead>
 <tr><th>Procedure</th><th>Purpose</th></tr>
</thead>
<tbody>
<tr><td><a href="https://docs.aptech.com/gauss/strtrim.html" target="_blank" rel="noopener">strtrim</a></td><td>Strips all white space characters from the left and right side of each element in a string array.</td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/strreplace.html" target="_blank" rel="noopener">strreplace</a></td><td>Replaces all matches of a substring with a replacement string.</td></tr>
<tr><td><a href="https://docs.aptech.com/gauss/strsplit.html" target="_blank" rel="noopener">strsplit</a></td><td>Splits a string into individual tokens based on a specified separator.</td></tr>
</tbody>
</table>
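<p>To see what each procedure does on its own, here is a minimal sketch on a small, hypothetical string array that mimics the Spokane and Seattle-Tacoma problems above (<code>raw</code>, <code>clean</code>, and <code>tokens</code> are illustrative names, not variables from the dataset). Depending on how <code>strsplit</code> treats the space after the comma, the state token may keep a leading space, so the sketch passes the split result through <code>strtrim</code> once more as a safeguard.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical raw market names with stray spaces,
// inconsistent capitalization, and a dash separator
raw = " Spokane- wa" $| "spokane- wa" $| "Seattle-Tacoma, WA ";

// Remove leading and trailing white space
clean = strtrim(raw);

// Fix capitalization first, then the city/state separator
clean = strreplace(clean, "spokane", "Spokane");
clean = strreplace(clean, "Spokane- wa", "Spokane, WA");

// Split on the comma, then trim any space left on the tokens
tokens = strtrim(strsplit(clean, ","));

// 3x2 string array: city in column 1, state in column 2
print tokens;</code></pre>
<p>Fixing the capitalization before the separator matters here: the second <code>strreplace</code> matches only the capitalized spelling, so running it first would leave the lowercase entry unconverted.</p>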
<h3 id="generating-email-addresses">Generating Email Addresses</h3>
<p>Now that we've cleaned up the Nielsen Market data, we can generate the email addresses for our list. The email address for each store takes the general form <em>storenumber</em> + &quot;d&quot; + &quot;@petpeople.com&quot;. For example, the email address for store number 548 is <code>"548d@petpeople.com"</code>. </p>
<p>To generate our email addresses we need to:</p>
<ul>
<li>Convert the store numbers to strings.</li>
<li>Add the suffix of the email address to the new strings.</li>
</ul>
<p>To do this in GAUSS we will:</p>
<ul>
<li>Convert the store numbers to strings using the GAUSS function <a href="https://docs.aptech.com/gauss/itos.html" target="_blank" rel="noopener"><code>itos</code></a>.</li>
<li>Append the email suffix using the string concatenation operator <code>$+</code>.</li>
<li>Change the string array to a dataframe using <a href="https://docs.aptech.com/gauss/asdf.html" target="_blank" rel="noopener"><code>asDF</code></a>. </li>
</ul>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Create email addresses
*/
// Convert store number to string
str_store = itos(territory_info[., "Store"]);

// Append the email suffix
email_address = str_store $+ "d@petpeople.com";

// Convert to dataframe
// and name the variable "Email"
email_df = asDF(email_address, "Email");</code></pre>
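<p>As a quick check of the address rule, applying the same two steps to the example store number 548 should reproduce <code>"548d@petpeople.com"</code>. This is a minimal sketch; <code>one_email</code> is an illustrative name.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Convert one store number to a string and append the suffix
one_email = itos(548) $+ "d@petpeople.com";

// Expected: 548d@petpeople.com
print one_email;</code></pre>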
<h3 id="build-email-database">Build Email Database</h3>
<p>We want the final email database to include <code>KAR</code>, <code>Store Name</code>, <code>Email</code>, <code>Nielsen City</code>, and <code>Nielsen State</code>.  </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Form dataframe containing
// email list information
email_database = territory_info[., "KAR" "Store Name"] ~ email_df ~ nielsen;

// Preview database
head(email_database);</code></pre>
<p>The first five rows of our data look like:</p>
<pre>             KAR       Store Name              Email     Nielsen City    Nielsen State
   Larry McGuire     NY-MIDDLETWN 725d@petpeople.com         New York               NY
   Larry McGuire        STRATFORD 728d@petpeople.com         New York               NY
   Larry McGuire       NORWALK-CT 752d@petpeople.com         New York               NY
   Larry McGuire       SEEKONK-MA 758d@petpeople.com Providence et al            RI-MA
   Larry McGuire   SOUTHINGTON-CT 762d@petpeople.com Hartford and New               CT </pre>
<h3 id="filtering-the-data">Filtering the Data</h3>
<p>Now that our database is created, let's filter our data to focus on one representative, <code>Jeff Canary</code>, and save the email list under his name.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">/*
** Filtering and saving our email list
*/
// Specify KAR 
name = "Jeff Canary";

// Filter data for specified employee
email_list = selif(email_database, email_database[., "KAR"] .$== name);</code></pre>
<h3 id="export-to-csv-file">Export to CSV file</h3>
<p>As a final step, we will export the <code>email_list</code> dataframe to a CSV file using <a href="https://docs.aptech.com/gauss/saved.html" target="_blank" rel="noopener"><code>saved</code></a>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create file name
fsave_name = name $+ "_store_emails.csv";

// Save file
saved(email_list, fsave_name);</code></pre>
<h2 id="extra-credit-looping-through-all-representatives">Extra Credit: Looping Through All Representatives</h2>
<p>Suppose we need to export email lists for all representatives. We can do this using a fairly simple loop. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Get list of unique 
// representative names
kar_names  = unique(email_database[., "KAR"]);

// Loop over all names
for i(1, rows(kar_names), 1);
  /*
  ** Filtering and saving our email list
  */
  // Specify KAR to create email list for
  name = kar_names[i];

  // Filter data for specified employee
  email_list = selif(email_database, email_database[., "KAR"] .$== name);

  // Create file name
  fsave_name = name $+ "_store_emails.csv";

  // Save file
  saved(email_list, fsave_name);
endfor;</code></pre>
<h3 id="conclusion">Conclusion</h3>
<p>In today's blog we've demonstrated the improved string capabilities of GAUSS using a simple data cleaning task. Our project covered several useful tasks including:</p>
<ul>
<li>Loading raw data.</li>
<li>Cleaning common string data issues. </li>
<li>Generating new string variables by splitting and joining strings. </li>
<li>Exporting dataframes as CSV files.</li>
</ul>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/preparing-and-cleaning-data-fred-data-in-gauss/" target="_blank" rel="noopener">Preparing and Cleaning FRED data in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">What is a GAUSS Dataframe and Why Should You Care?</a></li>
<li><a href="https://www.aptech.com/blog/getting-started-with-survey-data-in-gauss/" target="_blank" rel="noopener">Getting Started With Survey Data In GAUSS</a></li>
</ol>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/managing-string-data-with-gauss-dataframes/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Importing FRED Data to GAUSS</title>
		<link>https://www.aptech.com/blog/importing-fred-data-to-gauss/</link>
					<comments>https://www.aptech.com/blog/importing-fred-data-to-gauss/#comments</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Fri, 16 Dec 2022 02:05:10 +0000</pubDate>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Time Series]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11583079</guid>

					<description><![CDATA[The GAUSS FRED database integration, introduced in GAUSS 23, is a time-saving feature that allows you to import FRED data directly into GAUSS. This means you have thousands of datasets at your fingertips without ever leaving GAUSS. These tools also ensure that FRED data is imported directly into a GAUSS dataframe format, which can eliminate hours of data cleaning and the headaches that come with it. 

In today's blog, we will learn how to use the FRED import tools to:
<ul>
<li> Search for a FRED data series. </li>
<li> Import FRED data to GAUSS, including merging multiple series. </li>
<li>Use advanced import tools to perform data transformations. </li>
</ul>]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>The GAUSS <a href="https://fred.stlouisfed.org/" target="_blank" rel="noopener">FRED</a> database integration, introduced in <a href="https://www.aptech.com/blog/gauss23/" target="_blank" rel="noopener">GAUSS 23</a>, is a time-saving feature that allows you to import FRED data directly into GAUSS. This means you have thousands of datasets at your fingertips without ever leaving GAUSS. These tools also ensure that FRED data is imported directly into a GAUSS dataframe format, which can eliminate hours of data cleaning and the headaches that come with it. </p>
<p>In today's blog, we will learn how to use the FRED import tools to:</p>
<ul>
<li>Search for a FRED data series. </li>
<li>Import FRED data to GAUSS, including merging multiple series. </li>
<li>Use advanced import tools to perform data transformations. </li>
</ul>
<h2 id="getting-started">Getting Started</h2>
<h3 id="requesting-an-api-key">Requesting an API Key</h3>
<p>Before importing any data from FRED with GAUSS, you will need to request an API key from FRED. This can be done on the <a href="https://fredaccount.stlouisfed.org/apikeys" target="_blank" rel="noopener">FRED API Request page</a>. To request an API key, you will need to:</p>
<ol>
<li>Create and/or log in to a FRED account. </li>
<li>Provide a brief description of the program you intend to write. This can be as simple as &quot;Using GAUSS to conduct economic research.&quot;</li>
</ol>
<h3 id="specifying-your-api-key-in-gauss">Specifying your API key in GAUSS</h3>
<p>You can set your API key in GAUSS using any of the following methods:</p>
<ol>
<li>Set the API key directly at the top of your program:
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">FRED_API_KEY = "your_api_key";</code></pre></li>
<li>Set the environment variable <code>FRED_API_KEY</code> to your API key.</li>
<li>Edit your gauss.cfg and modify the <code>fred_api_key</code> value:
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">fred_api_key = your_api_key</code></pre></li>
</ol>
<h2 id="finding-your-fred-series">Finding Your FRED Series</h2>
<p>To download a series directly from FRED, we need to know its series ID, which you may not know offhand. Fortunately, we can use the <a href="https://docs.aptech.com/gauss/fred_search.html" target="_blank" rel="noopener"><code>fred_search</code></a> procedure to find the proper series ID.</p>
<p>The <code>fred_search</code> procedure requires one input, a string specifying the search text. As an example, let's search for all series related to <code>"producer price index"</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">fred_search("producer price index");</code></pre>
<p>This prints a search report to the <strong>Command Window</strong>. The first few rows are:</p>
<pre>frequency  frequency_short group_popularity              id     last_updated  observation_end observation_star       popularity     realtime_end   realtime_start seasonal_adjustm seasonal_adjustm            title            units      units_short
Monthly                 M        80.000000           PPIACO 2022-11-15 07:52       2022-10-01       1913-01-01        80.000000       2022-11-23       2022-11-23 Not Seasonally A              NSA Producer Price I   Index 1982=100   Index 1982=100
Monthly                 M        79.000000          WPU0911 2022-11-15 07:52       2022-10-01       1926-01-01        79.000000       2022-11-23       2022-11-23 Not Seasonally A              NSA Producer Price I   Index 1982=100   Index 1982=100
Monthly                 M        79.000000            PCEPI 2022-10-28 08:40       2022-09-01       1959-01-01        78.000000       2022-11-23       2022-11-23 Seasonally Adjus               SA Personal Consump   Index 2012=100   Index 2012=100
Monthly                 M        78.000000  PCU325211325211 2022-11-15 07:55       2022-10-01       1976-06-01        78.000000       2022-11-23       2022-11-23 Not Seasonally A              NSA Producer Price I Index Dec 1980=1 Index Dec 1980=1 </pre>
<p>We can see that the FRED search report provides a thorough summary of related series. In addition to the <code>id</code>, which we will need to import the data from FRED, some other useful fields include:</p>
<ul>
<li>Frequency.</li>
<li>Popularity.</li>
<li>Last updated.</li>
<li>Observation end.</li>
<li>Observation start.</li>
<li>Seasonal adjustment status.</li>
<li>Units. </li>
</ul>
<p>For our next steps, let's use the <code>PPIACO</code> series, which is the most popular series returned for the search term <code>"producer price index"</code>.</p>
<div class="alert alert-info" role="alert">Note: A number of advanced search options are available. See the official documentation for <code>fred_search</code> for details.</div>
<h2 id="importing-data-from-fred">Importing Data From FRED</h2>
<h3 id="loading-a-single-series-from-fred">Loading A Single Series From FRED</h3>
<p>Next, we will import the <code>PPIACO</code> series from the FRED database into GAUSS using the <a href="https://docs.aptech.com/gauss/fred_load.html" target="_blank" rel="noopener"><code>fred_load</code></a> procedure. </p>
<p>The <code>fred_load</code> procedure requires one string input specifying the series ID to be loaded. To load the producer price data that we found with our FRED search, we will use the series ID <code>PPIACO</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Download all observations of 'PPIACO' into a GAUSS dataframe
PPI = fred_load("PPIACO");</code></pre>
<p>We can examine the first five rows of the <code>PPI</code> dataframe using the <code>head</code> procedure:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Print the first 5 rows of 'PPI'
head(PPI);</code></pre>
<p>which reports</p>
<pre>            date           PPIACO
      1913-01-01        12.100000
      1913-02-01        12.000000
      1913-03-01        12.000000
      1913-04-01        12.000000
      1913-05-01        11.900000 </pre>
<p>We can also use the <code>tail</code> procedure to examine the last 5 rows of the <code>PPI</code> dataframe:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Print the last 5 rows of 'PPI'
tail(PPI);</code></pre>
<pre>            date           PPIACO
      2022-06-01        280.25100
      2022-07-01        272.27800
      2022-08-01        269.46500
      2022-09-01        268.69300
      2022-10-01        265.19300
</pre>
<p>This shows that the <code>PPIACO</code> data run from January 1913 to October 2022, which is consistent with the observation start and end dates reported in our FRED search. </p>
<h3 id="loading-multiple-series-from-fred">Loading Multiple Series From FRED</h3>
<p>The <code>fred_load</code> procedure can also load multiple series from FRED simultaneously. To do this, we use GAUSS <a href="https://www.aptech.com/resources/tutorials/loading-variables-from-a-file/" target="_blank" rel="noopener">formula string syntax</a>, joining the series IDs with <code>+</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load producer price data and the
// 10-year minus 2-year Treasury yield spread
macro_data = fred_load("PPIACO + T10Y2Y");

// Preview data
head(macro_data);</code></pre>
<p>The preview of our data shows that our two series have been imported together and automatically merged by date:</p>
<pre>            date           PPIACO           T10Y2Y
      1913-01-01        12.100000                .
      1913-02-01        12.000000                .
      1913-03-01        12.000000                .
      1913-04-01        12.000000                .
      1913-05-01        11.900000                . </pre>
<p>However, the preview doesn't necessarily give us reassurance that <code>T10Y2Y</code> was loaded properly because the values for the first five observations are all missing. Let's take a quick look at some summary statistics using <a href="https://docs.aptech.com/gauss/dstatmt.html" target="_blank" rel="noopener"><code>dstatmt</code></a>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Compute and print descriptive statistics
// for all variables in 'macro_data'
dstatmt(macro_data);</code></pre>
<p>This prints a summary table to our <strong>Command Window</strong>:</p>
<pre>-----------------------------------------------------------------------------
Variable    Mean   Std Dev  Variance     Minimum     Maximum   Valid  Missing
-----------------------------------------------------------------------------

date       -----     -----     -----  1913-01-01  2022-11-25   13048        0
PPIACO     74.57      66.3      4396        10.3       280.3    1318    11730
T10Y2Y    0.9146     0.903    0.8155       -2.41        2.91   11619     1429 </pre>
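<p>Both series were imported, but they have different frequencies and date ranges: <code>PPIACO</code> is a monthly series with 1,318 valid observations, while the daily <code>T10Y2Y</code> series has 11,619. Because the merged dataframe keeps every date from either series, each variable has missing values, mostly on dates where only the other series is observed. </p>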
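<p>If an analysis needs only the dates on which both series are observed, one option is to drop incomplete rows with <code>packr</code>. This is a minimal sketch, assuming your version of GAUSS accepts a dataframe in <code>packr</code>; <code>both_obs</code> is an illustrative name. Because <code>PPIACO</code> is dated on the first of each month while <code>T10Y2Y</code> is daily, the rows that survive are first-of-month dates on which both values are available, so expect far fewer observations than either series has on its own.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Drop every row with a missing value in any column,
// keeping only dates where both PPIACO and T10Y2Y are observed
both_obs = packr(macro_data);

// Compare the counts with the summary table above
dstatmt(both_obs);</code></pre>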
<h3 id="plotting-a-fred-series">Plotting a FRED Series</h3>
<p>It can be useful to plot a FRED series without first storing it in a GAUSS variable. We can do this by passing the output of <code>fred_load</code> directly to the <a href="https://docs.aptech.com/gauss/plotxy.html" target="_blank" rel="noopener"><code>plotXY</code></a> procedure.</p>
<p>To do this, remember that the dataframe returned from <code>fred_load</code> always contains:</p>
<ul>
<li>A date variable named <code>date</code>.</li>
<li>One variable for each series loaded, named with its <code>seriesID</code>.</li>
</ul>
<p>As an example, let's consider viewing the FRED S&amp;P 500 series with the series ID <code>sp500</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotXY(fred_load("sp500"), "sp500 ~ date");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/12/g23-fred-sp500.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/12/g23-fred-sp500.jpg" alt="" width="600" height="500" class="alignnone size-full wp-image-11583235" /></a></p>
<h2 id="advanced-import-tools">Advanced Import Tools</h2>
<p>One of the most useful features of the GAUSS FRED import tools is that they can perform a number of data cleaning tasks at the time of import. In this section, we will look at how to use the FRED import tools to:</p>
<ul>
<li>Filter dates. </li>
<li>Aggregate data.</li>
<li>Perform data transformations. </li>
</ul>
<h3 id="the-fred-parameter-list">The FRED Parameter List</h3>
<p>GAUSS FRED functions use a parameter list for passing advanced settings. This list is constructed using the <a href="https://docs.aptech.com/gauss/fred_set.html" target="_blank" rel="noopener"><code>fred_set</code></a> function. </p>
<p>The <code>fred_set</code> function builds a list of the parameters you want to pass to the FRED functions. Each setting is specified as a parameter name followed by its value. </p>
<p>For example:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create a FRED parameter list with
// 'frequency' set to 'q' (quarterly)
params_GDP = fred_set("frequency", "q");</code></pre>
<p>To add more parameter values, we can pass an existing parameter list as the final input to <code>fred_set</code>:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set 'aggregation_method' to end-of-period
// in the previously created parameter list 'params_GDP'
params_GDP = fred_set("aggregation_method", "eop", params_GDP);</code></pre>
<p>Or we can specify all parameters at the same time:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Create a FRED parameter list with 2 settings at once.
params_GDP = fred_set("frequency", "q", "aggregation_method", "eop");</code></pre>
<p>There are a few things to note about the parameter list:</p>
<ol>
<li>The parameter specifications are case sensitive. </li>
<li>Order does not matter, except that each parameter name must be directly followed by its value. For example, we could also have specified:</li>
</ol>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">params_GDP = fred_set("aggregation_method", "eop", "frequency", "q");</code></pre>
<p>Next, we'll look at how to use the parameter list for advanced FRED data import. </p>
<h3 id="filtering-dates">Filtering Dates</h3>
<p>The <code>observation_start</code> and/or <code>observation_end</code> parameters can be used to filter the range of imported data. </p>
<p>For example, suppose we are interested in loading seasonally adjusted CPI data from January 1971 onward. Let's start by searching for the series ID we want to load:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Read series information from FRED and print first 5 rows
head(fred_search("consumer price index seasonally adjusted"));</code></pre>
<pre>       frequency  frequency_short group_popularity               id     last_updated            notes  observation_end observation_star       popularity     realtime_end   realtime_start seasonal_adjustm seasonal_adjustm            title            units      units_short
         Monthly                M        95.000000         CPIAUCSL 2022-11-10 07:38 The Consumer Pri       2022-10-01       1947-01-01        94.000000       2022-11-28       2022-11-28 Seasonally Adjus               SA Consumer Price I Index 1982-1984= Index 1982-1984=
         Monthly                M        95.000000         CPIAUCNS 2022-11-10 07:38 Handbook of Meth       2022-10-01       1913-01-01        71.000000       2022-11-28       2022-11-28 Not Seasonally A              NSA Consumer Price I Index 1982-1984= Index 1982-1984=
      Semiannual               SA        95.000000      CUUS0000SA0 2022-07-13 07:37                .       2021-01-01       1913-01-01        38.000000       2022-11-28       2022-11-28 Not Seasonally A Consumer Price I Inflation, consu          Percent Index 1982-1984=
          Annual                A        84.000000   FPCPITOTLZGUSA 2022-05-03 14:01 Inflation as mea       2021-01-01       1960-01-01        84.000000       2022-11-28       2022-11-28 Not Seasonally A              NSA Inflation, consu          Percent                %
         Monthly                M        83.000000  CPALTT01USM657N 2022-11-14 14:25 OECD descriptor        2022-09-01       1960-01-01        80.000000       2022-11-28       2022-11-28 Not Seasonally A              NSA Consumer Price I Growth rate prev Growth rate prev </pre>
<p>It looks like the best series for us to use is &quot;CPIAUCSL&quot;. However, this series starts in January 1947. </p>
<p>We can tell GAUSS to only import data starting from 1971 by setting the <code>observation_start</code> parameter to <code>"1971-01-01"</code> using the <code>fred_set</code> procedure:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set observation_start parameter
// to use all data on or after 1971-01-01
params_cpi = fred_set("observation_start", "1971-01-01");</code></pre>
<p>Now we can load our CPI data using <code>fred_load</code> with two inputs:</p>
<ol>
<li>The series ID.</li>
<li>The parameter list, <code>params_cpi</code>.</li>
</ol>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load data using a parameter list
cpi_m = fred_load("CPIAUCSL", params_cpi);

// Preview first 5 rows of data
head(cpi_m);</code></pre>
<p>Our data preview shows that the imported data starts on January 1, 1971:</p>
<pre>            date         CPIAUCSL
      1971-01-01        39.900000
      1971-02-01        39.900000
      1971-03-01        40.000000
      1971-04-01        40.100000
      1971-05-01        40.300000 </pre>
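<p>The <code>observation_end</code> parameter works the same way, and the two can be combined to import a fixed window. As a minimal sketch (<code>params_window</code> and <code>cpi_70s</code> are illustrative names), the following restricts <code>CPIAUCSL</code> to January 1971 through December 1980, which should give 120 monthly observations:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Import only the observations from 1971-01-01 through 1980-12-01
params_window = fred_set("observation_start", "1971-01-01",
                         "observation_end", "1980-12-01");
cpi_70s = fred_load("CPIAUCSL", params_window);

// The last row should be dated no later than 1980-12-01
tail(cpi_70s);</code></pre>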
<h3 id="aggregating-data">Aggregating Data</h3>
<p>Next, suppose we want to aggregate our data from monthly to quarterly frequency. The FRED import tools provide a convenient way to do this at the time of import using the <code>frequency</code> parameter. </p>
<p>The <code>frequency</code> parameter specifies the frequency of the imported data. The requested frequency must be the same as or lower than the frequency of the original series. </p>
<p>Frequency options include:</p>
<table>
 <thead>
<tr><th>Specifier</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>"d"</td><td>Daily</td></tr>
<tr><td>"w"</td><td>Weekly</td></tr>
<tr><td>"bw"</td><td>Biweekly</td></tr>
<tr><td>"m"</td><td>Monthly</td></tr>
<tr><td>"q"</td><td>Quarterly</td></tr>
<tr><td>"sa"</td><td>Semiannual</td></tr>
<tr><td>"a"</td><td>Annual</td></tr>
</tbody>
</table>
<p>By default, observations are aggregated by averaging. The <code>aggregation_method</code> parameter can be used to choose a different method. Aggregation options include:</p>
<table>
 <thead>
<tr><th>Specifier</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>"avg"</td><td>Average</td></tr>
<tr><td>"sum"</td><td>Sum</td></tr>
<tr><td>"eop"</td><td>End of Period</td></tr>
</tbody>
</table>
<p>Let's use the <code>frequency</code> parameter to aggregate the monthly &quot;CPIAUCSL&quot; series to quarterly observations. We will also use the <code>aggregation_method</code> parameter to request end-of-period aggregation:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set parameter list
// Include previously specified
// parameter list to append new specifications
params_cpi = fred_set("frequency", "q", "aggregation_method", "eop", params_cpi);

// Load quarterly CPI
cpi_q_eop  = fred_load("CPIAUCSL", params_cpi);

head(cpi_q_eop);</code></pre>
<pre>            date         CPIAUCSL
      1971-01-01        40.000000
      1971-04-01        40.500000
      1971-07-01        40.800000
      1971-10-01        41.100000
      1972-01-01        41.400000</pre>
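<p>The <code>cpi_q_eop</code> dataframe now contains quarterly data starting in January 1971. Each quarter is dated by its first month, but with end-of-period aggregation its value is the CPI reading from the quarter's last month. For example, the 1971-01-01 value of 40.0 matches the March 1971 observation in our monthly preview. </p>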
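<p>For comparison, the sketch below explicitly requests the default averaging method. It builds a fresh parameter list rather than modifying <code>params_cpi</code>, so the end-of-period settings are left untouched. The names <code>params_avg</code> and <code>cpi_q_avg</code> are illustrative.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Fresh parameter list: same start date, quarterly frequency,
// with each quarter computed as the average of its monthly values
params_avg = fred_set("observation_start", "1971-01-01", "frequency", "q", "aggregation_method", "avg");

// Load quarterly average CPI
cpi_q_avg = fred_load("CPIAUCSL", params_avg);

// Compare with the end-of-period values in cpi_q_eop
head(cpi_q_avg);</code></pre>
<p>Averaging uses all three months of each quarter, so these values will generally differ slightly from the end-of-period values.</p>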
<h3 id="transformations">Transformations</h3>
<p>Finally, suppose we want to use our CPI data to study <a href="https://www.aptech.com/blog/understanding-state-space-models-an-inflation-example/" target="_blank" rel="noopener">inflation</a>. With the FRED import tools, we can transform the series at the time of import by setting the <code>units</code> parameter before calling <code>fred_load</code>. </p>
<p>The <code>units</code> options include:</p>
<table>
 <thead>
<tr><th>Specifier</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>"lin"</td><td>Levels (no transformation).</td></tr>
<tr><td>"chg"</td><td>Change.</td></tr>
<tr><td>"ch1"</td><td>Change from one year ago.</td></tr>
<tr><td>"pch"</td><td>Percent change.</td></tr>
<tr><td>"pc1"</td><td>Percent change from one year ago.</td></tr>
<tr><td>"pca"</td><td>Compounded annual rate of change.</td></tr>
<tr><td>"cch"</td><td>Continuously compounded rate of change.</td></tr>
<tr><td>"cca"</td><td>Continuously compounded annual rate of change.</td></tr>
<tr><td>"log"</td><td>Natural log.</td></tr>
</tbody>
</table>
<p>Let's update our <code>params_cpi</code> parameter list and import the percent change in &quot;CPIAUCSL&quot; from one year ago. </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set params
params_cpi = fred_set("units", "pc1", params_cpi);

// Load the year-over-year percent change in quarterly CPI
infl_q = fred_load("CPIAUCSL", params_cpi);

// Plot inflation over time
plotXY(infl_q, "CPIAUCSL ~ date");</code></pre>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/12/g23-fred-cpiaucsl.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/12/g23-fred-cpiaucsl.jpg" alt="Graph of CPI data." width="600" height="400" class="aligncenter size-full wp-image-11583255" /></a></p>
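<p>The other <code>units</code> options work the same way. The sketch below uses <code>"pca"</code> to load quarter-over-quarter inflation expressed as a compounded annual rate. It starts from a fresh parameter list so that it does not depend on how <code>fred_set</code> treats a repeated <code>"units"</code> key. The names <code>params_pca</code> and <code>infl_pca</code> are illustrative.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Fresh parameter list: quarterly, end-of-period CPI,
// transformed to a compounded annual rate of change
params_pca = fred_set("observation_start", "1971-01-01", "frequency", "q", "aggregation_method", "eop", "units", "pca");

// Load annualized quarter-over-quarter inflation
infl_pca = fred_load("CPIAUCSL", params_pca);

// Plot against the date
plotXY(infl_pca, "CPIAUCSL ~ date");</code></pre>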
<h2 id="conclusion">Conclusion</h2>
<p>In today's blog, we saw how the GAUSS FRED integration introduced in GAUSS 23 can save you time and effort when it comes to working with FRED data. </p>
<p>We learned how to use the FRED import tools to:</p>
<ul>
<li>Search for a FRED data series. </li>
<li>Import FRED data to GAUSS, including merging multiple series. </li>
<li>Use advanced import tools to perform data transformations. </li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/importing-fred-data-to-gauss/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Introduction to Efficient Creation of Detailed Plots</title>
		<link>https://www.aptech.com/blog/introduction-to-efficient-creation-of-detailed-plots/</link>
					<comments>https://www.aptech.com/blog/introduction-to-efficient-creation-of-detailed-plots/#respond</comments>
		
		<dc:creator><![CDATA[aptech]]></dc:creator>
		<pubDate>Tue, 27 Sep 2022 16:26:33 +0000</pubDate>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11582941</guid>

					<description><![CDATA[A few weeks ago, we showed you how to create a detailed plot from a recent article in the American Economic Review. That article contained several plots that share much of the same stylized formatting. Today we will show you how to <i>efficiently</i> create two of these graphs.

Our main goals are to get you thinking about code reuse and how it can help you:

<ul>
<li> Get more results from your limited research time.</li>
<li>Avoid the frustration that comes from growing mountains of spaghetti code.</li>
</ul>]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>A few weeks ago, we showed you how to create a detailed plot from a recent article in the American Economic Review. That article contained several plots that share much of the same stylized formatting. Today we will show you how to <em>efficiently</em> create two of these graphs.</p>
<p>Our main goals are to get you thinking about code reuse and how it can help you:</p>
<ul>
<li>Get more results from your limited research time.</li>
<li>Avoid the frustration that comes from growing mountains of spaghetti code.</li>
</ul>
<div class="alert alert-info" role="alert">If you missed it, be sure to check out our original blog on this topic, <a href="https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/" target="_blank" rel="noopener">Advanced Formatting Techniques for Creating AER Quality Plots</a>. </div>
<h2 id="our-graphs">Our Graphs</h2>
<p>This is what we will create today. As you can see, the two graphs share many style attributes, which gives us a great opportunity to reuse code.</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/09/gblog-hetboombust1x2.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/09/gblog-hetboombust1x2.jpg" alt="Line plots from an American Economic Review article created by GAUSS." width="1000" height="400" class="aligncenter size-full wp-image-11582942" /></a></p>
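<p>You can <a href="https://raw.githubusercontent.com/aptech/gauss_blog/master/programming/introduction-efficient-creation-plots-9.27.22/rationing.csv" target="_blank" rel="noopener">download the rationing data here</a>. The interest rate data, <code>int_rate.csv</code>, is the same file used in our original blog on this topic. </p>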
<h2 id="our-initial-code">Our Initial Code</h2>
<p>This is not a massive amount of code, and many of you might be tempted to copy and paste it and make the minor modifications needed to create the second graph. I completely understand. Your biggest problem is probably a lack of time, so productivity is paramount.</p>
<p>While copying and pasting might feel like a shortcut, it will saddle you with technical debt. Technical debt is just a fancy term for the stress, frustration, and wasted time that inevitably follow when you take shortcuts like this.</p>
<p>Avoiding that debt will not only save you pain, but it might save you some embarrassment as well. These sorts of mundane issues are real drivers of the replication crisis in research today.</p>
<p>Your research is important and I know you want to do it right, so let's get started! </p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">new;
cls;

/*
** Load and preview data
*/
int_rate = loadd("int_rate.csv");

tail(int_rate);

ks = { 0.517, 0.653, 0.781  };

/*
** Graph data
*/

// Graph size
plotCanvasSize("px", 500 | 400);

// Default settings
struct plotControl plt;
plt = plotGetDefaults("xy");

// Font
plotSetFonts(&amp;plt, "all", "roboto", 14);

// Legend
plotSetLegend(&amp;plt, "", "vcenter left inside", 1);
plotSetLegendBkd(&amp;plt, 0);

// Main line settings
clrs = getColorPalette("set2");
plotSetLinePen(&amp;plt, 4, clrs[3 2], 1|3);

// Axes outline (spine)
plotSetOutlineEnabled(&amp;plt, 1);

// X-axis
plotSetTextInterpreter(&amp;plt, "latex", "xaxis");
plotSetXAxisLabel(&amp;plt, "\\text{country opacity }, \\omega");

// Y-axis
plotSetYLabel(&amp;plt, "interest rate");

// Draw main plot
plotXY(plt, int_rate, "high + low ~ x");

// Style and add vertical lines
plotSetLinePen(&amp;plt, 1, "#CCC", 2);
plotAddVLine(plt, ks);

// Style text boxes
struct plotAnnotation ant;
ant = annotationGetDefaults();
annotationSetTextInterpreter(&amp;ant, "latex");
annotationSetLinePen(&amp;ant, 0, "", -1);
annotationSetFont(&amp;ant, "", 14, "#3333");
annotationSetBkd(&amp;ant, "", 0);

// Add text boxes
plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);</code></pre>
<h2 id="initial-code-simplification">Initial Code Simplification</h2>
<p>We will start by creating a <a href="https://www.aptech.com/blog/basics-of-gauss-procedures/" target="_blank" rel="noopener">procedure</a> to hold the plot styling calls that we want to reuse, and apply it to the first plot only. Then we will add the data for the second plot.</p>
<p>All of the styling applied before the call to <a href="https://docs.aptech.com/gauss/plotxy.html" target="_blank" rel="noopener"><code>plotXY</code></a> is the same in both plots, except for the y-axis label text. So, let's create a procedure that applies the shared settings:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">new;
cls;

/*
** Load and preview data
*/
int_rate = loadd("int_rate.csv");

tail(int_rate);

ks = { 0.517, 0.653, 0.781  };

/*
** Graph data
*/

// Graph size
plotCanvasSize("px", 500 | 400);

// Declare plotControl structure
struct plotControl plt;

// Fill with defaults for this project
plt = pltDefaults();

// Set y-axis label for first plot
plotSetYLabel(&amp;plt, "interest rate");

// Draw first plot
plotXY(plt, int_rate, "high + low ~ x");

proc (1) = pltDefaults();
    local clrs;

    struct plotControl plt;
    plt = plotGetDefaults("xy");

    // Font
    plotSetFonts(&amp;plt, "all", "roboto", 14);

    // Legend
    plotSetLegend(&amp;plt, "", "vcenter left inside", 1);
    plotSetLegendBkd(&amp;plt, 0);

    // Main line settings
    clrs = getColorPalette("set2");
    plotSetLinePen(&amp;plt, 4, clrs[3 2], 1|3);

    // Axes outline (spine)
    plotSetOutlineEnabled(&amp;plt, 1);

    // X-axis
    plotSetTextInterpreter(&amp;plt, "latex", "xaxis");
    plotSetXAxisLabel(&amp;plt, "\\text{country opacity }, \\omega");

    retp(plt);
endp;
</code></pre>
<p>While this code is slightly longer when drawing just one plot, it will save us when we add the next plot. Before we do that, we need to address the <a href="https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/#add-vertical-lines" target="_blank" rel="noopener">vertical lines</a> and <a href="https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/#add-text-annotations" target="_blank" rel="noopener">annotations</a>.</p>
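<p>If every plot in a project needs its own y-axis label, you could fold the label into the helper as well. The procedure below, <code>pltDefaultsY</code>, is a hypothetical wrapper around the <code>pltDefaults</code> procedure defined above. It shows one possible design and is not the approach used in the rest of this post.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical helper: project defaults plus a y-axis label
// Requires the pltDefaults procedure defined above
proc (1) = pltDefaultsY(ylab);
    struct plotControl plt;
    plt = pltDefaults();

    // Apply the plot-specific y-axis label
    plotSetYLabel(&amp;plt, ylab);

    retp(plt);
endp;

// Usage, in place of the pltDefaults() and plotSetYLabel() calls above
struct plotControl plt;
plt = pltDefaultsY("interest rate");
plotXY(plt, int_rate, "high + low ~ x");</code></pre>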
<h2 id="simplifying-the-annotations">Simplifying the Annotations</h2>
<p>Looking over the plots at the top of this article shows us that the vertical lines and the omega text boxes all depend on the <code>ks</code> vector. Since they seem to be intertwined, it is probably safe to put them in one procedure. </p>
<p>The simplest thing to do would be to add all the annotation code to a single procedure like this:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">proc (0) = pltAddOmegas(ks);

    struct plotControl plt;
    plt = plotGetDefaults("xy");

    // Style and add vertical lines
    plotSetLinePen(&amp;plt, 1, "#CCC", 2);
    plotAddVLine(plt, ks);

    // Style text boxes
    struct plotAnnotation ant;
    ant = annotationGetDefaults();
    annotationSetTextInterpreter(&amp;ant, "latex");
    annotationSetLinePen(&amp;ant, 0, "", -1);
    annotationSetFont(&amp;ant, "", 14, "#3333");
    annotationSetBkd(&amp;ant, "", 0);

    // Add text boxes
    plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
    plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
    plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);
endp;</code></pre>
<p>and then call that procedure right after <code>plotXY</code>. In this case, it is not a bad place to start. However, since we are in learning mode, let's pretend that we were going to create more graphs in this file that add text boxes with the same styling, but with different Greek letters in different locations on the graph.</p>
<p>In that case, we would probably want to separate the text box styling from the text box drawing, like this:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">proc (1) = textBoxDefaults();
    struct plotAnnotation ant;
    ant = annotationGetDefaults();

    annotationSetTextInterpreter(&amp;ant, "latex");
    annotationSetLinePen(&amp;ant, 0, "", -1);
    annotationSetFont(&amp;ant, "", 14, "#3333");
    annotationSetBkd(&amp;ant, "", 0);

    retp(ant);
endp;

proc (0) = pltAddOmegas(ks);

    struct plotControl plt;
    plt = plotGetDefaults("xy");

    // Style and add vertical lines
    plotSetLinePen(&amp;plt, 1, "#CCC", 2);
    plotAddVLine(plt, ks);

    struct plotAnnotation ant;
    ant = textBoxDefaults();

    // Add text boxes
    plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
    plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
    plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);
endp;</code></pre>
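<p>Taking that idea one step further, the drawing itself could also be parameterized. The hypothetical procedure <code>pltAddLabels</code> below is a sketch that reuses <code>textBoxDefaults</code>. It takes the labels, their x-axis locations, and a shared y-axis location as inputs.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Hypothetical helper: draw one styled text box per label
//   labels: Nx1 string array of LaTeX labels
//   xs:     Nx1 vector of x-axis locations
//   y:      scalar y-axis location shared by all labels
proc (0) = pltAddLabels(labels, xs, y);
    local i;

    // Shared text box styling
    struct plotAnnotation ant;
    ant = textBoxDefaults();

    // Draw each label at its own x-axis location
    for i(1, rows(xs), 1);
        plotAddTextbox(ant, labels[i], xs[i], y);
    endfor;
endp;

// Usage: equivalent to the three plotAddTextbox calls in pltAddOmegas
pltAddLabels("\\omega_1" $| "\\omega_2" $| "\\omega_3", ks, 0.15);</code></pre>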
<h3 id="conclusion-and-final-code">Conclusion and final code</h3>
<p>Below is the final code to create the graphs from the top of this blog. This isn't designed to show you the best way to write this code, but rather to get you started with the idea of code reuse.</p>
<p>Software engineers sometimes use the acronym DRY — Don't Repeat Yourself. While that is a great practice, even just repeating yourself less often will bring you great rewards.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">new;
cls;

/*
** Load and preview data
*/
int_rate = loadd("int_rate.csv");
tail(int_rate);

rationing = loadd("rationing.csv");
tail(rationing);

ks = { 0.517, 0.653, 0.781  };

/*
** Graph data
*/

// Graph size
plotCanvasSize("px", 1000 | 400);

// Declare plotControl structure and
// fill with defaults for this project
struct plotControl plt;
plt = pltDefaults();

/*
** Interest rate plot
*/

// Create a 1x2 grid and select the first cell
plotLayout(1,2,1);

// Set y-axis label for first plot
plotSetYLabel(&amp;plt, "interest rate");

// Draw first plot
plotXY(plt, int_rate, "high + low ~ x");
pltAddOmegas(ks);

/*
** Rationing plot
*/

// Select the second cell of the 1x2 grid
plotLayout(1,2,2);

// Set y-axis label for second plot
plotSetYLabel(&amp;plt, "rationing");

// Draw second plot
plotXY(plt, rationing, "high + low ~ x");
pltAddOmegas(ks);

proc (1) = pltDefaults();
    local clrs;

    struct plotControl plt;
    plt = plotGetDefaults("xy");

    // Font
    plotSetFonts(&amp;plt, "all", "roboto", 14);

    // Legend
    plotSetLegend(&amp;plt, "", "vcenter left inside", 1);
    plotSetLegendBkd(&amp;plt, 0);

    // Main line settings
    clrs = getColorPalette("set2");
    plotSetLinePen(&amp;plt, 4, clrs[3 2], 1|3);

    // Axes outline (spine)
    plotSetOutlineEnabled(&amp;plt, 1);

    // X-axis
    plotSetTextInterpreter(&amp;plt, "latex", "xaxis");
    plotSetXAxisLabel(&amp;plt, "\\text{country opacity }, \\omega");

    retp(plt);
endp;

proc (1) = textBoxDefaults();
    struct plotAnnotation ant;
    ant = annotationGetDefaults();

    annotationSetTextInterpreter(&amp;ant, "latex");
    annotationSetLinePen(&amp;ant, 0, "", -1);
    annotationSetFont(&amp;ant, "", 14, "#3333");
    annotationSetBkd(&amp;ant, "", 0);

    retp(ant);
endp;

proc (0) = pltAddOmegas(ks);

    struct plotControl plt;
    plt = plotGetDefaults("xy");

    // Style and add vertical lines
    plotSetLinePen(&amp;plt, 1, "#CCC", 2);
    plotAddVLine(plt, ks);

    struct plotAnnotation ant;
    ant = textBoxDefaults();

    // Add text boxes
    plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
    plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
    plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);
endp;</code></pre>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/" target="_blank" rel="noopener">Advanced Formatting Techniques for Creating AER Quality Plots.</a></li>
<li><a href="https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/" target="_blank" rel="noopener">Visualizing COVID-19 Panel Data With GAUSS 22.</a> </li>
<li><a href="https://www.aptech.com/blog/how-to-mix-match-and-style-different-graph-types/" target="_blank" rel="noopener">How to Mix, Match and Style Different Graph Types.</a></li>
</ol>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/introduction-to-efficient-creation-of-detailed-plots/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Advanced Formatting Techniques for Creating AER Quality Plots</title>
		<link>https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/</link>
					<comments>https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/#comments</comments>
		
		<dc:creator><![CDATA[aptech]]></dc:creator>
		<pubDate>Wed, 27 Jul 2022 21:18:24 +0000</pubDate>
				<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11582603</guid>

					<description><![CDATA[This blog will show you how to reproduce one of the graphs from a paper in the June 2022 issue of the American Economic Review. You will learn how to:
<ol>
<li>Add and style text boxes with LaTeX.</li>
<li>Set the anchor point of text boxes.</li>
<li>Add and style vertical lines.</li>
<li>Automatically set legend text to use your dataframe's variable names.</li>
<li> Set the font for all or a subset of the graph text elements.</li>
<li> Set the size of your graph.</li>
</ol>]]></description>
										<content:encoded><![CDATA[<h3 id="introduction">Introduction</h3>
<p>Today's blog will show you how to reproduce one of the graphs from a paper in the June 2022 issue of the journal <a href="https://www.aeaweb.org/journals/aer" target="_blank" rel="noopener">American Economic Review</a>. You will learn how to:</p>
<ol>
<li>Add and style text boxes with LaTeX.</li>
<li>Set the anchor point of text boxes.</li>
<li>Add and style vertical lines.</li>
<li>Automatically set legend text to use your dataframe's variable names.</li>
<li>Set the font for all or a subset of the graph text elements.</li>
<li>Set the size of your graph.</li>
</ol>
<h2 id="the-graph-and-data">The Graph and Data</h2>
<p>Below is the graph that we are going to create. It is adapted from a recent paper in the American Economic Review. You can <a href="https://raw.githubusercontent.com/aptech/gauss_blog/master/graphics/advanced-formatting-aer-quality/int_rate.csv" target="_blank" rel="noopener">download the data here</a>.</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-final.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-final.jpg" alt="" width="500" height="400" class="aligncenter size-full wp-image-11582641" /></a></p>
<h2 id="load-and-preview-data">Load and Preview Data</h2>
<p>Our first step will be to <a href="https://www.aptech.com/resources/tutorials/loading-variables-from-a-file/" target="_blank" rel="noopener">load the data</a> and take a quick look at it.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load all variables from 'int_rate.csv'
int_rate = loadd("int_rate.csv");

// Print the first 5 observations of 'int_rate'
print "First 5 observations:";
head(int_rate);

// Print the last 5 observations of 'int_rate'
print "Last 5 observations:";
tail(int_rate);

// Print descriptive statistics of our variables
call dstatmt(int_rate);</code></pre>
<p>This will give us the following results:</p>
<pre>First 5 observations:

               x             high              low
   0.00010000000       0.17140598        0.0000000
   0.00020000000       0.17140598        0.0000000
   0.00030000000       0.17140598        0.0000000
   0.00040000000       0.17140598        0.0000000
   0.00050000000       0.17140598        0.0000000

Last 5 observations:

               x             high              low
      0.99950000       0.17140598       0.23012203
      0.99960000       0.17140598       0.23012203
      0.99970000       0.17140598       0.23012203
      0.99980000       0.17140598       0.23012203
      0.99990000       0.17140598       0.23012203

-------------------------------------------------------------------------------
Variable     Mean    Std Dev     Variance    Minimum    Maximum   Valid Missing
-------------------------------------------------------------------------------

x             0.5      0.289       0.0833      1e-4      0.999    9999    0
high       0.1714   1.04e-07     1.08e-14     0.171      0.171    9999    0
low       0.09374      0.108       0.0116         0      0.230    9999    0</pre>
<h2 id="initial-graphs">Initial Graphs</h2>
<p>Our first graphs will use the default GAUSS styling. We will create one graph indexing our <code>x</code> and <code>y</code> variables and another using a <a href="https://www.aptech.com/resources/tutorials/formula-string-syntax/" target="_blank" rel="noopener">formula string</a>.</p>
<h3 id="indexing">Indexing</h3>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-unstyled-index.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-unstyled-index.jpg" alt="" width="500" height="400" class="aligncenter size-full wp-image-11582627" /></a></p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set our graph size to 500x400 pixels
plotCanvasSize("px", 500 | 400);

// Use indexing to select the 'x' and 'y' variables
plotXY(int_rate[.,"x"], int_rate[.,"high" "low"]);</code></pre>
<h3 id="formula-string">Formula string</h3>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-unstyled-formula.png"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-unstyled-formula.png" alt="" width="500" height="400" class="aligncenter size-full wp-image-11582634" /></a></p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set our graph size to 500x400 pixels
plotCanvasSize("px", 500 | 400);

// Specify the 'x' and 'y' variables using a formula string
plotXY(int_rate, "high + low ~ x");</code></pre>
<p>When we use a formula string with our plot functions, it tells GAUSS to select the variables by name from the <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/" target="_blank" rel="noopener">dataframe</a> and to use those names to label the graph.</p>
<p>When using a formula string in plots:</p>
<ul>
<li>The tilde symbol, <code>~</code>, separates the <code>y</code> variables on the left from the <code>x</code> variable(s) on the right. </li>
<li>If there is a single <code>y</code> variable, GAUSS will use that variable name to label the <code>y</code> axis. If there is more than one <code>y</code> variable, then the variable names will be added to the legend.</li>
<li>The name of the <code>x</code> variable will be used to label the x-axis.</li>
</ul>
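<p>For example, based on the rules above, a formula string with a single <code>y</code> variable should give a y-axis labeled "high" and an x-axis labeled "x", rather than a legend entry:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set our graph size to 500x400 pixels
plotCanvasSize("px", 500 | 400);

// One y variable: its name, "high", labels the y-axis
// and the x variable name, "x", labels the x-axis
plotXY(int_rate, "high ~ x");</code></pre>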
<p>While these automatic labels may not always be what we want to display in our final plot, they make it convenient to quickly create graphs that are easy to interpret.</p>
<h2 id="plot-styling">Plot Styling</h2>
<p>Next, we will adjust the styling to match our intended final plot.</p>
<p>To programmatically style our graph, the first thing we need to do is to create a <code>plotControl</code> structure and fill it with default values.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">struct plotControl plt;
plt = plotGetDefaults("xy");</code></pre>
<h3 id="legend-styling">Legend styling</h3>
<p>After the pointer to the <code>plotControl</code> structure we want to modify, <code>&amp;plt</code>, the function <a href="https://docs.aptech.com/gauss/plotsetlegend.html" target="_blank" rel="noopener"><code>plotSetLegend</code></a> takes one required input and two optional ones.</p>
<ol>
<li>
<p><strong>Legend text</strong>: This controls the text that will be displayed in the legend. We want GAUSS to use the variable names from our input. Therefore, below, we set this to an empty string, <code>""</code>. This tells GAUSS that we do not want to modify the default behavior of the legend text.</p>
<p>As we mentioned earlier, the default behavior for a graph created with a formula string with more than one <code>y</code> variable is to use the <code>y</code> variable names as the legend text elements.</p>
</li>
<li>
<p><strong>Legend location</strong>: This input can be a string with text location specifications, or a 2x1 vector with the <code>x</code> and <code>y</code> coordinates for the location of the top-left corner of the legend.</p>
</li>
<li><strong>Legend orientation</strong>: This input specifies whether the legend items should be stacked vertically or arranged horizontally. We set it to <code>1</code> to indicate a vertical arrangement. It may help to remember that the digit <code>1</code> is essentially a vertical mark.</li>
</ol>
<p>The first input to <a href="https://docs.aptech.com/gauss/plotsetlegendbkd.html" target="_blank" rel="noopener"><code>plotSetLegendBkd</code></a>, after the <code>plotControl</code> structure pointer, controls the legend opacity. We set it to be 0% opaque, or 100% transparent.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotSetLegend(&amp;plt, "", "vcenter left inside", 1);
plotSetLegendBkd(&amp;plt, 0);</code></pre>
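<p>If the variable names are not what you want in the legend, you can pass your own legend text instead of an empty string. The sketch below uses illustrative labels and the 2x1 coordinate form of the location input. We assume here that the coordinates are given in the units of the data axes, so check the <code>plotSetLegend</code> documentation before relying on the exact placement. The names <code>plt_demo</code> and <code>leg_text</code> are illustrative.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Separate structure so the tutorial's 'plt' is not modified
struct plotControl plt_demo;
plt_demo = plotGetDefaults("xy");

// Illustrative legend labels, one per y variable, in formula-string order
leg_text = "High" $| "Low";

// Top-left corner of the legend at x = 0.05, y = 0.22 (assumed data units)
plotSetLegend(&amp;plt_demo, leg_text, 0.05|0.22, 1);

plotXY(plt_demo, int_rate, "high + low ~ x");</code></pre>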
<h3 id="font-styling">Font styling</h3>
<p><a href="https://docs.aptech.com/gauss/plotsetfonts.html" target="_blank" rel="noopener"><code>plotSetFonts</code></a> provides a convenient way to set the font family, size, and color for any subset of the text in your graph. Below we set the font for <code>"all"</code> of the text in the plot. However, there are many other options, including <code>"axes"</code>, <code>"legend"</code>, <code>"legend_title"</code>, <code>"title"</code>, and <code>"ticks"</code>.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotSetFonts(&amp;plt, "all", "roboto", 14);</code></pre>
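<p>Calls to <code>plotSetFonts</code> are applied in order. This means you can set a base font for <code>"all"</code> text and then override a subset. The sketch below keeps Roboto everywhere but uses a smaller size for the tick labels and the legend. The sizes and the name <code>plt_demo</code> are illustrative.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Separate structure so the tutorial's 'plt' is not modified
struct plotControl plt_demo;
plt_demo = plotGetDefaults("xy");

// Base font for every text element
plotSetFonts(&amp;plt_demo, "all", "roboto", 14);

// Override the tick labels and legend with a smaller size
plotSetFonts(&amp;plt_demo, "ticks", "roboto", 12);
plotSetFonts(&amp;plt_demo, "legend", "roboto", 12);</code></pre>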
<h3 id="x-axis-label">X-axis label</h3>
<p><a href="https://docs.aptech.com/gauss/plotsettextinterpreter.html" target="_blank" rel="noopener"><code>plotSetTextInterpreter</code></a> tells GAUSS whether you would like text labels to be interpreted as:</p>
<ul>
<li>HTML</li>
<li>LaTeX</li>
<li>Plain text</li>
</ul>
<p>Like <code>plotSetFonts</code>, it allows you to target many different text elements, or even <code>"all"</code> of them. Below, we set the x-axis label to be interpreted as LaTeX and then use LaTeX in that label.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotSetTextInterpreter(&amp;plt, "latex", "xaxis");
plotSetXAxisLabel(&amp;plt, "\\text{country opacity }, \\omega");</code></pre>
<h3 id="main-line-styling">Main line styling</h3>
<p>The &quot;set2&quot; <a href="https://docs.aptech.com/gauss/getcolorpalette.html" target="_blank" rel="noopener">color palette</a> contains eight colors:<br/>
<span><svg height="24" width="192">    <rect fill="rgb(102,194,165)" width="24" height="24" x="0"></rect>    <rect fill="rgb(252,141,98)" width="24" height="24" x="24"></rect>    <rect fill="rgb(141,160,203)" width="24" height="24" x="48"></rect>    <rect fill="rgb(231,138,195)" width="24" height="24" x="72"></rect>    <rect fill="rgb(166,216,84)" width="24" height="24" x="96"></rect>    <rect fill="rgb(255,217,47)" width="24" height="24" x="120"></rect>    <rect fill="rgb(229,196,148)" width="24" height="24" x="144"></rect>    <rect fill="rgb(179,179,179)" width="24" height="24" x="168"></rect></svg></span></p>
<p>We want to use the third color for our first series, &quot;high&quot;, and the second color for our second series, &quot;low&quot;.</p>
<p>Additionally, we set the line width to <code>4</code> pixels and the line styles to <code>1</code> and <code>3</code>, respectively. A style of <code>1</code> draws a solid line and <code>3</code> draws a dotted line.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">clrs = getColorPalette("set2");

// Set the line width, line colors, and line style
plotSetLinePen(&amp;plt, 4, clrs[3 2], 1|3);</code></pre>
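<p>Assuming <code>getColorPalette</code> returns the palette as hex color strings, you could also pass the colors directly. Converting the RGB values of the third and second "set2" swatches above to hex gives the sketch below, which should produce the same line colors. The names <code>plt_demo</code> and <code>line_clrs</code> are illustrative.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Separate structure so the tutorial's 'plt' is not modified
struct plotControl plt_demo;
plt_demo = plotGetDefaults("xy");

// "set2" color 3, rgb(141,160,203), and color 2, rgb(252,141,98), as hex
line_clrs = "#8DA0CB" $| "#FC8D62";

// 4 pixels wide; solid (1) for "high" and dotted (3) for "low"
plotSetLinePen(&amp;plt_demo, 4, line_clrs, 1|3);</code></pre>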
<h3 id="axes-outline">Axes outline</h3>
<p>The axes outline, called the spines in some other plotting libraries, consists of the lines around the edges of the data area. By default, only the bottom x-axis and left y-axis lines are drawn. The code below also turns on the lines along the top x-axis and right y-axis.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotSetOutlineEnabled(&amp;plt, 1);</code></pre>
<h3 id="graph-before-annotations">Graph before annotations</h3>
<p>If we draw the graph, using our previous styling and a formula string as shown below:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotXY(plt, int_rate, "high + low ~ x");</code></pre>
<p>we get the following plot:</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-styled-wo-annotations.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-hetboombust-1-styled-wo-annotations.jpg" alt="" width="500" height="400" class="aligncenter size-full wp-image-11582659" /></a></p>
<h2 id="add-vertical-lines">Add Vertical Lines</h2>
<h3 id="line-styling">Line styling</h3>
<p>We will continue with the <code>plotControl</code> structure we created earlier and modify the line settings to match the vertical lines from the graph we are trying to reproduce.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Set lines to be: 1 pixel wide, light gray (#CCC) and dashed style (2)
plotSetLinePen(&amp;plt, 1, "#CCC", 2);</code></pre>
<h3 id="draw-the-lines">Draw the lines</h3>
<p>We add the vertical lines using <a href="https://docs.aptech.com/gauss/plotaddvline.html" target="_blank" rel="noopener"><code>plotAddVLine</code></a>. <code>plotAddVLine</code> takes an optional <code>plotControl</code> structure as the first input and then a vector of one or more x-axis locations at which to draw the vertical spanning lines.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// The x-axis locations for the vertical lines
ks = { 0.517, 0.653, 0.781 };

plotAddVLine(plt, ks);</code></pre>
<h2 id="add-text-annotations">Add Text Annotations</h2>
<h3 id="style-text-boxes">Style text boxes</h3>
<p>The annotation styling functions use a <code>plotAnnotation</code> structure but work very similarly to the main plot styling (<code>plotSet</code>) functions. Therefore, we will just use comments to describe their actions.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">struct plotAnnotation ant;
ant = annotationGetDefaults();

// Set text to be interpreted as LaTeX
annotationSetTextInterpreter(&amp;ant, "latex");

// Turn off the text box bounding line, by setting:
//     line-width=0, line-color="" (ignore), line-style=-1 (no line)
annotationSetLinePen(&amp;ant, 0, "", -1);

// Leave the font-family as default, "",
// Set the font-size to 14 points and the color to a
// dark gray, #333
annotationSetFont(&amp;ant, "", 14, "#3333");

// Leave the annotation background color as the default, "",
// and set its opacity to 0% (100% transparent)
annotationSetBkd(&amp;ant, "", 0);</code></pre>
<h3 id="draw-the-text-boxes">Draw the text boxes</h3>
<p>After the optional <code>plotAnnotation</code> structure, <a href="https://docs.aptech.com/gauss/plotaddtextbox.html" target="_blank" rel="noopener"><code>plotAddTextbox</code></a> takes 3 input arguments:</p>
<ol>
<li><strong>The text to display.</strong><br/></li>
<li><strong>The x-axis coordinate.</strong><br/></li>
<li><strong>The y-axis coordinate.</strong><br/></li>
</ol>
<p>By default, the x- and y-coordinates specify the location of the top-left corner of the bounding box that contains the text.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);</code></pre>
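<p>The three calls above differ only in the subscript and the x-coordinate, so they can also be written as a loop over <code>ks</code>. This is a sketch, not the original code: it builds each LaTeX label with <code>ntos</code> and the <code>$+</code> string-concatenation operator, and it automatically adds one label per element of <code>ks</code> if more vertical lines are added later.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Add one LaTeX label per vertical line
for i(1, rows(ks), 1);
    // Build the labels \omega_1, \omega_2, ... from the loop counter
    plotAddTextbox(ant, "\\omega_" $+ ntos(i), ks[i], 0.15);
endfor;</code></pre>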
<h2 id="full-code">Full code</h2>
<p>Below is the full code to create our graph.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">new;
cls;

/*
** Load and preview data
*/
int_rate = loadd("int_rate.csv");

tail(int_rate);

ks = { 0.517, 0.653, 0.781 };

/*
** Graph data
*/

// Graph size
plotCanvasSize("px", 500 | 400);

// Default settings
struct plotControl plt;
plt = plotGetDefaults("xy");

// Font
plotSetFonts(&amp;plt, "all", "roboto", 14);

// Legend
plotSetLegend(&amp;plt, "", "vcenter left inside", 1);
plotSetLegendBkd(&amp;plt, 0);

// Main line settings
clrs = getColorPalette("set2");
plotSetLinePen(&amp;plt, 4, clrs[3 2], 1|3);

// Axes outline (spine)
plotSetOutlineEnabled(&amp;plt, 1);

// X-axis
plotSetTextInterpreter(&amp;plt, "latex", "xaxis");
plotSetXAxisLabel(&amp;plt, "\\text{country opacity }, \\omega");

// Draw main plot
plotXY(plt, int_rate, "high + low ~ x");

// Style and add vertical lines
plotSetLinePen(&amp;plt, 1, "#CCC", 2);
plotAddVLine(plt, ks);

// Style text boxes
struct plotAnnotation ant;
ant = annotationGetDefaults();
annotationSetTextInterpreter(&amp;ant, "latex");
annotationSetLinePen(&amp;ant, 0, "", -1);
annotationSetFont(&amp;ant, "", 14, "#3333");
annotationSetBkd(&amp;ant, "", 0);

// Add text boxes
plotAddTextbox(ant, "\\omega_1", ks[1], 0.15);
plotAddTextbox(ant, "\\omega_2", ks[2], 0.15);
plotAddTextbox(ant, "\\omega_3", ks[3], 0.15);</code></pre>
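<p>To include the graph in a paper, you will usually want to save it to a file. A sketch using <code>plotSave</code>, assuming the <code>plotSave(filename, size, units)</code> form and the same 500 by 400 pixel size passed to <code>plotCanvasSize</code>. The file name is hypothetical; check the <code>plotSave</code> documentation for the units and file formats supported by your GAUSS version.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Save the most recent graph as a PNG file,
// 500 pixels wide and 400 pixels tall (hypothetical file name)
plotSave("int_rate_omega.png", 500 | 400, "px");</code></pre>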
<h2 id="bonus-content-text-box-anchor-position">Bonus Content: Text Box Anchor Position</h2>
<p>In this case, we wanted the text boxes to appear just to the right of the vertical lines, and their vertical position was not critical. Therefore, the default anchor position worked well.</p>
<p>However, if we had wanted the text boxes near the bottom of the graph, the first one would have overlapped one of the plotted lines. We can see this by changing the <code>plotAddTextbox</code> lines to the following:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Draw the text boxes at a lower position, y=0.04
plotAddTextbox(ant, "\\omega_1", ks[1], 0.04);
plotAddTextbox(ant, "\\omega_2", ks[2], 0.04);
plotAddTextbox(ant, "\\omega_3", ks[3], 0.04);</code></pre>
<p>This makes the bottom of our graph look like this:</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-overlapping-omega-1.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-overlapping-omega-1.jpg" alt="" width="499" height="145" class="aligncenter size-full wp-image-11582674" /></a></p>
<p>Here, it would be better to place the text boxes to the left of the vertical lines. We can do this with the final, optional input of <code>plotAddTextbox</code>: a string that specifies the position of the text box relative to its anchor point.</p>
<p>The string options include:</p>
<ul>
<li><strong>Vertical position</strong>: <code>"top"</code>, <code>"vcenter"</code>, <code>"bottom"</code>.<br/></li>
<li><strong>Horizontal position</strong>: <code>"left"</code>, <code>"hcenter"</code>, <code>"right"</code>.<br/></li>
</ul>
<p>or <code>"center"</code>, which is equivalent to <code>"vcenter hcenter"</code>.</p>
<p>For this example, we will simply move the text boxes to the left of the vertical lines, which sit at the same x-positions as the text boxes' anchor points:</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Draw the text boxes at y=0.04, to the left of their anchor points
plotAddTextbox(ant, "\\omega_1", ks[1], 0.04, "left");
plotAddTextbox(ant, "\\omega_2", ks[2], 0.04, "left");
plotAddTextbox(ant, "\\omega_3", ks[3], 0.04, "left");</code></pre>
<p>This gives us the following image:</p>
<p><a href="https://www.aptech.com/wp-content/uploads/2022/07/gblog-left-omega-1.jpg"><img src="https://www.aptech.com/wp-content/uploads/2022/07/gblog-left-omega-1.jpg" alt="" width="496" height="142" class="aligncenter size-full wp-image-11582676" /></a></p>
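<p>A vertical keyword and a horizontal keyword can presumably be combined in one string, just as <code>"center"</code> is shorthand for <code>"vcenter hcenter"</code> and the legend location in the full code combines <code>"vcenter left inside"</code>. The sketch below assumes that holds. It keeps the labels to the left of the lines and also centers them vertically on y=0.04.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Place each label to the left of its anchor point and
// vertically centered on it (assumes keywords can be combined)
plotAddTextbox(ant, "\\omega_1", ks[1], 0.04, "vcenter left");
plotAddTextbox(ant, "\\omega_2", ks[2], 0.04, "vcenter left");
plotAddTextbox(ant, "\\omega_3", ks[3], 0.04, "vcenter left");</code></pre>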
<h3 id="conclusion">Conclusion</h3>
<p>Great job! You have learned how to:</p>
<ol>
<li>Add and style text boxes with LaTeX.</li>
<li>Set the anchor point of text boxes.</li>
<li>Add and style vertical lines.</li>
<li>Automatically set legend text to use your dataframe's variable names.</li>
<li>Set the font for all or a subset of the graph text elements.</li>
<li>Set the size of your graph.</li>
<li>Control the position of text boxes with respect to their attachment point.</li>
</ol>
<h3 id="further-reading">Further Reading</h3>
<ol>
<li><a href="https://www.aptech.com/blog/visualizing-covid-19-panel-data-with-gauss-22/" target="_blank" rel="noopener">Visualizing COVID-19 Data with GAUSS 22</a></li>
<li><a href="https://www.aptech.com/blog/how-to-mix-match-and-style-different-graph-types/" target="_blank" rel="noopener">How to Mix, Match, and Style Different Graph Types</a></li>
<li><a href="https://www.aptech.com/blog/how-to-create-tiled-graphs-in-gauss/" target="_blank" rel="noopener">How to Create Tiled Graphs in GAUSS</a></li>
<li><a href="https://www.aptech.com/blog/five-hacks-for-creating-custom-gauss-graphics/" target="_blank" rel="noopener">Five Hacks for Creating Custom GAUSS Graphics</a></li>
</ol>
<h3 id="references">References</h3>
<p>Farboodi, Maryam, and Péter Kondor. 2022. &quot;Heterogeneous Global Booms and Busts.&quot; <i>American Economic Review</i>, 112 (7): 2178-2212.
DOI: 10.1257/aer.20181830</p>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/advanced-formatting-techniques-for-creating-aer-quality-plots/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Installing the GAUSS Package Manager [Video]</title>
		<link>https://www.aptech.com/blog/installing-gauss-package-manager/</link>
					<comments>https://www.aptech.com/blog/installing-gauss-package-manager/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Tue, 31 May 2022 14:54:43 +0000</pubDate>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[User Interface]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11582519</guid>

					<description><![CDATA[]]></description>
										<content:encoded><![CDATA[<iframe width="560" height="315" src="https://www.youtube.com/embed/sXrFAVCtFyc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<p>GAUSS packages provide access to powerful tools for performing data analysis.  Learn how to install the <a href="https://www.aptech.com/blog/gauss-package-manager-basics/">GAUSS Package Manager</a>, and get the quickest access to the full suite of GAUSS packages, in this short video. </p>
<h3 id="additional-resources">Additional Resources</h3>
<ul>
<li><a href="https://www.aptech.com/blog/gauss-package-manager-basics/">GAUSS Package Manager</a></li>
<li><a href="https://www.aptech.com/blog/using-gauss-packages-complete-guide/">Using GAUSS Packages: A Complete Guide</a></li>
</ul>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/installing-gauss-package-manager/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How to Load Excel Data into GAUSS</title>
		<link>https://www.aptech.com/blog/how-to-load-excel-data-into-gauss/</link>
					<comments>https://www.aptech.com/blog/how-to-load-excel-data-into-gauss/#respond</comments>
		
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Mon, 18 Apr 2022 19:36:19 +0000</pubDate>
				<category><![CDATA[Econometrics]]></category>
		<category><![CDATA[Programming]]></category>
		<guid isPermaLink="false">https://www.aptech.com/?p=11582484</guid>

					<description><![CDATA[Loading data is often the first step of a data analysis in GAUSS. In this video, you'll learn how to save time and avoid data loading errors when working with Excel files. 

Our video demonstration shows just how quick and easy it can be to load time series, categorical and numeric variables from Excel files into GAUSS.  You'll learn how to:

<ul>
<li>Interactively load Excel data files.</li>
<li>Perform advanced loading steps, such as loading specific sheets or specifying values as missing values.</li>
<li>Use autogenerated code in a program file.</li>
<li>Change variable names.</li>
<li>Set up categorical labels and base cases.</li>
</ul>
]]></description>
										<content:encoded><![CDATA[<iframe width="560" height="315" src="https://www.youtube.com/embed/VWUpPPHlKzY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<h3 id="introduction">Introduction</h3>
<p>Data loading is often the first step in your data analysis. In this video, you'll learn how to save time and avoid data loading errors when working with Excel files. </p>
<p>Our video demonstration shows just how quick and easy it can be to load time series, categorical and numeric variables from Excel files into GAUSS. </p>
<h2 id="interactively-preview-and-load-variables">Interactively preview and load variables</h2>
<p>See how to use the GAUSS <strong>Data Import</strong> window to interactively:</p>
<ul>
<li>Load basic Excel data.</li>
<li>Load data from different Excel sheets.</li>
<li>Specify variables to load. </li>
<li>Specify <a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/">dataframe</a> names. </li>
</ul>
<h2 id="use-autogenerated-code-to-reproduce-your-steps">Use autogenerated code to reproduce your steps</h2>
<p>The <strong>Data Import</strong> window auto-generates code to perform all the import and filter steps. We show you how to put this code into a program and run the file to repeat your data loading steps.</p>
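<p>The exact code generated by the <strong>Data Import</strong> window depends on your file and the options you choose, so the following is only a hand-written sketch of the same idea using <code>loadd</code>. The file name <code>macro_data.xlsx</code> and the variable names <code>GDP</code> and <code>Inflation</code> are hypothetical.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load all variables from a (hypothetical) Excel file
data = loadd("macro_data.xlsx");

// Preview the first few rows
head(data);

// Load only two (hypothetical) variables, using a formula string
macro = loadd("macro_data.xlsx", "GDP + Inflation");</code></pre>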
<h2 id="data-exploration-and-cleaning">Data Exploration and Cleaning</h2>
<p>GAUSS provides an easy-to-use environment for data exploration and cleaning. In this video, we'll demonstrate how to:</p>
<ul>
<li>Compute descriptive statistics.</li>
<li>Change variable names.</li>
<li>Specify values, such as -999, as missing values.</li>
<li>Change categorical labels.</li>
<li>Set the category base case.</li>
</ul>
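<p>Two of these steps also have simple command-line equivalents. The sketch below assumes a numeric variable named <code>GDP</code> (hypothetical) in which <code>-999</code> marks missing observations. <code>miss</code> converts each <code>-999</code> to a GAUSS missing value, and <code>dstatmt</code> then prints descriptive statistics for the cleaned variable. See the <code>dstatmt</code> documentation for how missing values are counted and handled.</p>
<pre class="hljs-container hljs-container-solo"><code class="lang-gauss">// Load a single (hypothetical) numeric variable
gdp = loadd("macro_data.xlsx", "GDP");

// Convert every -999 in gdp to a GAUSS missing value
gdp = miss(gdp, -999);

// Print descriptive statistics, discarding the returned structure
call dstatmt(gdp);</code></pre>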
<h3 id="additional-resources">Additional Resources</h3>
<ul>
<li><a href="https://docs.aptech.com/gauss/data-management.html">GAUSS Data Management Guide</a></li>
<li><a href="https://www.aptech.com/blog/what-is-a-gauss-dataframe-and-why-should-you-care/">What is a GAUSS Dataframe and Why Should You Care</a> </li>
<li><a href="https://www.aptech.com/blog/getting-to-know-your-data-with-gauss-22/">Getting to Know Your Data With GAUSS 22</a></li>
<li><a href="https://www.aptech.com/blog/dates-and-times-made-easy/">Dates and Times Made Easy</a></li>
<li><a href="https://www.aptech.com/blog/easy-management-of-categorical-variables/">Easy Management of Categorical Variables</a></li>
</ul>]]></content:encoded>
					
					<wfw:commentRss>https://www.aptech.com/blog/how-to-load-excel-data-into-gauss/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
