Introduction
The new GAUSS 23 is the most practical GAUSS yet! It is designed to save you time on everyday research tasks such as finding, importing, and modeling data.
Data at Your Fingertips
- Access millions of global economic and financial data series with FRED and DBnomics integration.
- Aggregate, filter, sort, and transform FRED data series during import.
- Search FRED series from GAUSS.
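As a rough sketch of the FRED integration, the lines below download one series, aggregate it during import, and run a keyword search. They assume the `fred_load`, `fred_set`, and `fred_search` procedures and a FRED API key already configured in GAUSS. The `fred_set` option names mirror the FRED web API's parameter names and are an assumption here, so check the GAUSS function reference before relying on them.

```gauss
// Assumes a FRED API key has already been configured for GAUSS

// Download all available observations of the civilian unemployment rate
unrate = fred_load("UNRATE");
head(unrate);

// Assumed option names: aggregate the monthly series
// to quarterly averages during import
params = fred_set("frequency", "q", "aggregation_method", "avg");
unrate_q = fred_load("UNRATE", params);

// Search FRED for series matching a keyword
// (the result is assumed to be a dataframe of matching series)
results = fred_search("unemployment rate");
head(results);
```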
Load Data from Anywhere on the Internet
// Load an Excel file from the aptech website
file_url = "https://www.aptech.com/wp-content/uploads/2019/03/skincancer2.xlsx";
skin_cancer = loadd(file_url);
// Print the first 5 rows of the dataframe
head(skin_cancer);
       State        Lat       Mort      Ocean       Long
     Alabama         33        219          1         87
     Arizona       34.5        160          0        112
    Arkansas         35        170          0       92.5
  California       37.5        182          1      119.5
    Colorado         39        149          0      105.5
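loadd also accepts an optional formula string, so you can pull only the columns you need from a remote file. A minimal sketch using the same skin cancer file and the variable names shown in the output above:

```gauss
// Same Excel file as above
file_url = "https://www.aptech.com/wp-content/uploads/2019/03/skincancer2.xlsx";

// Load only the mortality and latitude columns
mort_lat = loadd(file_url, "Mort + Lat");

// Print the first 5 rows
head(mort_lat);
```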
Simplified Data Loading with...
Automatic Type Detection
Previous versions required formula strings with keywords to specify date, string, and categorical variables from some file types.
Smart data type detection in GAUSS 23 figures out each variable's type, so you do not have to specify it manually. It automatically detects nearly 40 popular date formats.
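For example, a CSV file with a date column previously needed the `date` keyword in the formula string, while GAUSS 23 detects the column on its own. The file name `unemp.csv` and its `DATE` and `UNRATE` columns are hypothetical, chosen only for illustration.

```gauss
// Before GAUSS 23: tell loadd that DATE holds dates
unemp = loadd("unemp.csv", "date($DATE) + UNRATE");

// GAUSS 23: the date format of DATE is detected automatically
unemp = loadd("unemp.csv");
head(unemp);
```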
Automatic Header and Delimiter Detection
Replace old code like this:
load X[127,4] = mydata.txt;
with:
X = loadd("mydata.txt");
loadd automatically detects:
- Whether a header row is present.
- The delimiter (tab, comma, semicolon or space).
- Number of rows and columns.
- Variable types.
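A quick way to confirm what was detected is to inspect the dimensions and column names of the result. A minimal sketch, reusing the `mydata.txt` file name from the example above:

```gauss
// Let loadd detect the header, delimiter, size and column types
X = loadd("mydata.txt");

// Check the detected dimensions and variable names
print "Rows:    " rows(X);
print "Columns: " cols(X);
print getcolnames(X);
```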
First-Class Dataframe Storage
There is no new code to learn: just use the .gdat file extension with loadd and saved to load and store your dataframes.
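For instance, the skin cancer dataframe loaded from the aptech website can be written to a .gdat file and read back. A minimal sketch; the output file name `skin_cancer.gdat` is arbitrary.

```gauss
// Load the example Excel file from the aptech website
file_url = "https://www.aptech.com/wp-content/uploads/2019/03/skincancer2.xlsx";
skin_cancer = loadd(file_url);

// Store the dataframe in GAUSS's .gdat format
call saved(skin_cancer, "skin_cancer.gdat");

// Read it back and print the first 5 rows
skin_cancer2 = loadd("skin_cancer.gdat");
head(skin_cancer2);
```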
Expanded Quantile Regressions
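// Load the baseball hitters data
hitters = loadd("islr_hitters.xlsx");
// Quantile to estimate (90th percentile)
tau = 0.90;
// Regress log salary on at-bats, hits and home runs
call quantileFit(hitters, "ln(salary) ~ AtBat + Hits + HmRun", tau);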
Linear quantile regression
===============================================================================
Valid cases:                    263      Dependent variable:         ln_salary_
Missing cases:                    0      Deletion method:                  None
Number variables:                 3      DF model                             3
DF residuals                    259
===============================================================================
Name            Coeff.    Standard     t-value     P >|t|        lb        ub
                             Error
-------------------------------------------------------------------------------
Tau = 0.90
CONSTANT         6.285       0.194      32.433     0.0000     5.905     6.664
AtBat           -0.001       0.002      -0.737     0.4621    -0.004     0.002
Hits             0.008       0.005       1.526     0.1281    -0.002     0.018
HmRun            0.017       0.009       1.951     0.0521    -0.000     0.034
- New kernel-estimated variance-covariance matrix.
- Up to 4x speed improvement.
- Expanded model diagnostics including pseudo R-squared, coefficient t-statistics and p-values, and degrees of freedom.
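Several quantiles can often be estimated in a single call. This sketch assumes quantileFit accepts a column vector of tau values, which the "Tau = 0.90" block heading in the output above suggests but which should be confirmed in the function reference.

```gauss
hitters = loadd("islr_hitters.xlsx");

// Estimate the 10th, 50th and 90th percentiles
// (assumes tau may be a column vector)
tau = { 0.10, 0.50, 0.90 };
call quantileFit(hitters, "ln(salary) ~ AtBat + Hits + HmRun", tau);
```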
Kernel Density Estimation
- Estimate unknown probability density functions with 13 available kernels.
- Automatic or user-specified bandwidth.
- Kernel density plots with easy-to-use options for customization.
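To show what a kernel density estimate computes, here is a hand-written sketch with a Gaussian kernel, f_hat(x) = 1/(n*h) * sum over i of K((x - x_i)/h), where h is the bandwidth. It uses only basic GAUSS functions rather than the new kernel density procedure, and the rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) is an assumption for illustration, not necessarily the rule GAUSS uses for automatic bandwidth selection.

```gauss
// Simulated sample of 500 standard normal draws
rndseed 345;
x = rndn(500, 1);
n = rows(x);

// Rule-of-thumb bandwidth (normal reference rule)
h = 1.06 * stdc(x) * n^(-0.2);

// Evaluation grid from -4 to 4 in steps of 0.1 (81 points)
grid = seqa(-4, 0.1, 81);

// (grid' - x) is n x 81: column j holds (grid_j - x_i) for every i.
// Averaging the kernel values down each column and dividing by h
// gives the density estimate at each grid point.
f_hat = meanc(pdfn((grid' - x) ./ h)) ./ h;

// Plot the estimated density
plotXY(grid, f_hat);
```

Swapping `pdfn` for another kernel function changes the kernel while the rest of the computation stays the same.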
Improved Covariance Computations
// Load data
fname = getGAUSShome("examples/auto2.dta");
auto = loadd(fname);
// Declare control structure
struct olsmtControl ctl;
ctl = olsmtControlCreate();
// Turn on residuals
ctl.res = 1;
// Turn on HAC errors
ctl.cov = "hac";
call olsmt(auto, "mpg ~ weight + foreign", ctl);
Valid cases:                    74      Dependent variable:                 mpg
Missing cases:                   0      Deletion method:                   None
Total SS:                 2443.459      Degrees of freedom:                  71
R-squared:                   0.663      Rbar-squared:                     0.653
Residual SS:               824.172      Std error of est:                 3.407
F(2,71):                    69.748      Probability of F:                 0.000
Durbin-Watson:               2.421

                                Std                Prob        Std    Cor with
Variable           Estimate     Error    t-value   >|t|        Est     Dep Var
-------------------------------------------------------------------------------
CONSTANT            41.6797    1.8989      21.95   0.000        ---         ---
weight             -0.00659    0.0006     -11.99   0.000     -0.885   -0.807175
foreign: Foreign   -1.65003    0.9071     -1.819   0.073     -0.131    0.393397

Note: HAC robust standard errors reported
- New procedure for computing Newey-West HAC robust standard errors.
- All robust covariance procedures now include the option to turn off small sample corrections.
- Expanded dataframe and formula string compatibility.
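For reference, the textbook Newey-West estimator behind HAC standard errors for OLS is shown below, where e_t are the residuals, x_t is row t of X written as a column vector, T is the number of observations, and L is the maximum lag, with Bartlett weights w_j. The default lag length GAUSS chooses and the exact small-sample factor it applies are not stated here; a common correction multiplies the result by T/(T - K) for K regressors, and turning the correction off drops that factor.

```latex
\hat{V}_{\mathrm{HAC}} = (X'X)^{-1}\,\hat{S}\,(X'X)^{-1},
\qquad
\hat{S} = \sum_{t=1}^{T} \hat{e}_t^{2}\, x_t x_t'
  + \sum_{j=1}^{L} w_j \sum_{t=j+1}^{T} \hat{e}_t \hat{e}_{t-j}
    \left( x_t x_{t-j}' + x_{t-j} x_t' \right),
\qquad
w_j = 1 - \frac{j}{L+1}
```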
New Functions for Data Cleaning and Exploration
between
Returns a binary vector indicating which observations fall in a specified range. It can be used with selif to select rows. Dates and ordinal categorical columns are supported.
// 'unemp' is a dataframe with a DATE column and the UNRATE series
// Return a 1 if the observation is between the listed dates
match = between(unemp[.,"DATE"], "2020-03", "2020-08");
// Select the matching observations
unemp = selif(unemp, match);
DATE UNRATE
2020-03-01 4.4000
2020-04-01 14.700
2020-05-01 13.200
2020-06-01 11.000
2020-07-01 10.200
2020-08-01 8.4000
where
Provides a convenient and intuitive way to combine or modify data. where(condition, a, b) returns elements from a where condition is true and from b where it is false.
// Daily hotel room price
hotel_price = { 238, 405, 405, 329, 238 };
// Daily temperature forecast
temperature = { 89, 94, 110, 103, 97 };
// Decrease the price by 10% if the
// temperature will be more than 100 degrees
new_price = where(temperature .> 100,
hotel_price .* 0.9,
hotel_price);
new_price = 238 405 364.50 296.10 238
- Explore sample symmetry and tails with the skewness and kurtosis functions.
- Test for normality using the new JarqueBera function.
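The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = (n/6) * (S^2 + (K - 3)^2 / 4), which is approximately chi-squared with 2 degrees of freedom under normality. The sketch below computes it by hand from moment-based S and K; GAUSS's skewness, kurtosis and JarqueBera functions may apply small-sample adjustments, so their results can differ slightly.

```gauss
// Simulated sample of 1000 standard normal draws
rndseed 8271;
x = rndn(1000, 1);
n = rows(x);

// Deviations from the mean and the second central moment
d = x - meanc(x);
m2 = meanc(d.^2);

// Moment-based skewness and (non-excess) kurtosis
S = meanc(d.^3) / m2^1.5;
K = meanc(d.^4) / m2^2;

// Jarque-Bera statistic and its chi-squared(2) p-value
JB = n / 6 * (S^2 + (K - 3)^2 / 4);
p_value = cdfChic(JB, 2);

print "JB statistic: " JB;
print "p-value:      " p_value;
```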
Speed-ups and Efficiency Improvements
- Up to 10x speed-up and 50% decrease in memory usage for lag creation with shiftc and lagn.
- Up to 2x speed-up (or more for large data) and 50% decrease in memory usage for miss and missrv.
- Up to 2x speed-up (or more for large data) and 50% decrease in memory usage for element-by-element mathematical (+, -, .*, ./), relational (.>, .<, .>=, .<=, .==, .!=) and logical (.and, .not, .or, .xor) operators.
- Up to 100x speed-up for some cases with indsav.
- Up to 40% speed-up for reclassify.
- Up to 3x speed-up for loading Excel® files with loadd and the Data Import Window.
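As a small illustration of some of the functions listed above, the sketch below flags a sentinel value as missing with miss, creates a one-period lag with lagn (which fills the first row with a missing value), and replaces missing values with zero using missrv. The data and the -999 sentinel are made up.

```gauss
// Toy series in which -999 marks an unrecorded value
x = { 1, 2, -999, 4, 5 };

// Convert -999 to GAUSS's missing value code
x = miss(x, -999);

// One-period lag: { ., 1, 2, ., 4 }
x_lag = lagn(x, 1);

// Replace missing values with zero: { 0, 1, 2, 0, 4 }
x_lag0 = missrv(x_lag, 0);
print x_lag0;
```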
Conclusion
For a full list of everything GAUSS 23 offers, please see the complete changelog.