Subsetting a Dataset

I loaded a CSV datafile with many variables using the GAUSS function loadd(). The first row of the CSV file contains the variable names, including date variable. I loaded first the dependent variable as follows:

y = loadd(fname, "wage");

and then I know I could load all the variables except wage using:

x = loadd(fname, ". -wage").

However, I do not want to create x, but selecting some variables from my CSV file. For example, some of the variables contained in my CSV file are x1, x2, x3, x4, x5,..., x80. How can select x1, x2, x10, x15, x79 and x80 from my CSV file? Can GAUSS do it?

3 Answers



0



Yes, you can do it in GAUSS and I will post a code snippet with a GAUSS proc which will work. However, in most cases, you are actually better off to load all the variables in a single matrix.

If you would like to explore that option (which may well serve you much better), please post a new question asking something like "Is there a way to avoid using column vectors" or something like that and we will gladly help you to find a better solution.

Here is the code snippet to create the local variables, though I don't recommend it. The code snippet assumes that you have a file named mydata.csv in your current working directory which contains at least six variables: x1, x2, x10, x15, x79 and x80.

new;
cls;
fname = "mydata.csv";

// Create new global variables in your GAUSS workspace,
// named 'x1', 'x2', 'x10', 'x15', 'x79' and 'x80'
// from 'mydata.csv'.
loadvars(fname, "x1 + x2 + x10 + x15 + x79 + x80");

/*
** Creates global column vectors.
**
** fname    String, name of a data file (CSV,DAT,XLSX,SAS,STAT)
**          which can be properly loaded by 'loadd'.
**
** vars     Simple formula string which can only contain
**          variable names and the '+' operator. i.e.
**          "GDP + Inflation + Exports".
**
**          No transformations are allowed.
**
*/
proc (0) = loadvars(fname, vars);

    local x;

    // Load specified variables as colmns of 'x'
    x = loadd(fname, vars);

    // Split the forumla string into
    // a column vector, making splits at each '+'
    // sign.
    vars = strsplit(vars, "+");

    // Trim any empty space from either side of
    // the variable name
    vars = strtrim(vars);

    for i(1, cols(x), 1);
        // Create each GLOBAL column vector
        call varput(x[.,i], vars[i]);
    endfor;
endp;

aptech

1,773


0



OK and thank you for your help. Could you please explain how to use those global variables? I thought that after creating these column vector variables, I could concatenate them using ~ to form a matrix x, where x = x1~x2~x10~x15~x79~x80, but it did not work. I am trying form x, since I want to compute LS estimates using these specific independent variables. So, I need to form xtx = x'x.



0



If you want to create a matrix x, where x = x1 ~ x2 ~ x10 ~ x15 ~ x79 ~ x80. Then all you need is loadd.

new;
cls;
fname = "mydata.csv";

// Create new global variable named 'x' in your GAUSS workspace,
// with 6 columns which correspond to the variables: 'x1', 'x2', 
// 'x10', 'x15', 'x79' and 'x80' in the file 'mydata.csv'
x = loadd(fname, "x1 + x2 + x10 + x15 + x79 + x80");

// Add a column of ones so you can estimate the intercept
x = ones(rows(x), 1) ~ x;

// Create moment matrix
xtx = x'x;

Based on your last message, I am pretty sure that this is exactly what you are looking for.

Possible problems with global column vectors

I think you will be fine just using the solution above and can probably ignore this. However, for completeness, here is an answer to the global variable problem. My best guess is that you did something like this:

new;
cls;
fname = "mydata.csv";

// Create new global variables in your GAUSS workspace,
// named 'x1', 'x2', 'x10', 'x15', 'x79' and 'x80'
// from 'mydata.csv'.
loadvars(fname, "x1 + x2 + x10 + x15 + x79 + x80");

// Concatenate all column vectors into one matrix
x = x1 ~ x2 ~ x10 ~ x15 ~ x79 ~ x80;

/*
** Creates global column vectors.
**
** fname    String, name of a data file (CSV,DAT,XLSX,SAS,STAT)
**          which can be properly loaded by 'loadd'.
**
** vars     Simple formula string which can only contain
**          variable names and the '+' operator. i.e.
**          "GDP + Inflation + Exports".
**
**          No transformations are allowed.
**
*/
proc (0) = loadvars(fname, vars);

    local x;

    // Load specified variables as colmns of 'x'
    x = loadd(fname, vars);

    // Split the forumla string into
    // a column vector, making splits at each '+'
    // sign.
    vars = strsplit(vars, "+");

    // Trim any empty space from either side of
    // the variable name
    vars = strtrim(vars);

    for i(1, cols(x), 1);
        // Create each GLOBAL column vector
        call varput(x[.,i], vars[i]);
    endfor;
endp;

And you got an error message like this Undefined symbol: 'x1'. If that is what happened, the reason is that GAUSS turns the compiles and checks the code for errors before it runs the code.

One of the errors it checks for is undefined symbols. This is a very good thing for you because otherwise, you could have some code running for hours or days and then have it fail because you misspelled a variable name at the end of the code.

This can be resolved by either using varindxi with the open command when opening a GAUSS dataset or by using the declare command.

The declare command allows you to tell GAUSS that it may not see a particular variable, but you will create it during the program so GAUSS does not have to return an error. declare can also set a default value, or assign a particular value if the variable has not otherwise been set.

varindxi is a flag passed to the open command when you open a GAUSS dataset. It tells GAUSS to create variables to index the columns of the dataset. These index variables will be i and then your variable name. So if you loaded a matrix from a GAUSS dataset with three variables GDP, Inflation and Exports, then after the load you would have three new scalar values as well iGDP = 1, iInflation = 2 and iExports = 3, because GDP would be the first column of the loaded matrix.

aptech

1,773

Your Answer

3 Answers

0

Yes, you can do it in GAUSS and I will post a code snippet with a GAUSS proc which will work. However, in most cases, you are actually better off to load all the variables in a single matrix.

If you would like to explore that option (which may well serve you much better), please post a new question asking something like "Is there a way to avoid using column vectors" or something like that and we will gladly help you to find a better solution.

Here is the code snippet to create the local variables, though I don't recommend it. The code snippet assumes that you have a file named mydata.csv in your current working directory which contains at least six variables: x1, x2, x10, x15, x79 and x80.

new;
cls;
fname = "mydata.csv";

// Create new global variables in your GAUSS workspace,
// named 'x1', 'x2', 'x10', 'x15', 'x79' and 'x80'
// from 'mydata.csv'.
loadvars(fname, "x1 + x2 + x10 + x15 + x79 + x80");

/*
** Creates global column vectors.
**
** fname    String, name of a data file (CSV,DAT,XLSX,SAS,STAT)
**          which can be properly loaded by 'loadd'.
**
** vars     Simple formula string which can only contain
**          variable names and the '+' operator. i.e.
**          "GDP + Inflation + Exports".
**
**          No transformations are allowed.
**
*/
proc (0) = loadvars(fname, vars);

    local x;

    // Load specified variables as colmns of 'x'
    x = loadd(fname, vars);

    // Split the forumla string into
    // a column vector, making splits at each '+'
    // sign.
    vars = strsplit(vars, "+");

    // Trim any empty space from either side of
    // the variable name
    vars = strtrim(vars);

    for i(1, cols(x), 1);
        // Create each GLOBAL column vector
        call varput(x[.,i], vars[i]);
    endfor;
endp;

0

OK and thank you for your help. Could you please explain how to use those global variables? I thought that after creating these column vector variables, I could concatenate them using ~ to form a matrix x, where x = x1~x2~x10~x15~x79~x80, but it did not work. I am trying form x, since I want to compute LS estimates using these specific independent variables. So, I need to form xtx = x'x.

0

If you want to create a matrix x, where x = x1 ~ x2 ~ x10 ~ x15 ~ x79 ~ x80. Then all you need is loadd.

new;
cls;
fname = "mydata.csv";

// Create new global variable named 'x' in your GAUSS workspace,
// with 6 columns which correspond to the variables: 'x1', 'x2', 
// 'x10', 'x15', 'x79' and 'x80' in the file 'mydata.csv'
x = loadd(fname, "x1 + x2 + x10 + x15 + x79 + x80");

// Add a column of ones so you can estimate the intercept
x = ones(rows(x), 1) ~ x;

// Create moment matrix
xtx = x'x;

Based on your last message, I am pretty sure that this is exactly what you are looking for.

Possible problems with global column vectors

I think you will be fine just using the solution above and can probably ignore this. However, for completeness, here is an answer to the global variable problem. My best guess is that you did something like this:

new;
cls;
fname = "mydata.csv";

// Create new global variables in your GAUSS workspace,
// named 'x1', 'x2', 'x10', 'x15', 'x79' and 'x80'
// from 'mydata.csv'.
loadvars(fname, "x1 + x2 + x10 + x15 + x79 + x80");

// Concatenate all column vectors into one matrix
x = x1 ~ x2 ~ x10 ~ x15 ~ x79 ~ x80;

/*
** Creates global column vectors.
**
** fname    String, name of a data file (CSV,DAT,XLSX,SAS,STAT)
**          which can be properly loaded by 'loadd'.
**
** vars     Simple formula string which can only contain
**          variable names and the '+' operator. i.e.
**          "GDP + Inflation + Exports".
**
**          No transformations are allowed.
**
*/
proc (0) = loadvars(fname, vars);

    local x;

    // Load specified variables as colmns of 'x'
    x = loadd(fname, vars);

    // Split the forumla string into
    // a column vector, making splits at each '+'
    // sign.
    vars = strsplit(vars, "+");

    // Trim any empty space from either side of
    // the variable name
    vars = strtrim(vars);

    for i(1, cols(x), 1);
        // Create each GLOBAL column vector
        call varput(x[.,i], vars[i]);
    endfor;
endp;

And you got an error message like this Undefined symbol: 'x1'. If that is what happened, the reason is that GAUSS turns the compiles and checks the code for errors before it runs the code.

One of the errors it checks for is undefined symbols. This is a very good thing for you because otherwise, you could have some code running for hours or days and then have it fail because you misspelled a variable name at the end of the code.

This can be resolved by either using varindxi with the open command when opening a GAUSS dataset or by using the declare command.

The declare command allows you to tell GAUSS that it may not see a particular variable, but you will create it during the program so GAUSS does not have to return an error. declare can also set a default value, or assign a particular value if the variable has not otherwise been set.

varindxi is a flag passed to the open command when you open a GAUSS dataset. It tells GAUSS to create variables to index the columns of the dataset. These index variables will be i and then your variable name. So if you loaded a matrix from a GAUSS dataset with three variables GDP, Inflation and Exports, then after the load you would have three new scalar values as well iGDP = 1, iInflation = 2 and iExports = 3, because GDP would be the first column of the loaded matrix.


You must login to post answers.

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.

Try GAUSS for 14 days for FREE

See what GAUSS can do for your data

© Aptech Systems, Inc. All rights reserved.

Privacy Policy