Tabulating a variable in Gauss with % (shares)

Hi,

I have a matrix X in Gauss that is 10000 x 3 with numbers, one digit (number) per column but there is no variable name in each column. I want to tabulate them and get shares per category. More precise, I first need to merge them to then tabulate them. For example,

WANT

1   2   3     123

2   1   3     213

1   2   3     123

3   1   2     312

Then, I need to tabulate them,

NUMBER       %

123       2                     50%

213       1                     25%

312       1                     25%

 

How can I do that in Gauss?

3 Answers



1



I think the dstat add-on module (not the built-in dstat function) might have this capability. However, if I am understanding what you are trying to do. You can make some code to do it yourself in GAUSS.

I am assuming that you are starting with a matrix like this:

1   2   3
2   1   3
1   2   3
3   1   2

and you want to end up with a little print-out like this:

NUMBER       %

123     2     50%
213     1     25%
312     1     25%

The first thing we will need to do is to convert each row of integers into a 3 digit integer:

// Starting data
X = { 1   2   3,
      2   1   3,
      1   2   3,   
      3   1   2 };

// Multiplier for each column
mlt = { 100 10 1 };

// Multiply each column
// so we can add them and get
// a 3 digit number
X_mlt = X .* mlt;

// Sum the rows
X_id = sumr(X_mlt);

print X_id;

The above code will print the following results:

 123
 213
 123
 312

Now that we have the integer id's for the rows, we need to:

  1. Figure out what our unique id's are.
  2. Compute what percentage corresponds to each unique id.
// Find the unique id's
unique_ids = unique(X_id);

// Create a matrix with the same
// number of rows as X_id and the
// same number of columns as unique_ids
// This will have a 1 for a match or a 0
// for no match
matches = X_id .== unique_ids';

// Sum down the columns to count the
// total found for each id
id_tot = sumc(matches);

// Percentage for each id
id_pct = id_tot ./ rows(X_id);

/*
** Make an output table
*/

// $~ does horizontal string concatenation
head = "ID" $~ "Count" $~ "%";

// the ~ operator performs horizontal concatenation
// ntos turns numbers into strings
body = ntos(unique_ids ~ id_tot ~ id_pct);

// $| performs vertical string concatenation
tbl = head $| body;

print tbl;

This code will return

 ID            Count                %
123                2              0.5
213                1             0.25
312                1             0.25

We could make this code much more compact, but it is probably useful for you to see each step individually. I would encourage you to run the code and look at each intermediate variable to make sure you understand what is happening.

aptech

1,773


0



Thank you so much for your prompt response. I think the proposed solutions works great, but I wondered if you could please add to that the case of having zeros in an X matrix:

0 1 2

2 0 1

2 0 1

1 2 0

Your solution above without zeros in X does not work (and it is my fault) since the resulting X_id will not contain the zero in the beginning of each row, but a 2-digit number.

 

Thanks again!



0



One way you could do it is to add 1000 to the front of each id and then after it is converted to a string array, remove the leading 1.

// Add 1e(cols(X)) to each id before converting to a string
body = ntos((unique_ids + 10^cols(X))~ id_tot ~ id_pct);

// Remove the first character of the 'id'
body[.,1] = strtruncl(body[.,1], 1);

aptech

1,773

Your Answer

3 Answers

1

I think the dstat add-on module (not the built-in dstat function) might have this capability. However, if I am understanding what you are trying to do. You can make some code to do it yourself in GAUSS.

I am assuming that you are starting with a matrix like this:

1   2   3
2   1   3
1   2   3
3   1   2

and you want to end up with a little print-out like this:

NUMBER       %

123     2     50%
213     1     25%
312     1     25%

The first thing we will need to do is to convert each row of integers into a 3 digit integer:

// Starting data
X = { 1   2   3,
      2   1   3,
      1   2   3,   
      3   1   2 };

// Multiplier for each column
mlt = { 100 10 1 };

// Multiply each column
// so we can add them and get
// a 3 digit number
X_mlt = X .* mlt;

// Sum the rows
X_id = sumr(X_mlt);

print X_id;

The above code will print the following results:

 123
 213
 123
 312

Now that we have the integer id's for the rows, we need to:

  1. Figure out what our unique id's are.
  2. Compute what percentage corresponds to each unique id.
// Find the unique id's
unique_ids = unique(X_id);

// Create a matrix with the same
// number of rows as X_id and the
// same number of columns as unique_ids
// This will have a 1 for a match or a 0
// for no match
matches = X_id .== unique_ids';

// Sum down the columns to count the
// total found for each id
id_tot = sumc(matches);

// Percentage for each id
id_pct = id_tot ./ rows(X_id);

/*
** Make an output table
*/

// $~ does horizontal string concatenation
head = "ID" $~ "Count" $~ "%";

// the ~ operator performs horizontal concatenation
// ntos turns numbers into strings
body = ntos(unique_ids ~ id_tot ~ id_pct);

// $| performs vertical string concatenation
tbl = head $| body;

print tbl;

This code will return

 ID            Count                %
123                2              0.5
213                1             0.25
312                1             0.25

We could make this code much more compact, but it is probably useful for you to see each step individually. I would encourage you to run the code and look at each intermediate variable to make sure you understand what is happening.

0

Thank you so much for your prompt response. I think the proposed solutions works great, but I wondered if you could please add to that the case of having zeros in an X matrix:

0 1 2

2 0 1

2 0 1

1 2 0

Your solution above without zeros in X does not work (and it is my fault) since the resulting X_id will not contain the zero in the beginning of each row, but a 2-digit number.

 

Thanks again!

0

One way you could do it is to add 1000 to the front of each id and then after it is converted to a string array, remove the leading 1.

// Add 1e(cols(X)) to each id before converting to a string
body = ntos((unique_ids + 10^cols(X))~ id_tot ~ id_pct);

// Remove the first character of the 'id'
body[.,1] = strtruncl(body[.,1], 1);


You must login to post answers.

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.

Try GAUSS for 14 days for FREE

See what GAUSS can do for your data

© Aptech Systems, Inc. All rights reserved.

Privacy Policy