Group Column Sum

I have an unbalanced panel setup. I have a variable of interest that I would like to sum up across groups for all individuals. Basically, something similar to Matlab's grpstats command. I would rather not create a huge list of dummy variables given the size of my panel.

 

data example:

person, group, x

1,1,1

1,2,3

2,1,1

 

Desired output:

person, group, x, group sum

1,1,1, 4

1,2,3,4

2,1,1,1

1 Answer



0



accepted

Below is a procedure that, I think, will do what you want. It would be simple to extend it to also allow computation of statistics other than just the sum. Also, with GAUSS 17's new ability to load data from CSV and Excel files by variable name, you could also make it operate on a file and specify the 'id' and 'sum' columns by name.

//Starting panel
data = { 1 1 1,
         1 2 3,
         2 1 1 };

//Which column should be used to define the selected group
id_col = 1;

//Which column should be summed?
col_sum = 3;

//Create new panel matrix with additional column
//containing, sum of column 3 by group
data_new = grpSum(data, id_col, col_sum);

proc (1) = grpSum(data, id_col, col_sum);
    local out, groups, group_idx, group_var, group_sum;
    
    //Add an extra column to hold the sums
    out = data ~ zeros(rows(data), 1);
    
    //List of all groups
    groups = unique(data[.,id_col]);
    
    //Loop over each group
    for i(1, rows(groups), 1);
        //Find indices of the i'th group
        group_idx = indexcat(data[.,id_col] .== groups[i], 1);
        
        //Extract data to sum for i'th group
        group_var = data[group_idx,col_sum];
        
        //Reshape to correct size for write
        group_sum = reshape(sumc(group_var), rows(group_idx), 1);
        
        //Write sums in appropriate location
        out[group_idx,cols(out)] = group_sum;
    endfor;
    retp(out);
endp;

aptech

1,773

Your Answer

1 Answer

0
accepted

Below is a procedure that, I think, will do what you want. It would be simple to extend it to also allow computation of statistics other than just the sum. Also, with GAUSS 17's new ability to load data from CSV and Excel files by variable name, you could also make it operate on a file and specify the 'id' and 'sum' columns by name.

//Starting panel
data = { 1 1 1,
         1 2 3,
         2 1 1 };

//Which column should be used to define the selected group
id_col = 1;

//Which column should be summed?
col_sum = 3;

//Create new panel matrix with additional column
//containing, sum of column 3 by group
data_new = grpSum(data, id_col, col_sum);

proc (1) = grpSum(data, id_col, col_sum);
    local out, groups, group_idx, group_var, group_sum;
    
    //Add an extra column to hold the sums
    out = data ~ zeros(rows(data), 1);
    
    //List of all groups
    groups = unique(data[.,id_col]);
    
    //Loop over each group
    for i(1, rows(groups), 1);
        //Find indices of the i'th group
        group_idx = indexcat(data[.,id_col] .== groups[i], 1);
        
        //Extract data to sum for i'th group
        group_var = data[group_idx,col_sum];
        
        //Reshape to correct size for write
        group_sum = reshape(sumc(group_var), rows(group_idx), 1);
        
        //Write sums in appropriate location
        out[group_idx,cols(out)] = group_sum;
    endfor;
    retp(out);
endp;

You must login to post answers.

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.

Try GAUSS for 14 days for FREE

See what GAUSS can do for your data

© Aptech Systems, Inc. All rights reserved.

Privacy Policy