CODES FOR GROUPING AND SUMMING UP?

Hi Aptech,

I'm new to GAUSS and am trying to figure out the code for grouping and summing in GAUSS. However, I just cannot get my code to work. Can you please give me any suggestions on this?

Here is my problem:

Given a matrix A in which, say, the first column is the year and the second column is the id number:

       1.0000000        2.0000000        7.0000000        50.000000 
       3.0000000        2.0000000        2.0000000        80.000000 
       1.0000000        4.0000000        5.0000000        60.000000 
       3.0000000        2.0000000        9.0000000        100.00000 
       4.0000000        8.0000000        2.0000000        90.000000 

To do: group and sum based on the first and second columns. That is, for the rows that have the same values in the first and second columns, sum the values in their third and fourth columns.

The desired outcome is:

       1.0000000        2.0000000        7.0000000        50.000000 
       1.0000000        4.0000000        5.0000000        60.000000 
       3.0000000        2.0000000        11.000000        180.00000 
       4.0000000        8.0000000        2.0000000        90.000000 

Here is the for loop I've tried:

new;cls;
a = {1 2 7 50,3 2 2 80,1 4 5 60,3 2 9 100,4 8 2 90};
print a;
yrmat = unique(a[.,1],1);
idmat = unique(a[.,2],1);

sum = {};
year = {};
for i(1,rows(yrmat),1);
    tem1 = selif(a[.,2:cols(a)], a[.,1] .eq yrmat[i]);
    id = {};
    for j(1,rows(idmat),1);
        tem2 = selif(tem1[.,2:cols(tem1)], tem1[.,] .eq idmat[j]);
        id = id | tem2;
    endfor;
    year = year | id;
    sum = sumc(year);
endfor;

Questions:
1. How do I use the empty matrix ({}) to collect the results? Why do I need it, and where should I put it?
2. For the nested for loops, how can I check whether the original matrix has been put into groups based on my criteria?
3. Where should the summing go inside the loop?
4. Is it possible to do the grouping and summing directly with matrix multiplication, without using for loops?

5. An extra question, please. I'm using GAUSS 15, but when I get an error message I cannot find the line number indicating the faulty line in my code. Could you please tell me where to find it?

Could you help me correct the code so that it works, and give any advice on the above 5 questions, please?

Thank you very much!

4 Answers



Here is another way to accomplish what you are trying to do. It uses only one for loop, which iterates once for each row that repeats the year and id of the row above it. I added some extra rows to provide a more thorough check of the algorithm:

new;
cls;
a = { 1 2 7 50,
      3 2 2 80,
      1 4 5 60,
      3 2 9 100,
      4 8 2 90,
      3 2 8 40, 
      1 2 3 50 };

print "original a = ";
print a;

//Sort 'a' based upon the first column
//and then sort items with the same element
//in the first column by their second
//column element
a = sortmc(a, 1 | 2);
print "sorted a = ";
print a;

//Find elements in which the elements
//in the 1st and 2nd column are the same
idx = a[.,1] .== lag1(a[.,1]);
idx = idx .and (a[.,2] .== lag1(a[.,2]));

//Convert logical vector to numeric indices
idx = indexcat(idx, 1);
print "rows to be consolidated are: ";
print idx;

//Set year and id to 0 for rows
//marked for consolidation so
//we can simply add entire rows
a[idx,1:2] = zeros(rows(idx), 2);

//Consolidate rows, starting from the bottom
idx = rev(idx);
for i(1, rows(idx), 1);
    cur_row = idx[i];
    a[cur_row-1,.] = a[cur_row-1,.] + a[cur_row,.];
endfor;

//Remove the no longer needed rows
a = delrows(a, idx);
print "final a = ";
print a;
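
For the question about avoiding for loops altogether, one possible sketch uses matrix multiplication with a 0/1 design matrix. It assumes the year and id values are non-negative integers, so that each (year, id) pair can be packed into a single numeric key. The names base, key, ukey, grp, d, yr, id and result are illustrative and not part of the code above.

```gauss
new;
a = { 1 2 7 50,
      3 2 2 80,
      1 4 5 60,
      3 2 9 100,
      4 8 2 90 };

//Pack year and id into one key per row
//(assumption: year and id are non-negative integers)
base = maxc(a[.,2]) + 1;
key = a[.,1] * base + a[.,2];

//Sorted unique keys, and the group number of each row
ukey = unique(key, 1);
grp = indnv(key, ukey);

//d[r,g] is 1 if row r belongs to group g, otherwise 0
d = design(grp);

//Multiplying by d' adds up the rows within each group
sums = d' * a[.,3:4];

//Unpack the unique keys back into year and id
yr = floor(ukey / base);
id = ukey - yr * base;

result = yr ~ id ~ sums;
print result;
```

For the five rows in the original question this should give one row per (year, id) pair, ordered by year and then id. The design matrix has one column per group, so for data with a very large number of groups the sort-based loop above may use less memory.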

As to your last question about finding the line of the error, here is a screenshot of your code after I ran it initially. It shows a red stop sign in the margin next to the line of code with the error. The error message at the bottom also shows the file name and the line number of the error. The blue line number is a link that you can click to jump to the location of the error, which is helpful if your code is longer or spans multiple files.
[Screenshot: GAUSS 15 error]

I think the code snippet above may answer your initial questions. However, if you would still like answers to any of them, please post them.

aptech

Thank you very much for the demonstration. It works and is much neater than mine!

May I ask two more questions, please?
1. Regarding the line indicator for the error message, I usually get something like "G0113 : Unclosed (", but without anything indicating the line number, and I cannot see the red stop sign beside the faulty line either. Do I need to adjust any settings to be able to see it?

2. This solution solves my problem perfectly well. However, I would still like some clarification on how to construct nested for loops, since I have run into this problem in other code as well. Could you please correct my problematic code as another demonstration, to help me understand my mistakes in using for loops? And, if possible, briefly explain how to use empty matrices to collect the results in the corrected code.

Thank you very much for your help!



1. I asked and answered your first question in this separate post to make it easier for others to reference in the future.
2. Below is a version of your code that runs:

new;
cls;
a = { 1 2 7 50,
      3 2 2 80,
      1 4 5 60,
      3 2 9 100,
      4 8 2 90  };
print a;
yrmat = unique(a[.,1],1);
idmat = unique(a[.,2],1);

//Set 'year' and 'sum' to be empty matrices
sum = {};
year = {};
for i(1,rows(yrmat),1);
    tem1 = selif(a[.,2:cols(a)], a[.,1] .eq yrmat[i]);
    
    //Set 'id' to be an empty matrix
    id = {};
    
    for j(1,rows(idmat),1);
        tem2 = selif(tem1[.,2:cols(tem1)], tem1[.,1] .eq idmat[j] );
        //If 'selif' finds no matching rows,
        //it returns a scalar missing value.
        //'scalmiss' returns 1 if its input is a scalar missing value
        if not scalmiss(tem2);
            id = id | tem2;
        endif;
    endfor;
    year = year | id;
    sum = sumc(year);
endfor;

The first error that I found in your code was this line:

tem2 = selif(tem1[.,2:cols(tem1)], tem1[.,] .eq idmat[j]);

The reference to tem1 was missing the column index and I changed it to:

tem2 = selif(tem1[.,2:cols(tem1)], tem1[.,1] .eq idmat[j]);

I also added a check to make sure that selif found a match before trying to concatenate on a new row. However, I think your main question was about concatenation in loops and empty matrices.
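
Applied to the grouping problem itself, that concatenation pattern could be sketched as follows. This is only one possible arrangement of the nested loops; the names result, mask and grp are illustrative and do not appear in the original code. Each (year, id) pair that actually occurs contributes one row of year, id and column sums to result, which starts as an empty matrix.

```gauss
new;
a = { 1 2 7 50,
      3 2 2 80,
      1 4 5 60,
      3 2 9 100,
      4 8 2 90 };
yrmat = unique(a[.,1], 1);
idmat = unique(a[.,2], 1);

//Start with an empty matrix and append one row per group
result = {};
for i(1, rows(yrmat), 1);
    for j(1, rows(idmat), 1);
        //1 for rows matching both the current year and id
        mask = (a[.,1] .== yrmat[i]) .and (a[.,2] .== idmat[j]);

        //Skip (year, id) pairs that never occur together
        if sumc(mask) > 0;
            grp = selif(a[.,3:4], mask);
            //'sumc' returns a column vector, so transpose
            //it to append the sums as part of a row
            result = result | (yrmat[i] ~ idmat[j] ~ sumc(grp)');
        endif;
    endfor;
endfor;
print result;
```

Printing mask or grp inside the inner loop is a simple way to check which rows have been placed in each group. The empty matrix is created once, before the outer loop, because it has to survive across all iterations.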
Growing an empty matrix in a loop
Let's start with some very, very basic code to illustrate the concept:

new;
for i(1, 10, 1);
   r = rndu(1, 1);
   tmp = tmp | r;
endfor;

This code will fail because tmp is referenced before it has been assigned a value. Since we want to end up with a vector containing only the random numbers, we don't want to start tmp as 0 like this, because the 0 would remain as an extra first element:

new;
tmp = 0;
for i(1, 10, 1);
   r = rndu(1,1);
   tmp = tmp | r;
endfor;

To resolve this problem we have two options. We can either preallocate a vector of the correct size and fill in the individual elements on each iteration like this:

new;
len = 10;
tmp = zeros(len, 1);
for i(1, len, 1);
   r = rndu(1,1);
   tmp[i] = r;
endfor;

or we can set tmp to be an empty matrix and add on a new row at each iteration like this:

new;
tmp = {};
for i(1, 10, 1);
   r = rndu(1,1);
   tmp = tmp | r;
endfor;

The method that preallocates the array is the preferred solution, because assigning to an index in a vector or matrix is much faster than growing it during each iteration.
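
A rough way to see the difference on your own machine is to time both loops. The sketch below is only illustrative: the vector length of 20000 and the names grow, pre, t_grow and t_pre are arbitrary choices, and the timings will vary from run to run. It relies on hsec, which returns the number of hundredths of a second since midnight.

```gauss
new;
len = 20000;

//Grow the vector by concatenation on every iteration
t0 = hsec;
grow = {};
for i(1, len, 1);
    grow = grow | rndu(1, 1);
endfor;
t_grow = (hsec - t0) / 100;

//Preallocate once, then assign by index
t0 = hsec;
pre = zeros(len, 1);
for i(1, len, 1);
    pre[i] = rndu(1, 1);
endfor;
t_pre = (hsec - t0) / 100;

print "seconds, growing:      " t_grow;
print "seconds, preallocated: " t_pre;
```

When the final number of rows is not known in advance, as in the grouping loop, growing from an empty matrix is still a reasonable choice for small data.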

aptech

Thank you very much for the explanation!


