Resources

merge matrices and drop observations

0

Hi, Aptech

Refers to my last post http://www.aptech.com/questions/application-of-mergeby-and-generate-varibales-functions/ , now I have a new question in terms of it.

I am sorry that I open a new question as it seems that noboby noticed that I had added something new to my old question.

Now I want to know, if I have datasets like:

countries = { A 300 120,
              B 280 110,
              C 675 134 };
individuals = { 1 A 33 1,
                2 B 25 0,
                3 B 36 0,
                4 D 66 1 };

How can I get ?

id     country  Age    sex    GDP  Inflation
 4      D        66      1     na    na
 3      C        na     na     675   134

This means I want to generate a new matrix only including unmatched observations.

Thank you very much!!!

Tags: asked August 11, 2014

3 Answers

0

Hello,

I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.

If you straighten me out on this point, we should be able to help out!

aptech
342
0

Hi, Aptech

I am sorry that there was a mistake.

The matrix should be

id     country  Age    sex    GDP  Inflation
 4      D        66      1     na    na
 na     C        na     na     675   134

As there is no ID assigned to country C.

 

Thank you very much!

0

This should do what you want.

new;
countries = { A 300 120,
              B 280 110,
              C 675 134 };
individuals = { 1 A 33 1,
                2 B 25 0,
                3 B 36 0,
                4 D 66 1 };
num_vars = 6;
country_column = 1;
individual_column = 2;

merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);

//** country_column --- column to compare with 'individuals'
//** individual_column --- column to compare with 'countries'
//** num_vars --- total number of variables between both matrices
//** WARNING: This procedure assumes that the order of the variables:
//******* in 'countries' is: country, GDP, inflation
//******* in 'individuals' is: id, country, age, sex
proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
   local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
   mask = indcv(countries[.,country_column], individuals[.,individual_column]);
   country_unique = selif(countries, (mask .== error(0)));
    
   mask = indcv(individuals[.,individual_column], countries[.,country_column]);
   individuals_unique = selif(individuals, (mask .== error(0)));
    
   country_out = reshape(error(0), rows(country_unique), num_vars);
   country_out[.,2] = country_unique[.,1];
   country_out[.,5:6] = country_unique[.,2:3];
    
   individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));
    
   final_out = individuals_out | country_out;
   retp(final_out);
endp;
aptech
342