Hi, Aptech
Refers to my last post /questions/application-of-mergeby-and-generate-varibales-functions/ , now I have a new question in terms of it.
I am sorry that I open a new question as it seems that noboby noticed that I had added something new to my old question.
Now I want to know, if I have datasets like:
countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 };
How can I get ?
id country Age sex GDP Inflation 4 D 66 1 na na 3 C na na 675 134
This means I want to generate a new matrix only including unmatched observations.
Thank you very much!!!
3 Answers
0
Hello,
I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.
If you straighten me out on this point, we should be able to help out!
0
Hi, Aptech
I am sorry that there was a mistake.
The matrix should be
id country Age sex GDP Inflation 4 D 66 1 na na na C na na 675 134
As there is no ID assigned to country C.
Thank you very much!
0
This should do what you want.
new; countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 }; num_vars = 6; country_column = 1; individual_column = 2; merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars); //** country_column --- column to compare with 'individuals' //** individual_column --- column to compare with 'countries' //** num_vars --- total number of variables between both matrices //** WARNING: This procedure assumes that the order of the variables: //******* in 'countries' is: country, GDP, inflation //******* in 'individuals' is: id, country, age, sex proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars); local mask, country_unique, individuals_unique, country_out, individuals_out, final_out; mask = indcv(countries[.,country_column], individuals[.,individual_column]); country_unique = selif(countries, (mask .== error(0))); mask = indcv(individuals[.,individual_column], countries[.,country_column]); individuals_unique = selif(individuals, (mask .== error(0))); country_out = reshape(error(0), rows(country_unique), num_vars); country_out[.,2] = country_unique[.,1]; country_out[.,5:6] = country_unique[.,2:3]; individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique)); final_out = individuals_out | country_out; retp(final_out); endp;
Your Answer
3 Answers
Hello,
I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.
If you straighten me out on this point, we should be able to help out!
Hi, Aptech
I am sorry that there was a mistake.
The matrix should be
id country Age sex GDP Inflation 4 D 66 1 na na na C na na 675 134
As there is no ID assigned to country C.
Thank you very much!
This should do what you want.
new; countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 }; num_vars = 6; country_column = 1; individual_column = 2; merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars); //** country_column --- column to compare with 'individuals' //** individual_column --- column to compare with 'countries' //** num_vars --- total number of variables between both matrices //** WARNING: This procedure assumes that the order of the variables: //******* in 'countries' is: country, GDP, inflation //******* in 'individuals' is: id, country, age, sex proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars); local mask, country_unique, individuals_unique, country_out, individuals_out, final_out; mask = indcv(countries[.,country_column], individuals[.,individual_column]); country_unique = selif(countries, (mask .== error(0))); mask = indcv(individuals[.,individual_column], countries[.,country_column]); individuals_unique = selif(individuals, (mask .== error(0))); country_out = reshape(error(0), rows(country_unique), num_vars); country_out[.,2] = country_unique[.,1]; country_out[.,5:6] = country_unique[.,2:3]; individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique)); final_out = individuals_out | country_out; retp(final_out); endp;