merge matrices and drop observations

Question

Hi, Aptech

Refers to my last post /questions/application-of-mergeby-and-generate-varibales-functions/ , now I have a new question in terms of it.

I am sorry that I open a new question as it seems that noboby noticed that I had added something new to my old question.

Now I want to know, if I have datasets like:

countries = { A 300 120,
              B 280 110,
              C 675 134 };
individuals = { 1 A 33 1,
                2 B 25 0,
                3 B 36 0,
                4 D 66 1 };

How can I get ?

id     country  Age    sex    GDP  Inflation
 4      D        66      1     na    na
 3      C        na     na     675   134

This means I want to generate a new matrix only including unmatched observations.

Thank you very much!!!

3 Answers

Your Answer

aptech · Answer 1

Hello,

I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.

If you straighten me out on this point, we should be able to help out!

link

aptech

1,773

applegrass · Answer 2

Hi, Aptech

I am sorry that there was a mistake.

The matrix should be

id     country  Age    sex    GDP  Inflation
 4      D        66      1     na    na
 na     C        na     na     675   134

As there is no ID assigned to country C.

Thank you very much!

link

applegrass

5

aptech · Answer 3

This should do what you want.

 new;
countries = { A 300 120,
              B 280 110,
              C 675 134 };
individuals = { 1 A 33 1,
                2 B 25 0,
                3 B 36 0,
                4 D 66 1 };
num_vars = 6;
country_column = 1;
individual_column = 2;
 
merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
 
//** country_column --- column to compare with 'individuals'
//** individual_column --- column to compare with 'countries'
//** num_vars --- total number of variables between both matrices
//** WARNING: This procedure assumes that the order of the variables:
//******* in 'countries' is: country, GDP, inflation
//******* in 'individuals' is: id, country, age, sex
proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
   local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
   mask = indcv(countries[.,country_column], individuals[.,individual_column]);
   country_unique = selif(countries, (mask .== error(0)));
    
   mask = indcv(individuals[.,individual_column], countries[.,country_column]);
   individuals_unique = selif(individuals, (mask .== error(0)));
    
   country_out = reshape(error(0), rows(country_unique), num_vars);
   country_out[.,2] = country_unique[.,1];
   country_out[.,5:6] = country_unique[.,2:3];
    
   individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));
    
   final_out = individuals_out | country_out;
   retp(final_out);
endp;

link

aptech

1,773

aptech · Answer 4

Hello,

I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.

If you straighten me out on this point, we should be able to help out!

link

aptech

1,773

applegrass · Answer 5

Hi, Aptech

I am sorry that there was a mistake.

The matrix should be

id     country  Age    sex    GDP  Inflation
 4      D        66      1     na    na
 na     C        na     na     675   134

As there is no ID assigned to country C.

Thank you very much!

link

applegrass

5

aptech · Answer 6

This should do what you want.

 new;
countries = { A 300 120,
              B 280 110,
              C 675 134 };
individuals = { 1 A 33 1,
                2 B 25 0,
                3 B 36 0,
                4 D 66 1 };
num_vars = 6;
country_column = 1;
individual_column = 2;
 
merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
 
//** country_column --- column to compare with 'individuals'
//** individual_column --- column to compare with 'countries'
//** num_vars --- total number of variables between both matrices
//** WARNING: This procedure assumes that the order of the variables:
//******* in 'countries' is: country, GDP, inflation
//******* in 'individuals' is: id, country, age, sex
proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
   local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
   mask = indcv(countries[.,country_column], individuals[.,individual_column]);
   country_unique = selif(countries, (mask .== error(0)));
    
   mask = indcv(individuals[.,individual_column], countries[.,country_column]);
   individuals_unique = selif(individuals, (mask .== error(0)));
    
   country_out = reshape(error(0), rows(country_unique), num_vars);
   country_out[.,2] = country_unique[.,1];
   country_out[.,5:6] = country_unique[.,2:3];
    
   individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));
    
   final_out = individuals_out | country_out;
   retp(final_out);
endp;

link

aptech

1,773

merge matrices and drop observations

3 Answers

Your Answer

3 Answers

You must login to post answers.

Have a Specific Question?

Need Support?

	new;
	countries = { A 300 120,
	B 280 110,
	C 675 134 };
	individuals = { 1 A 33 1,
	2 B 25 0,
	3 B 36 0,
	4 D 66 1 };
	num_vars = 6;
	country_column = 1;
	individual_column = 2;

	merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);

	//** country_column --- column to compare with 'individuals'
	//** individual_column --- column to compare with 'countries'
	//** num_vars --- total number of variables between both matrices
	//** WARNING: This procedure assumes that the order of the variables:
	//******* in 'countries' is: country, GDP, inflation
	//******* in 'individuals' is: id, country, age, sex
	proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
	local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
	mask = indcv(countries[.,country_column], individuals[.,individual_column]);
	country_unique = selif(countries, (mask .== error(0)));

	mask = indcv(individuals[.,individual_column], countries[.,country_column]);
	individuals_unique = selif(individuals, (mask .== error(0)));

	country_out = reshape(error(0), rows(country_unique), num_vars);
	country_out[.,2] = country_unique[.,1];
	country_out[.,5:6] = country_unique[.,2:3];

	individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));

	final_out = individuals_out \| country_out;
	retp(final_out);
	endp;