Resources

application of Mergeby and generate varibales functions

0

Hi,

I am working on the panel data now, therefore, sometimes I need to merge two datasets together.

I have no problem with imporing data from excel to GAUSS by using the code:

Household_a120c=xlsReadM("Household_a120c.xls","A2:G624",1,"")

As we know that, by doing so I will lost table look of this data, it has been transformed into a matrix. It is a little bit annoying with respect to panel data. Therefore, I tried to assign variable names for the matrix like this

let xnames = ahhrhid ahifdip ahifdin ahh0_4 ahh5_9 ahh10_14 ahhadult;

let vnames = ahifdip ahifdin ahh0_4 ahh5_9 ahh10_14 ahhadult;

makevars(household_a120c, vnames, xnames);

let xnames_eperson_a120 = xwaveid ahhrhid;

let vnames_eperson_a120 = xwaveid ahhrhid;

makevars(eperson_a120, vnames_eperson_a120, xnames_eperson_a120);

However, the command above just generate vectors for me, but after that, I think I may use the mergeby function as I have already created variable names, then I did this

mergeby(household_a120c,eperson_a120,merge,ahhrhid); // ahhrhid is the ID I want to use to combine these two data sets.

Honestly, I do not think I have applied mergeby function correctly, so could you please tell how I should use it properly. Except from Mergeby function, is there any other method I can use to merge two data sets.

Thank you very much!

asked July 17, 2014

6 Answers

0

Please forgive me, I am having trouble understanding exactly what you want to do.

Do you want the merged data to be a file on disk? If so, do you want a GAUSS dataset (.dat) file, or do you want it to be an Excel (.xls, or .xlsx) file?

aptech
342
0

Hi,

I Want the merged data to be an excel file.

Thank you!

0

Hi, Maybe I need to clarify my questiuon again.

For instance, if I have

A=[ 3 3 2; 2 4 5; 8 9 11]// this is a 3*3 matrix, suppose the first column is the ID

 

B=[2 5 7; 8 10 12; 8 9 11;3 4 6;5 7 9]// this is a 5*3 matrix, the first column is the ID AS WELL.
Basically , what I want to do is to combine matrix A and matrix B according to their IDs in
the first column, I tried to use MERGEBY function, obviously it did not work.
Can you help me out with this?
Thank you very much!

			
0

Hi Aptech;

I am sorry that I post so many useless words here, I have tried serveral methods to merge two dataset, unfortunately, none of them work me at all.

Here's my question:

suppose my original datasets in the form of excel files(. xls)

My first data set(called countries) is:

Country  GDP  inflation
A        300   120
B        280   110
C        675   134

My second data set( called individuals) is:

id     country  Age    sex
 1      A      33      1
 2      B      25      0
 3      B      36      0
 4      D      66      1

I'd like to achieve the combined data set that look like this

id     country  Age    sex    GDP  Inflation
 1      A      33      1     300   120
 2      B      25      0     280   110
 3      B      36      0     280   110

Therefore the unmatched observations will be eliminated.

I tried to use mergeby function as follows:

As I am using the .xls file, I first write them into .gda files like

For data set 1,

ret = gdaCreate("countries.gda",1);
ret = gdaWrite("countries.gda",countries,"country GDP inflation");

For data ste 2:

ret = gdaCreate("individuals.gda",1);
ret = gdaWrite("individuals.gda",individuals,"id age sex");

Then:

ret = gdaCreate("merge_1.gda",1);// create a null file

mergeby(individuals,countries,"","");

But, I have got error message

 
G0085 : Invalid file type C:\gauss14\eperson_a120.gda 'C:\gauss14\eperson_a120.gda' [sortd.srcline 431]

I msut make some mistakes here, but I did figure out what i did wrong,

Thank you!

 

 

0

Ok, I understand now. The problems you are running into are because you are using functions made for different file types. I think you will be best off to load the entire dataset in as a GAUSS matrix to do the merging and then write it to the updated data to the Excel file. Here is a procedure that will merge the matrices:

countries = { A 300 120,
              B 280 110, 
	      C 675 134 };
individuals = { 1 A 33 1, 
                2 B 25 0, 
	        3 B 36 0, 
	        4 D 66 1 };

data_full = mergeMatrices(individuals, 2, countries);

print_mask = { 1 0 1 1 1 1 };
ret = printfmt(data_full, print_mask);

//This procedure assumes:
//  1) individuals has as many rows (or more) than countries
//  2) You want to sort based upon the first column
//   of 'countries' and the 
//   'sort_col' column of 'individuals'
proc (1) = mergeMatrices(individuals, sort_col, countries);
   local idx, individuals_new;
   //Mask, permutation matrix
   //that maps 'individuals' to 'countries'
   idx = (individuals[.,sort_col] .== countries[.,1]');

   //strip unmatched entries
   individuals = selif(individuals, sumr(idx));
   idx = selif(idx, sumr(idx));

   //use idx as permutation matrix
   //to create new columns
   individuals_new = idx*countries[.,2:cols(countries)];
		
   //Horizontaly concatenate
   //new variables onto 'individuals'
   individuals = individuals~individuals_new;
   retp(individuals);
endp;

First make sure you can use this to arrange the data in the manner that you expect. Post any questions that come up. If you have questions about how to save this new data to an Excel file (since it is a mix of character and numeric data), please post that as a new question and we will happily assist you.

aptech
342
0

Hi Aptech

Thank you very much! the codes provided by you is just fatanstic!

I have a new question related to this topic now, what if I want to do something like this

 
countries = { A 300 120,               
              B 280 110,       
              C 675 134 };
individuals = { 1 A 33 1,                 
                2 B 25 0,         
                3 B 36 0,         
                4 D 66 1 };

now I only want to create the matrix including different observations, like

id     country  Age    sex    GDP  Inflation
4      D        66      1     na    na
3      C        na     na     675   134

what should I do to get the above?

Thank you very much !