UNDERSTANDING A GAUSS CODE

Question

I am new to GAUSS. I am tryying to understand the GAUSS code below so that I can replicate it using my own data without any success. Can somebody explain to me step by step what the code implies?

new; format /rd /m1 8,4; output file= invest.out on;
tstart = date;
load invest; // Load data and define variables

t = 15;
nt = rows(invest);
n = nt/t;

y = invest[.,1]; @ investment/assets @
Tq = invest[.,2]; @ Tobin's Q @
c = invest[.,3]; @ cash-flow/assets @
d = invest[.,4]; @ debt/assets @

5 Answers

Your Answer

aptech · Answer 1

Both the 'at' symbol, @ and // are comments. I have added comments to the code to explain each line.

//Clear all matrices, strings etc from
//your GAUSS workspace
new; 

//Make all printed data in decimal form with
//4 places of precision and 8 places to allow
//for padding between numbers
format /rd /m1 8,4; 

//Send all printed output to the file
//'invest.out' in your GAUSS working directory
output file = invest.out on;

//Set 'tstart' equal a 4x1 vector with current date
//YYY
//MM
//DD
//HSEC (n hundredths of a second since midnight)
tstart = date;

//Load data from a file named 'invest'
//located in your current working directory.
//The data is assumed to be a text file without headers
//NOTE: Your probably do not want to load your data this way
//see other options after code
load invest; // Load data and define variables

t = 15;
nt = rows(invest);
n = nt/t;

//Assign 'y', 'Tq', 'c' and 'd'
//from the 1st, 2nd, 3rd and 4th
//columns of the matrix 'invest'
y = invest[.,1]; @ investment/assets @
Tq = invest[.,2]; @ Tobin's Q @
c = invest[.,3]; @ cash-flow/assets @
d = invest[.,4]; @ debt/assets @

You really do not want to use the load command as mentioned in the code above. If your data is either

A CSV or Excel file with headers.
GAUSS, SAS or STATA dataset.

you should use the loadd command. For example, if your file has 4 variables which are in the order specified by the code above, then

//Change this to your actual file name
fname = "myfile.xlsx";

//Load all variables from file
invest = loadd(fname);

Alternatively, if you have more variables in the file, or if they are in different order, then

//Change this to your actual file name
fname = "myfile.dta";

//Load variables by name
//Change the variable names to match what is in your file
invest = loadd(fname, "investment + Q + cash flow + debt");

The tutorials linked to here will explain more about file loading with formula strings. Let us know if you have more questions about the code.

link

aptech

1,773

Ogbarooye · Answer 2

Thank you. I will go through your explanation. I will let you know in case I need further help.

link

Ogbarooye

0

Ogbarooye · Answer 3

Thank you for helping me to solve my previous problem. However, I have encountered another problem in continuation of understanding the full code I was studying for the purposing of replicating it using my own data. The full code is as follows:

*/

new; format /rd /m1 8,4; output file= invest.out on;
tstart = date;
load invest; // Load data and define variables

t = 15;
nt = rows(invest);
n = nt/t;

y = invest[.,1]; @ investment/assets @
Tq = invest[.,2]; @ Tobin's Q @
c = invest[.,3]; @ cash-flow/assets @
d = invest[.,4]; @ debt/assets @

/***************************** Controls *******************************/

qn = 200; @ number of quantiles to examine @
_trim_1 = .15; @ percentage to trim before search, single @
_con = 0; if _con ==0; "no constant"; elseif _con==1;
"constant in the upper regime";endif;
ns = 200; @ iter num in averaging (1st uses analytic formula) @
nb = 0; @ bootstrap iteration @
jmy = 1; @ number of lags of y used in IV ( jm >= 1) @
jmx = 1; @ number of lags of x used in IV ( jm >= 1) @
jn = 1; @ initial lag used in IV is t-jn-2 (jn >=0) @
t0 = jn+3; @ initial time in GMM ( >= jn+3) @
jj=1000; @ iteration for linearity test p-value @

x = c~Tq~d; @ set regressors @
// tt= ones(n,1).*.eye(t); tt= tt[.,t0+1:t]; // time dummies. SET _con =0
// x = x~tt; with dummies we need more IV's as they are already part of IVs

q = c;

xx= ones(nt,1)~tq; @ variables to be used as IV: FIRST COLUMN MUST BE 1's@

"threshold variable = c ";
"qn~ns~jm~jn~t0 : " qn~ns~jmy~jn~t0;

/*************************************************************************/

w = setiv(n,t,y,xx,t0,jmy,jmx,jn);" number of moments: " cols(w);

dd = unique(q,1);
qnt1 = qn*_trim_1;
sq = seqa(_trim_1,1/qn,qn-2*qnt1+1);
qq1 = dd[floor(sq*rows(dd))]; @ grid for threshold values @
qn1 = rows(qq1);

ly = vec(lagn(reshape(y,n,t)',1)); // ntx1
dy = y-ly; // ntx1
z2 = ly;
k=cols(x); z2 = ly~x;
z1 = z2;
if _con == 1; z1= (z2~ones(n*t,1)); endif;
k1 = cols(z1); k2 = cols(z2);

dz0 = {} ; @ stack of dz's across threshold values @
for i (1,qn1,1);

z = z2~(z1.*(q.> qq1[i])); // ntx[(1+k)+(1+k)+1]
lz = lagnmatrix(n,t,z,1); // ntx[(1+k)+(1+k)]
dz = z-lz; // ntx[(1+k)+(1+k)]
tmp = packr(dy~dz); // n(t-2)x[1+(1+k)+(1+k)]
tmp = trimrmatrix(n,t-2,tmp,t0-3,0);
dz = tmp[.,2:cols(tmp)]; // n(t-t0+1)x(1+k+1+1+k)
dz0=dz0|dz;

endfor; dy = tmp[.,1];

// ************************** 1st step GMM

zst = rndn(cols(w),jj); // random numbers for linearity test
thtsample = {}; Jsample={};
wnst_A = zeros(qn1,jj); wn_A = zeros(qn1,1);
colp = zeros(ns,1);

for iter (1,ns,1);

if iter == 1; iv = weight1(n,t,w,t0); else;
e = rndn(t-1,n); // (t-1)xn, simulating homogeneous case
de = vec(e[2:t-1,.]-e[1:t-2,.]); // n(t-2)x1
iv = weight(n,t,de,w,t0);
endif;

col1={};
for i (1,qn1,1);

dz = dz0[(i-1)*n*(t-t0+1)+1:i*n*(t-t0+1),.];
wdz = (w'dz)/n; // mx(1+k+1+1+k)
wdy = (w'dy)/n; // mx1
trap 1; iwdz = inv(wdz'*iv*wdz); trap 0;
if scalerr(iwdz); iwdz = invswp(wdz'*iv*wdz);
" error in the 1st step " i; endif;

b = iwdz*(wdz'*iv*wdy); // (1+k+1+k)x1
wde = wdy-wdz*b; // mx1
s = wde'*iv*wde; // scalar
dehat = dy-dz*b; // n(t-2)x1

col1=col1|(s~qq1[i]~b'~dehat');

endfor;

col1hat=sortc(col1,1);
dehat = col1hat[1,2+rows(b)+1:cols(col1)]';

// ************************** 2nd step GMM

iv = weight(n,t,dehat,w,t0);
wnst=zeros(qn1,jj);
wn = zeros(qn1,1);

col2={};
for i (1,qn1,1);

dz = dz0[(i-1)*n*(t-t0+1)+1:i*n*(t-t0+1),.];
wdz = (w'dz)/n; // mx(1+k+1+1+k)
wdy = (w'dy)/n; // mx1
trap 1; iwdz = inv(wdz'*iv*wdz); trap 0;
if scalerr(iwdz); iwdz = invswp(wdz'*iv*wdz);
" error in the 2nd step " i; endif;
b = iwdz*(wdz'*iv*wdy); // (1+k+1+k)x1

wde = wdy-wdz*b; // mx1
s = wde'*iv*wde; // scalar
dehat = dy-dz*b; // n(t-2)x1

col2=col2|(s~qq1[i]~b'~dehat');

//-- supW statistic
p1 = zeros(k1,k2)~eye(k1);
ivi = weight(n,t,col1[i,2+rows(b)+1:cols(col1)]',w,t0);
iv2 = chol(ivi);
trap 1; iwdzi = inv(wdz'*ivi*wdz); trap 0;
if scalerr(iwdzi); iwdzi = invswp(wdz'*ivi*wdz);
" error in the 2nd step " i; endif;

idti = inv(p1*iwdzi*p1');
wn[i] = n*(p1*b)'idti*(p1*b);

for j (1,jj,1);
wnst[i,j] = zst[.,j]'iv2*wdz*iwdzi*p1'idti*p1*iwdzi*wdz'iv2'zst[.,j];
endfor;

endfor;

col2hat=sortc(col2,1);
dehat = col2hat[1,2+rows(b)+1:cols(col2hat)]';

thtsample = thtsample|col2hat[1,2:2+rows(b)];
Jsample = Jsample|(n*col2hat[1,1]);

colp[iter] = 1-counts(maxc(wnst),maxc(wn))/jj; //linearity test
wnst_A = wnst_A + wnst; wn_A = wn_A + wn;

endfor;

ththat = meanc(thtsample); // averaged estimate, (1+k+1+k)x1
ththat1= thtsample[1,.]'; // estimate from analytic Weight

Jhat = meanc(Jsample); // averaged J statistic
Jhat1= Jsample[1]; // J stat from analytic Weight

ltp_A = 1-counts(maxc(wnst_A),maxc(wn_A))/jj;
"";" linearity test from analytic : " colp[1];
" linearity test from averaging: " ltp_A; "";

""; "/------ output based on analytic variance formula -----/";"";;
{dum1,dum2,dum3,dum4}=printout(n,t,t0,w,dy,q,ththat1,z1,z2,Jhat1,_con);
//dum1: s.e.; dum2: upper regime estimates and s.e.;
//dum3: long-run parameter estimates and s.e. dum4: Jstat and p-value

if ns > 1; ""; "/------ in case of averaging -----/";"";;
{dum1,dum2,dum3,dum4}=printout(n,t,t0,w,dy,q,ththat,z1,z2,Jhat,_con);
endif;

tend = date;et = etstr(ethsec(tstart,tend)); "";" elapsed time : " et;
end;

/*****************************************************
input :

t0: initial time for the moment condition
jm's: maximum lags to be used as an IV
x: should contain ones in the first column

*******************************************************/

proc setiv(n,t,y,x,t0,jmy,jmx,jn);
local er,er1,j,k,m,i,zyi,zxi,wi,sr,w,sj,mj;

k = cols(x); // x[.,1]=1
w = {};
for i (1,n,1);
zyi = y[(i-1)*(t)+1:i*(t)]; // (t)x1
zxi = x[(i-1)*(t)+1:i*(t),.]; // (t)xk
wi = {}; // (t-t0+1)xm
er = 0; er1=0; // scalar

for j (1,t-t0+1,1);
sj = j+t0-jn-3;
mj = 1+ minc(jmy|sj);
wi = wi~zeros(t-t0+1,mj);
sr = er+1; // scalar
er1 = er+ mj; // scalar
er = er1;
wi[j,sr:er1]=1~zyi[maxc(1|sj-jmy+1):sj]';

if k > 1;
mj = (-1+k)*minc(jmx|sj);
wi = wi~zeros(t-t0+1,mj);
er = er1+ mj; // scalar
wi[j,er1+1:er]= vec(zxi[maxc(1|sj-jmx+1):sj,2:k])';
endif;
endfor;

w = w|wi; // (t-t0+1)xm
endfor;
retp(w); endp;

/* input:
de: residuals
w : IV's
*/

proc weight(n,t,de,w,t0); // with given residuals
local v,vbar,i,dei,wi,iv;

v = 0; // scalar
vbar = 0;

for i (1,n,1);
dei = de[(i-1)*(t-t0+1)+1:i*(t-t0+1),.]; // (t-2)x1
wi = w[(i-1)*(t-t0+1)+1:i*(t-t0+1),.]'dei; // m x 1
v = v + wi*wi'; // mxm
vbar = vbar + wi; // mx1

endfor;
if rank(v/n-(vbar/n)*(vbar/n)') == cols(w);
iv = inv(v/n-(vbar/n)*(vbar/n)'); // mxm
else; "error 2";
iv = invswp(v/n-(vbar/n)*(vbar/n)');
endif;

retp(iv); endp;

proc weight1(n,t,w,t0); //analytic
local m,v,i,j,wi,wj,iv;

m = cols(w); // number of moments
v = zeros(m+2,m+2);

for i (1,n,1);
wi = zeros(t-t0+3,m+2);
wi[2:t-t0+2,2:m+1] = w[(i-1)*(t-t0+1)+1:i*(t-t0+1),.];

for j (1,t-t0+2,1);
wj = wi[j,.]'-wi[j+1,.]';
v = v + wj*wj'; // mxm
endfor;
endfor;

v = v[2:(m+1),2:(m+1)];
if rank(v/n) == cols(w);
iv = inv(v/n); // mxm
else; "error";
iv = invswp(v/n);
endif;

retp(iv); endp;

// ===========================================================================
proc (1) = lagnmatrix(n,t,x,p);
local k,lx,i;
k = cols(x);
lx = zeros(n*t,k);
for i (1,k,1);
lx[.,i] = vec(lagn(reshape(x[.,i],n,t)',p)); // ntx1
endfor;

retp(lx); endp;

proc (1) = trimrmatrix(n,t,x,p,q);
local k,y,i;
k = cols(x);
y = zeros(n*(t-p-q),k);
for i (1,k,1);
y[.,i] = vec(trimr(reshape(x[.,i],n,t)',p,q));
endfor;
retp(y); endp;

// ===========================================================================
//outputs:
//dum1: s.e.; dum2: upper regime estimates and s.e.;
//dum3: long-run parameter estimates and s.e. dum4: Jstat and p-value

proc (4) = printout(n,t,t0,w,dy,q,ththat1,z1,z2,Jhat1,_con);
local dumm1,z,lz,dz,tmp,dehat1,iv,Gb,h,kn,s,ds,i,si,Gr,
sb1,sg1,se1,k2,p,tht2hat,se2,bt1,phi1,p1,bse1,bt2,phi2,p2,bse2,bt,bse;
// *********************************** COVARIANCE MATRIX
dumm1 = (q.> ththat1[1]); " proportion of upper regime: " meanc(dumm1);"";
z = z2~(z1.*dumm1); // ntx[(1+k)+(1+k)+1]
lz = lagnmatrix(n,t,z,1); // ntx[(1+k)+(1+k)]
dz = z-lz; // ntx[(1+k)+(1+k)]
tmp = packr(dz); // n(t-2)x[1+(1+k)+(1+k)]
tmp = trimrmatrix(n,t-2,tmp,t0-3,0);
dz = tmp[.,1:cols(tmp)]; // n(t-t0+1)x(1+k+1+1+k)
dehat1 = dy-dz*ththat1[2:rows(ththat1)]; // n(t-t0+1)x1

iv = weight(n,t,dehat1,w,t0);

Gb = -w'dz/n; // mx[(1+k)+(1+k)]

// *********************************** ds: n(t-2)x(1+k)
h = 1.06*stdc(packr(q))*n^(-0.2); // scalar, bandwidth
kn = exp(-(((ththat1[1]-q)/h).^2)/2)/sqrt(2*pi); // ntx1
s = z1.*kn; // ntx(1+k)
ds = zeros(n*(t-t0+1),cols(s)); // n(t-2)x(1+k)
for i (1,cols(s),1);
si = reshape(s[.,i],n,t)'; // txn
ds[.,i] = vec(trimr(si,t0-1,0)-trimr(si,t0-2,1)); // (t-2)x(1+k)
endfor;

Gr = -(w'ds/(n*h))*ththat1[3+k:rows(ththat1)]; // mx1

// *********************************** STANDARD ERROR
sb1 = inv(Gb'*iv*Gb-(Gb'*iv*Gr)*inv(Gr'*iv*Gr)*(Gr'*iv*Gb))/n;
sg1 = sqrt(1/(n*(Gr'*iv*Gr-(Gr'*iv*Gb)*inv(Gb'*iv*Gb)*(Gb'*iv*Gr))));
se1 = sg1|sqrt(diag(sb1));

" estimates and s.e. (threshold~lower regime (lag y,x)~delta (1,lag y,x)";
ththat1'|se1';

"";" upper regime estimates and s.e. (lag y,x)";

k2 = cols(z2);
if _con ==1; p = eye(k2)~zeros(k2,1)~eye(k2);
else; p = eye(k2)~eye(k2); endif;
tht2hat = (zeros(k2,1)~p)*ththat1;
se2 = p*sb1*p';
tht2hat'|sqrt(diag(se2))';

"";" Long-run parameter estimates and s.e.";

bt1 = ththat1[3:k2+1]; phi1=ththat1[2];
p1 = (eye(k2-1)~(bt1./(1-phi1)))./(1-phi1);
bse1=p1*sb1[2:k2+1,2:k2+1]*p1';

bt2 = tht2hat[2:rows(tht2hat)]; phi2=tht2hat[1];
p2 = (eye(k2-1)~(bt2/(1-phi2)))/(1-phi2);
bse2=p2*se2*p2';

bt = bt1|bt2; bse = sqrt(diag(bse1))|sqrt(diag(bse2));
bt'|bse';

"";" Overidentification J-statistic (p-value) " ;
Jhat1~cdfchic(Jhat1,cols(W)-rows(ththat1));

retp(se1,tht2hat~sqrt(diag(se2)),bt~bse,Jhat1~cdfchic(Jhat1,cols(W)-rows(ththat1)));
endp;

I have encountered some problems while trying to replicate the code above using my own data. Where I got stuck is as follows:

» new; format /rd /m1 8,4; output file= yield.out on;    Line 1
» tstart = date;            Line 2
» fname="datalist.xlsx";   Line 3
» range="a2:d821";       Line 4
» sheet=1;             Line 5
» mypools=spreadSheetReadM(fname,range,sheet);         Line 6
» t = 20;       Line 7
» nt = rows(mypools);        Line 8
» n = nt/t;        Line 9
» y = mypools[.,1];      Line 10
» iiq = mypools[.,2];      Line 11
» growth = mypools[.,3];   Line 12
» top = mypools[.,4];          Line 13
» qn = 200;      Line 14
» _trim_1=.15;   Line 15
» _con = 0; if _con ==0; "no constant"; elseif _con==1; "constant in the upper regime"; endif;     Line 16
no constant
» ns = 200;     Line 17
» nb = 0;    Line 18
» jmy = 1;    Line 19
» jmx = 1;    Line 20
» jn = 1;       Line 21
» t0 = jn+3;      Line 22
» jj=1000;     Line 23
» x = iiq~growth~top;    Line 24
» // tt= ones(n,1).*.eye(t); tt= tt[.,t0+1:t]; // time dummies. SET _con =0
// x = x~tt; with dummies we need more IV's as they are already part of IVs;    Line 25
» q = iiq;    Line 26
» xx= ones(nt,1)~top;   Line 27
» "threshold variable = iiq";    Line 28
threshold variable = iiq
» "qn~ns~jm~jn~t0 : " qn~ns~jmy~jn~t0;    Line 29
qn~ns~jm~jn~t0 : 200.0000 200.0000 1.0000 1.0000 4.0000
» w = setiv(n,t,y,xx,t0,jmy,jmx,jn);" number of moments: " cols(w);   Line 30

After writing Line 30, I got an error message which says "Undefined symbols: setiv". Please, what is the meaning of setiv?

Also, kindly help me check Line 24 if what I wrote there is actually correct compared to what is in the original code. I feel that the writer of the code wanted to write Tq in that line which is one of his regressors, and so I changed it to top which is one of my regressors. When I put tq in that line as you have it in the original code, I got an error message which says "Undefined symbols: tq". So, I feel the author wanted to write Tq. Kindly help me look at it also.

Thanks as I expect to hear from you soon. Note that I am using GAUSS Light.

link

Ogbarooye

0

aptech · Answer 4

Based on your last message, it appears that you are entering each line into GAUSS separately in the program input/output window. You should not do this. You should tell GAUSS to run the entire program file at once. Take a look at this getting started with GAUSS tutorial to see how to run a file and a few other helpful tips.

After the working through the tutorial, then you can open this code file in the GAUSS file editor by selecting File->Open from the main GAUSS menu and selecting your file.

link

aptech

1,773

Ogbarooye · Answer 5

Thank you for rising to my aid once again. I will look at the points you have raised and feed you back.

link

Ogbarooye

0

aptech · Answer 6

Both the 'at' symbol, @ and // are comments. I have added comments to the code to explain each line.

//Clear all matrices, strings etc from
//your GAUSS workspace
new; 

//Make all printed data in decimal form with
//4 places of precision and 8 places to allow
//for padding between numbers
format /rd /m1 8,4; 

//Send all printed output to the file
//'invest.out' in your GAUSS working directory
output file = invest.out on;

//Set 'tstart' equal a 4x1 vector with current date
//YYY
//MM
//DD
//HSEC (n hundredths of a second since midnight)
tstart = date;

//Load data from a file named 'invest'
//located in your current working directory.
//The data is assumed to be a text file without headers
//NOTE: Your probably do not want to load your data this way
//see other options after code
load invest; // Load data and define variables

t = 15;
nt = rows(invest);
n = nt/t;

//Assign 'y', 'Tq', 'c' and 'd'
//from the 1st, 2nd, 3rd and 4th
//columns of the matrix 'invest'
y = invest[.,1]; @ investment/assets @
Tq = invest[.,2]; @ Tobin's Q @
c = invest[.,3]; @ cash-flow/assets @
d = invest[.,4]; @ debt/assets @

You really do not want to use the load command as mentioned in the code above. If your data is either

A CSV or Excel file with headers.
GAUSS, SAS or STATA dataset.

you should use the loadd command. For example, if your file has 4 variables which are in the order specified by the code above, then

//Change this to your actual file name
fname = "myfile.xlsx";

//Load all variables from file
invest = loadd(fname);

Alternatively, if you have more variables in the file, or if they are in different order, then

//Change this to your actual file name
fname = "myfile.dta";

//Load variables by name
//Change the variable names to match what is in your file
invest = loadd(fname, "investment + Q + cash flow + debt");

The tutorials linked to here will explain more about file loading with formula strings. Let us know if you have more questions about the code.

link

aptech

1,773

Ogbarooye · Answer 7

Thank you. I will go through your explanation. I will let you know in case I need further help.

link

Ogbarooye

0

Ogbarooye · Answer 8