When I run my program, the error is:

d:\gauss\ex(20) : error G0030 : Insufficient memory Currently active call: nonpara [20] d:\gauss\ex Stack trace: nonpara called from d:\gauss\ex, line 10

I guess the reason is that the sample size is too large. My GAUSS version is GAUSS11. My code is as follows (the code file name is ex):

format /rd 10,4; clear n,y,x; load data[5515478,11]= d:\gauss\data.txt; y=data[2:5515478,4]; x=data[2:5515478,9]; n=rows(x); yx=nonpara(y,x,x); "Nonparametric Estimation at 1-20 sample points"; yx[1:20]; end; proc nonpara(y,x,x0); //nonparametric estimation:y,x,x0:nx1 local h,kx,ykx,r; h=1.06*stdc(x)*(n^(-1/5)); kx=(x-x0′)./h; kx=pdfn(kx); r=minc(rows(kx)|cols(kx)); kx=diagrv(kx,zeros(r,1)); ykx=y.*kx; r=minc(rows(ykx)|cols(ykx)); ykx=diagrv(ykx,zeros(r,1)); r=meanc(ykx)./meanc(kx); retp(r); //nx1 endp;

## 5 Answers

On the third line in your procedure:

kx=(x-x0′)./h;

The variable `kx` will end up with a size of 5,515,478 rows and 5,515,478 columns for a total size of about 221 terabytes. That is too large for any computer these days.

Some GAUSS procedures, such as `ols`, have the option of passing in a dataset and telling GAUSS to process it in chunks for this reason. However, since this is user generated code, you will need to do it.

You will need to process the statements that create the largest matrices a few rows of at a time. Here is a start

proc nonpara(y,x,x0); //nonparametric estimation:y,x,x0:nx1 local h,kx,ykx,r, i, inc, index_end; //h is not too big, at just rows(x) by 1, so leave as is h=1.06*stdc(x)*(n^(-1/5)); //number of rows to process at a time //change this based upon computer memory size inc = 10 for i(1, rows(x), inc); index_end = i + inc - 1; kx=(x[i:index_end]-x0′)./h; kx=pdfn(kx); //rest of code to process 'inc' rows of data endfor; retp(r); //nx1 endp;

If this is not clear, or you need help with part of the conversion, please post your questions.

Thanks a lot. Now I program the rest of "proc nonpara(y,x,x0)" if I don't use the leave-one-out estimation and x0 is not a multi-dimensional vector. Is it rightly coded in this case?

proc nonpara(y,x,x0); //nonparametric estimation:y,x:nx1,x0:1x1 local h,kx,ykx,r1,r2, i, inc, index_end; //h is not too big, at just rows(x) by 1, so leave as is h=1.06*stdc(x)*(n^(-1/5)); //number of rows to process at a time //change this based upon computer memory size inc = 10; for i(1, rows(x), inc); index_end = i + inc - 1; kx=(x[i:index_end]-x0')./h; kx=pdfn(kx); ykx=y[i:index_end].*kx; r1=sumc(ykx); r2=sumc(kx); endfor; retp(r1/r2); //nx1 endp;

However, I still have some questions for this program. Please give instructions.

First, in the original code, x0 is also a vector, for example, x itself, which also has a large dimension. So in "kx=(x[i:index_end]-x0′)./h;", the memory will still be insufficient.

Second, I need a leave-one-out estimation for nonparametric estimation at each sample point xi. So in the previous code I use "kx=diagrv(kx,zeros(r,1));" to make the elements in the diagnal of the large metrix kx all being zero. Now in this revised version how to achieve this leave-one-out estimation?

Thanks.

Your code has a couple of problems:

- You need to add zeros to the diagonal of the of
`kx`and`ykx`. But placing zeros on the diagonal of each chunk of rows we process is not the same as placing zeros on the diagonal of the full size`kx`matrix. - You forgot to take the mean of
`r1`and`r2`

Take a look at this procedure and I will look answer your new questions a little later.

proc nonpara(y,x,x0); //nonparametric estimation:y,x:nx1,x0:1×1 local h,kx,ykx,r1,r2, i, inc, index_end; //h is not too big, at just rows(x) by 1, //so leave as is h=1.06*stdc(x)*(rows(x)^(-1/5)); r1 = zeros(rows(x), 1); r2 = zeros(rows(x), 1); //number of rows to process at a time //change this based upon computer memory size inc = 5; for i(1, rows(x), inc); index_end = i + inc - 1; kx=(x[i:index_end]-x0')./h; kx=pdfn(kx); //We don't want 0's on the diagonal //of each chunk we process for j(1, inc-1, 1); kx[j,i+j-1] = 0; endfor; ykx=y[i:index_end].*kx; r1 = r1 + sumc(ykx); r2 = r2 + sumc(kx); endfor; //Calculate mean in return line retp(((r1)./rows(x))./((r2)./rows(x))); //nx1 endp;

Thanks for this careful instruction. Now suppose I take the first 5515 data for the nonparametric estimation. In the procedure x0 is 1 by 1 for simplicity. I program this code according to your suggestion as follows. When i let it run, the error is:

error G0058 : Index out of range

Currently active call: nonpara [38] d:\gauss\ex

Stack trace:

nonpara called from d:\gauss\ex, line 15

That is, there is something wrong in" kx=(x[i:index_end]-x0)./h;" Why? Please help.

-------------------------------

clear n,y,x; load data[5515478,11]=d:\gauss\data.txt; y=data[2:5515,4]; x=data[2:5515,9]; n=rows(x); a=minc(x); b=maxc(x); s=10; h0=(b-a)/s; x0=seqa(a,h0,s); //ten points in [a,b] yx=x0; for i(1, rows(x0), 1); yx[i]=nonpara(y,x,x0[i]); endfor; s~yx; end; proc nonpara(y,x,x0); //y,x:nx1, x0:1x1 local h,kx,ykx,r1,r2, i,j, inc, index_end; // h is not too big, at just rows(x) by 1, so leave as is h=1.06*stdc(x)*(rows(x)^(-1/5)); r1 = 0; r2 = 0; // number of rows to process at a time //change this based upon computer memory size inc = 1000; for i(1,rows(x),inc); if i>5*inc+1; //5001 to rows(x)=5514 kx=(x[(5*inc+1):rows(x)]-x0)./h; kx=pdfn(kx); ykx=y[5*inc+1:rows(x)].*kx; else; index_end = i + inc - 1; kx=(x[i:index_end]-x0)./h; kx=pdfn(kx); ykx=y[i:index_end].*kx; endif; r1 = r1 + sumc(ykx); //1x1 r2 = r2 + sumc(kx); //1x1 endfor; retp(r1/r2); endp;

A couple of things here:

- Don't make the code work for rows that are not evenly divisible by 'inc' until you get it working correctly for rows that are evenly divisible by 'inc' first. So pull out the entire block that starts with:
if i > 5*inc+1;

Adding this part last will make it much simpler to get it working.

- Your code needs to add zeros on the diagonal like the original code does.
- My last posted revision of the procedure assumed that all three inputs to the procedure would have the same number of rows. Some changes need to be made to accomodate that. For example, the
`for`loop for adding zeros on the diagonal will have to change since`kx`will not necessarily have the same number of columns as`x`has rows.

Here is a version that should work for any input where `y` and `x` have equal column length and the variable `inc` is a factor. Try this out, think about the changes you see and then try adding a clean-up loop to take care of the left over rows after we loop through `i`*`inc` times. I recommend starting with small data and using the debugger and then bumping up the problem size after it seems to work with the smaller data.

proc nonpara(y,x,x0); //nonparametric estimation:y,x:nx1,x0:1×1 local h,kx,ykx,r1,r2, i, j, inc, index_end, index_zeros, num_zeros; //h is not too big, at just rows(x) by 1, //so leave as is h=1.06*stdc(x)*(rows(x)^(-1/5)); r1 = 0; r2 = 0); //number of rows to process at a time //change this based upon computer memory size inc = 5; index_zeros = 1; if rows(x0) < inc; num_zeros = inc; else; num_zeros = rows(x0); endif; for i(1, rows(x), inc); index_end = i + inc - 1; kx=(x[i:index_end]-x0')./h; kx=pdfn(kx); //We don't want 0's on the diagonal //of each chunk we process if (index_zeros < num_zeros); for j(1, inc, 1); kx[j, index_zeros+j-1] = 0; endfor; index_zeros = index_zeros + inc; endif; ykx=y[i:index_end].*kx; r1 = r1 + sumc(ykx); r2 = r2 + sumc(kx); endfor; //Calculate mean in return line retp(((r1)./rows(x))./((r2)./rows(x))); //nx1 endp;