Mean of a multidimensional array along a specific dimension

I created a big multidimensional array, say 20000 x 5 x 3. I would like to calculate the mean of this array along the first dimension, i.e., to obtain a 5 x 3 matrix which is the average of the 20000 matrices. How is that possible?

10 Answers



1



accepted

OK. I think I see what the problem is here.

The first inefficiency, which is probably not significant in this case, but is worth pointing out, is that this line:

beta_post = areshape(zeros(k, nq), S|k|nq);
  1. Creates a k x nq matrix filled with zeros.
  2. Creates an S x k x nq array and copies the zeros over.

If you do this:

beta_post = arrayinit(S|k|nq, 0);

GAUSS will just initialize the array full of zeros.

The second issue, which is likely the entire cause of the difference, is that putarray is copying ALL the data from beta_post on EVERY assign.

When I run the following code:

new;
cls;

nmats = 1000;
k = 100;
nq = 12;
beta_post = arrayinit(nmats|k|nq, 0);

fill = rndn(k,nq);

// Array assignment
h = hsec();
for i(1, nmats, 1);
    beta_post[i,.,.] = fill;
endfor;
(hsec-h)./100 "seconds";

// putarray
h = hsec();
for i(1, nmats, 1);
    beta_post = putarray(beta_post, i, fill);
endfor;
(hsec-h)./100 "seconds";

// putarray with move keyword
h = hsec();
for i(1, nmats, 1);
    beta_post = putarray(move(beta_post), i, fill);
endfor;
(hsec-h)./100 "seconds";

The middle option, using putarray, takes more than 400 times longer than either of the other two.

What the move keyword does in third section is to tell GAUSS that it can change the contents of the current beta_post and does not need to keep the original beta_post values. Therefore, GAUSS just overwrites the ith section, instead of copying the entire array and then overwriting the ith section.

On my computer this is the fastest of the three options, but it could vary based on problem size and the specifics of your computer.

aptech

1,773


1



You can do this using the GAUSS asum function. Here is an example using a smaller array:

rndseed 234134;

a = areshape(rndn(2*3*4, 1), 2|3|4);
print a;

dims = getorders(a);
print dims;

mu = asum(a, 3) ./ dims[1];
print mu;

which will print out:

Plane [1,.,.]

     -0.20932176       -1.5665424      -0.44729797       0.75179440
     -0.94931332       0.31185670      -0.45456107       0.36310775
       1.2319571      -0.11479821      -0.39781858       -1.3745940

Plane [2,.,.]

     -0.36828790       0.61058848       0.59112951      -0.34671509
      -1.5763264       -1.2162433       0.44257331        2.0625205
     -0.64061235      0.016680418        1.4006794       0.13091056

       2.0000000
       3.0000000
       4.0000000

Plane [1,.,.]

     -0.28880483      -0.47797695      0.071915766       0.20253966
      -1.2628198      -0.45219330    -0.0059938834        1.2128141
      0.29567237     -0.049058897       0.50143041      -0.62184173 

aptech

1,773


1



If you want to grab the first row of each of the 2000, 5x3 matrices, you just use indexing like you would with a matrix.

rndseed 234134;

a = areshape(rndn(2*3*4, 1), 2|3|4);
print a;

a_1 = a[.,1,.];
print a_1;

returns

Plane [1,.,.]

     -0.20932176       -1.5665424      -0.44729797       0.75179440
     -0.94931332       0.31185670      -0.45456107       0.36310775
       1.2319571      -0.11479821      -0.39781858       -1.3745940

Plane [2,.,.]

     -0.36828790       0.61058848       0.59112951      -0.34671509
      -1.5763264       -1.2162433       0.44257331        2.0625205
     -0.64061235      0.016680418        1.4006794       0.13091056

Plane [1,.,.]
     -0.20932176       -1.5665424      -0.44729797       0.75179440

Plane [2,.,.]
     -0.36828790       0.61058848       0.59112951      -0.34671509 

aptech

1,773


0



Thanks a lot for the quick reply. Meanwhile I have also found the command amean that does what I am looking for.

However, I find a bit difficult to deal with arrays in Gauss. For example, if I wanted to take the first row of my 5 x 3 matrix for each of the 20000 repetitions and form a 20000 x 5 matrix, I don't know how to do it in a simple way. I tried the getarray command, but I was unable to do it.



0



Thanks again!

I decided to vectorise and use matrices instead of multidimensional arrays. It turned out that the code is considerably faster, while with arrays it was very slow. I don't know what the reason for this, but it's strange.



0



Depending on how you set up the code, there can be extra memory usage with multi-dimensional arrays.

If you can set it up to perform multiple computations in one call, it can be faster. For example:

nmats = 1e4;
X = areshape(rndn(nmats*100*12, 1), nmats|100|12);

X_2 = invpd(moment(X,1));

will be much faster than either of these:

for i(1, nmats, 1);
    X_i = getmatrix(X, i);
    X_2[i,.,.] = invpd(X_i'X_i);
endfor;

for i(1, nmats, 1);
    X_i = arraytomat(X[i,.,.]);
    X_2[i,.,.] = invpd(X_i'X_i);
endfor;

and getmatrix is more memory efficient than arraytomat if you are passing it to a function that operates on matrices.

aptech

1,773


0



Thanks again.

In your example it's clear that array calculations are faster. However, in my code there was a huge difference. It's a Gibbs sampler where I run a loop of 25000 iterations and I store a 2 x 5 matrix in each iteration (draws from the posterior distribution). If I define the array before the loop as a 25000 x 2 x 5 array with areshape and fill it with putarray, the code ran for well over an hour. But when I vectorised my matrix to a 10 x 1 vector and stored it in a 25000 x 10 matrix, it took about a minute to run the code. That's a huge difference and that's why I was wondering why this happens.



0



I would not expect that large of a difference. My first thought, is that something in putarray might be slow. The only difference is that you are assigning the result to an array with putarray vs assigning to a matrix?

I would be curious to see how long it takes if you assign it to the array using indexing.

aptech

1,773


0



Yes, the only difference is this. Here is the excerpt of the code that matters here (sorry for not having that nice code look as it should be, but I don't know how to do it in the editor. Before there was an option for it, but I can't find it anymore).

Initialise of the matrix/array

beta_post = zeros(S, k*nq);
beta_post = areshape(zeros(k, nq), S|k|nq);

Storing the posterior values:

beta_post[iter, .] = vecr(beta_postq)';
beta_post = putarray(beta_post, iter, beta_postq);

Everything else is exactly the same. And as I said, using arrays makes the code much much slower.



0



You are right! I tried and it's indeed the fastest option. I didn't even know about this move command, in fact, I can't even find it in the command reference.

Thanks a lot for your help, I've learned a lot!

Your Answer

10 Answers

1
accepted

OK. I think I see what the problem is here.

The first inefficiency, which is probably not significant in this case, but is worth pointing out, is that this line:

beta_post = areshape(zeros(k, nq), S|k|nq);
  1. Creates a k x nq matrix filled with zeros.
  2. Creates an S x k x nq array and copies the zeros over.

If you do this:

beta_post = arrayinit(S|k|nq, 0);

GAUSS will just initialize the array full of zeros.

The second issue, which is likely the entire cause of the difference, is that putarray is copying ALL the data from beta_post on EVERY assign.

When I run the following code:

new;
cls;

nmats = 1000;
k = 100;
nq = 12;
beta_post = arrayinit(nmats|k|nq, 0);

fill = rndn(k,nq);

// Array assignment
h = hsec();
for i(1, nmats, 1);
    beta_post[i,.,.] = fill;
endfor;
(hsec-h)./100 "seconds";

// putarray
h = hsec();
for i(1, nmats, 1);
    beta_post = putarray(beta_post, i, fill);
endfor;
(hsec-h)./100 "seconds";

// putarray with move keyword
h = hsec();
for i(1, nmats, 1);
    beta_post = putarray(move(beta_post), i, fill);
endfor;
(hsec-h)./100 "seconds";

The middle option, using putarray, takes more than 400 times longer than either of the other two.

What the move keyword does in third section is to tell GAUSS that it can change the contents of the current beta_post and does not need to keep the original beta_post values. Therefore, GAUSS just overwrites the ith section, instead of copying the entire array and then overwriting the ith section.

On my computer this is the fastest of the three options, but it could vary based on problem size and the specifics of your computer.

1

You can do this using the GAUSS asum function. Here is an example using a smaller array:

rndseed 234134;

a = areshape(rndn(2*3*4, 1), 2|3|4);
print a;

dims = getorders(a);
print dims;

mu = asum(a, 3) ./ dims[1];
print mu;

which will print out:

Plane [1,.,.]

     -0.20932176       -1.5665424      -0.44729797       0.75179440
     -0.94931332       0.31185670      -0.45456107       0.36310775
       1.2319571      -0.11479821      -0.39781858       -1.3745940

Plane [2,.,.]

     -0.36828790       0.61058848       0.59112951      -0.34671509
      -1.5763264       -1.2162433       0.44257331        2.0625205
     -0.64061235      0.016680418        1.4006794       0.13091056

       2.0000000
       3.0000000
       4.0000000

Plane [1,.,.]

     -0.28880483      -0.47797695      0.071915766       0.20253966
      -1.2628198      -0.45219330    -0.0059938834        1.2128141
      0.29567237     -0.049058897       0.50143041      -0.62184173 

1

If you want to grab the first row of each of the 2000, 5x3 matrices, you just use indexing like you would with a matrix.

rndseed 234134;

a = areshape(rndn(2*3*4, 1), 2|3|4);
print a;

a_1 = a[.,1,.];
print a_1;

returns

Plane [1,.,.]

     -0.20932176       -1.5665424      -0.44729797       0.75179440
     -0.94931332       0.31185670      -0.45456107       0.36310775
       1.2319571      -0.11479821      -0.39781858       -1.3745940

Plane [2,.,.]

     -0.36828790       0.61058848       0.59112951      -0.34671509
      -1.5763264       -1.2162433       0.44257331        2.0625205
     -0.64061235      0.016680418        1.4006794       0.13091056

Plane [1,.,.]
     -0.20932176       -1.5665424      -0.44729797       0.75179440

Plane [2,.,.]
     -0.36828790       0.61058848       0.59112951      -0.34671509 

0

Thanks a lot for the quick reply. Meanwhile I have also found the command amean that does what I am looking for.

However, I find a bit difficult to deal with arrays in Gauss. For example, if I wanted to take the first row of my 5 x 3 matrix for each of the 20000 repetitions and form a 20000 x 5 matrix, I don't know how to do it in a simple way. I tried the getarray command, but I was unable to do it.

0

Thanks again!

I decided to vectorise and use matrices instead of multidimensional arrays. It turned out that the code is considerably faster, while with arrays it was very slow. I don't know what the reason for this, but it's strange.

0

Depending on how you set up the code, there can be extra memory usage with multi-dimensional arrays.

If you can set it up to perform multiple computations in one call, it can be faster. For example:

nmats = 1e4;
X = areshape(rndn(nmats*100*12, 1), nmats|100|12);

X_2 = invpd(moment(X,1));

will be much faster than either of these:

for i(1, nmats, 1);
    X_i = getmatrix(X, i);
    X_2[i,.,.] = invpd(X_i'X_i);
endfor;

for i(1, nmats, 1);
    X_i = arraytomat(X[i,.,.]);
    X_2[i,.,.] = invpd(X_i'X_i);
endfor;

and getmatrix is more memory efficient than arraytomat if you are passing it to a function that operates on matrices.

0

Thanks again.

In your example it's clear that array calculations are faster. However, in my code there was a huge difference. It's a Gibbs sampler where I run a loop of 25000 iterations and I store a 2 x 5 matrix in each iteration (draws from the posterior distribution). If I define the array before the loop as a 25000 x 2 x 5 array with areshape and fill it with putarray, the code ran for well over an hour. But when I vectorised my matrix to a 10 x 1 vector and stored it in a 25000 x 10 matrix, it took about a minute to run the code. That's a huge difference and that's why I was wondering why this happens.

0

I would not expect that large of a difference. My first thought, is that something in putarray might be slow. The only difference is that you are assigning the result to an array with putarray vs assigning to a matrix?

I would be curious to see how long it takes if you assign it to the array using indexing.

0

Yes, the only difference is this. Here is the excerpt of the code that matters here (sorry for not having that nice code look as it should be, but I don't know how to do it in the editor. Before there was an option for it, but I can't find it anymore).

Initialise of the matrix/array

beta_post = zeros(S, k*nq);
beta_post = areshape(zeros(k, nq), S|k|nq);

Storing the posterior values:

beta_post[iter, .] = vecr(beta_postq)';
beta_post = putarray(beta_post, iter, beta_postq);

Everything else is exactly the same. And as I said, using arrays makes the code much much slower.

0

You are right! I tried and it's indeed the fastest option. I didn't even know about this move command, in fact, I can't even find it in the command reference.

Thanks a lot for your help, I've learned a lot!


You must login to post answers.

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.

Try GAUSS for 14 days for FREE

See what GAUSS can do for your data

© Aptech Systems, Inc. All rights reserved.

Privacy Policy