Vectorizing Statements

Optimizing your GAUSS program: Vectorizing statements

Difficulty Rating: Beginner to advanced

Applicability Rating: High

Tradeoffs: Low to none

GAUSS is designed to operate most efficiently on matrices and vectors. Learning to think in terms of vector and matrix operations is one of the most important GAUSS coding skills. It will not only make your code faster, but it will also make it shorter, with more of the details handled under the hood by GAUSS rather than in your program.

Example 1: χ2 Random Variables

We can create individual χ2 random numbers like this:

rChi = rndGamma(1, 1, 0.5.*df, 2);

where df represents the degrees of freedom. Part of a simulation program may include code like this:

for i(1, nobs, 1);
   chiVar = rndGamma(1, 1, 0.5.*df, 2);
   // more simulation code
endfor;

Alternatively, you can remove the loop and request rndGamma to return ‘nobs’ elements in one call, like this:

chiVars = rndGamma(nobs, 1, 0.5.*df, 2);

This code is simpler, shorter and runs approximately fifteen times faster. Notice that the ‘.*’ element-by-element multiplication operator works appropriately  in either case. In GAUSS, operators are overloaded to allow for scalar, vector or matrix operations. While this is a fairly trivial example, it illustrates the power of this technique.

Example 2: Drawing random samples

Let us suppose that we have a matrix A that contains one million observations of five variables. Let us further suppose that we would like to draw 500 observations from A for each sample, and we would like to draw a total of 2000 separate random samples with replacement.

Our first step is to create a vector of 2000 uniform random numbers with element values between 1 and one million. You can create a uniform random number between 1 and one million like this:

rint = ceil(1e6 * rndu(1, 1));

You can inefficiently create a 2000x1 vector of random integers with the range of [1, 1e6] like this:

//Create vector to hold random integers
rint = zeros(2000, 1);

//Fill ‘rint’ vector with random integers
for i(1, 2000, 1);
   rint[i] = ceil(1e6 * rndu(1,1));
endfor;

Now take a moment to apply the lesson from example 1 and vectorize this code snippet.  The answer is:

rint = ceil(1e6 * rndu(2000, 1));

Now we could create a vector of sample observations by extracting them one-by-one as below:

//Create vector to hold sample
samp = zeros(2000, 1);

//Sampling with replacement
for j(1, 2000 1);
   idx = rint[j];
   samp[j,.] = A[idx,.];
endfor;

In GAUSS, however, you can index into a matrix or vector with non-consecutive integers. For example, if you had a matrix X and wanted to create a matrix xSmall equal to the 9th row of X over the 5th row of X over the 12th row of X you could perform that action like this:

idx = { 9, 5, 12 };
xSmall = X[idx,.];

Since this functionality is available, we can vectorize our sampling with replacement by substituting the following for the for loop above:

samp = A[rint,.];

Below is a comparison of both options for drawing one sample of 2000 observations in complete form:

//Scalar version
rint = ceil(1e6 * rndu(2000, 1));

for j(1, 2000, 1);
   idx = rint[j];
   samp[j,.] = A[idx,.];
endfor;

//Vectorized version
rint = ceil(1e6 * rndu(2000, 1));
samp = A[rint,.];

As in our first example, the vectorized code is simpler and cleaner. When running the full example drawing 500 separate samples, including the vectorized creation of uniform random integers in both cases, the vectorized sampling with replacement is just over eight times faster on one machine.

Example 3: Laplacian Cumulative Distribution Function

The formula to calculate the cumulative distribution function for the Laplacian distribution is:

Assuming x and mu are scalars, the most straightforward way to code this is probably:

if x < mu;
   ans = 0.5*exp((x - mu) ./ b);
else;
  ans = 1 - 0.5*exp(-(x - mu) ./ b);
endif;

If x and mu are vectors or matrices, this operation could be done on each corresponding element of the variables in a for loop, but it would be much faster to vectorize it. The first step in vectorizing this code is to vectorize the logic statements.  This involves creating a mask variable that will control which elements in our vector (or matrix) will be processed by which branch of our algorithm. We will first create a mask for all of the elements in which x < mu, like this:

maskLow = x .< mu;

Notice the ‘.’ before the ‘<’ operator. The dot specifies that you want the operation to be performed on an element-by-element basis. This will give us a variable, maskLow, which will have a one if the corresponding element of x is greater than or equal to mu and a zero otherwise. For example if:

     4         3
x =  2    mu = 3
     1         2

Then the code:

maskLow = x .< mu;

will assign maskLow as follows:

          0
maskLow = 0
          1

We can create the mask for the other case like this:

maskHigh = x .>= mu;

Now we can use the masks to ‘cover’ the rows to which a particular code branch does not apply like this:

maskLow = x .< mu;
ansLow = maskLow .* (0.5 * exp((x - mu) ./ b));

This calculates the ‘low’ part of the algorithm on all elements in the vector. It then multiplies the elements to which this algorithm should not be applied by zero and the elements to which it should be applied by one. The next step is to apply this technique to the other branch:

maskHigh = x .>= mu;
ansHigh = maskHigh .* (1 - 0.5*exp(-(x - mu) ./ b));

Now the sum of these two vectors is our final answer:

ans = ansLow + ansHigh;

The full code looks like this:

maskLow = x .< mu;
ansLow = maskLow .* (0.5 .* exp(((x - mu)) ./ b));
maskHigh = x .>= mu;
ansHigh = maskHigh .* (1 - 0.5 .* exp(-(x - mu) ./ b));

ans = ansLow + ansHigh;

Not all would argue that this code is simpler than the code above. However, once the idea of masking sets in it will be quite clear. As a side note, any code that might not be as clear to all users should be commented to show the logic as the code progresses.

For this example, the speed-up obtained by vectorizing the equation rather than using a for loop is between four and five times. This is not as great an improvement as we obtained in the preceding examples, but it is still quite significant and is what we would expect since we are performing about twice as many mathematical operations in the vectorized version compared to the version with the if statements.

One final note, the Laplacian cumulative distribution function is calculated by the GAUSS-supplied function cdfLaplace. If we use that GAUSS function, we see performance that is about four times faster than the vectorized code above and we don’t have to write it ourselves! You should almost always use GAUSS-supplied functions when possible, they will usually be considerably faster and simpler to implement then to write.

Summary

  1. GAUSS is designed to work most efficiently on vectors and matrices. If you find yourself performing many operations on individual elements in a vector or matrix, take a step back and consider whether this case could be described in terms of vector or matrix operations.
  2. Vectorized code is usually faster, cleaner, and shorter.
  3. Logical comparisons can be vectorized with the ‘dot’ operators.
  4. Use vector indexing to extract specific non-consecutive elements from a matrix or vector.
  5. Use masking in place of if statements when possible.
  6. Search for a GAUSS-supplied function before writing one yourself.

Go to: Optimizing your GAUSS program: Loop Optimizations.