Loop Optimizations

Introduction

Many GAUSS programs spend large portions of their runtime inside of loops. As shown in the Vectorizing Statements section of this tutorial, many loops can be replaced with vector operations. However, even after applying optimum vector techniques, looping will still be an integral part of many GAUSS programs.

Principle 1: Use Definite Loops not Conditional Loops

Difficulty Rating: Beginner
Applicability Rating: High
Tradeoffs: Low

Conditional loops are loops that continue to iterate until a given statement evaluates as true or false. For example, the loop below will continue iterating until i is greater than 10.

i = 1;
do while i ≤ 10;
   //loop body
   i = i + 1;
endo;

For novices, do while and do until statements sometimes feel more comfortable. However, the definite loop equivalent has quite a few advantages; one of which is speed.

Definite loops run for a specified number of iterations. You can still control these loops with break, continue, etc., but they usually run for a specific number of iterations. The syntax looks like this:

for i(start, stop, step);
   //loop body
endfor;

The inputs start, stop and step all refer to the loop counter, which in this case is the variable i. start is the value of the counter on the first iteration of the loop, step is the value by which the loop counter is increased on each loop iteration and when the value of the loop counter is equal to the value of stop, the loop will end. We can create a for loop that is equivalent to the do while loop above like this:

for i(1, 10, 1);
   //loop body
endfor;

The loop counter is available for use in the body of the loop just like it is in the do while loop above. However, in the definite loop, the loop counter will not exist outside of the body of the loop.

If you take a look at the loops above, you will notice that the for loop is more compact and keeps all of the loop control together in one place. This means you will type less and also not have to search through a possibly long loop body to locate the line that is incrementing your loop counter.

How much time will for loops save my program? In GAUSS, it is rare for the actual loop implementation to take a significant share of your total run-time. So the speed increase is likely to be modest. However, there are many other upsides to using for loops and in almost all contexts no downside. As such, they should be used by default. One quick final note, if you are worried that you will forget if the for loop inputs are start, stop, step or start, step, stop, have no fear. As you are entering your for loop, the GAUSS editor will give you a tool-tip reminding you.

Principle 2: Avoid matrix concatenation in loops

Difficulty Rating: Beginner to intermediate
Applicability Rating: High
Tradeoffs: Low

Avoiding matrix concatenation in loops is one of the most powerful optimization tools for your GAUSS programs. It is also simple to implement in most cases. Let’s start with a simple piece of code that

Creates a random vector.
Calculates its mean.
Saves this mean as the next element in an output vector.

//Create an empty matrix
mean_out = {};

for i(1, 5, 1);
   var1 = rndn(100, 1);

   //Calculate the mean of var1
   //and concatenate it to the
   //bottom of mean_out
   mean_out = mean_out | meanc(var1);
endfor;

After the code above, mean_out should be a 5x1 vector with values similar to:

            0.071465209
            0.086140725
mean_out = -0.079360720
            0.076487572
           -0.034172203

To achieve the same result without concatenation, you need to pre-allocate mean_out to its final size. Then in the loop, you set the individual rows to the new value rather than adding another row on the bottom. Using our same example from above:

//Pre-allocate to final size
siz = 5;
mean_out = zeros(siz ,1);

for i(1, siz, 1);
    var1 = rndn(100, 1);

    //Calculate the mean of var1
    //and assign its value to
    //the ith row of mean_out
    mean_out[i] = meanc(var1);
endfor;

Looking at these two code snippets, we can see that the code change is quite small. Both are quite readable as well. However, the code with the pre-allocated output vector can run 5-20 or more times faster depending upon the size of the vectors and the available memory on your machine.

Since this is such an important technique we will work through another similar but, slightly more complicated example. This time we will calculate the minimum and maximum value of each vector as well as its mean.

//Create an empty matrix
out = {};

for i(1, 5, 1);
   var1 = rndn(100, 1);

   //horizontally concatenate min,max and mean
   //of var1 into one row
   tmp = minc(var1) ~ maxc(var1) ~ meanc(var1);

   //vertically concatenate tmp onto the
   //bottom of 'out'
   out = out | tmp;
endfor;

After running the code from above, the variable out should be a 5x3 matrix with values similar to:

      -1.837 2.615 0.039
      -2.932 2.132 0.057
out = -2.769 2.428 0.003
      -2.811 2.415 -0.186
      -2.169 3.292 -0.027

This code has two uses of concatenation within the loop. However, the concatenating of three scalars to create the variable tmp is much less significant than the concatenation of the vector out. The general rule is that concatenation will be relatively slow if you are making a small dimensional change many times on a large matrix or vector.

We have not defined ‘small’ and ‘large’ here because it will be related to the specifics of your hardware. However, creating a 1x3 matrix every time through the loop will not be very significant on any computer.

Creating a vector that is 1000x1 and then 10001x1 and then 10002x1…is much more significant. The reason is because the operating system will be forced to locate a new chunk of RAM large enough to hold the entire size of the re-sized matrix every time that it is changed. This problem is not specific to GAUSS but is related to current hardware and low-level software architecture.

If that explanation seemed opaque, do not worry. You do not need to understand the underlying issues to successfully avoid the problem.

Take a moment and think about how you would modify this example to remove the concatenation from the assignment of the variable out. When you are ready, take a look at the answer below.

//Pre-allocate 'out'
iter = 5;
out = ones(iter,3);

for i(1, iter, 1);
   var1 = rndn(100, 1);

   //horizontally concatenate min,max and mean into one row
   tmp = minc(var1)~maxc(var1)~meanc(var1);

   //index to assign the rows of out
   out[i,.] = tmp;
endfor;

The changes to this code are almost identical to those in the first example. The most significant change is that since out now has more than one column, we need to specify which columns to assign as well as the rows. Here is a quick refresher on indexing variables in GAUSS:

To assign the fourth row and seventh column of a variable X to be equal to 3.14, we would enter:

X[4,7] = 3.14;

If X has three rows and we want to assign all of them, we could use the dot operator '.' to indicate that we want to assign to all the columns, like this:

A = { 1 2 3 };
X[1, .] = A;

Getting back to our code example, you may notice that removing the vertical concatenation of out has simplified that line considerably. It is no longer necessary to use the tmp variable as an intermediate. We can simplify to two lines:

tmp = minc(var1) ~ maxc(var1) ~ meanc(var1);
out[i,.] = tmp;

into one line:

out[i,.] = minc(var1)~maxc(var1)~meanc(var1);

This leaves us with a final version that looks like this:

//Pre-allocate 'out'
iter = 5;
out = ones(iter,3);

for i(1, iter, 1);
   var1 = rndn(100, 1);

   //horizontally concatenate min,max and mean
   //into one row and assign to ‘out’
   out[i,.] = minc(var1)~maxc(var1)~meanc(var1);
endfor;

Testing this example with 1e5 iterations shows the version that does not use concatenation to create out is approximately 20 times faster than the original version. This tutorial will not detail the removal of the horizontal concatenation of the output from the functions minc, maxc and meanc. However, it is recommended that you try it for additional practice.

Principle 3: Avoid unnecessary work in loops

Difficulty Rating: Beginner to intermediate
Applicability Rating: Medium
Tradeoffs: Low to high

Let’s start this section with a very simple example to illustrate the principle.

for i(1, 1e6, 1);
   out[i] = A[i] * sqrt(x);
endfor;

In this example, we can see that the sqrt(x) is the same on every iteration of this loop. Since it will always be the same, we can make the calculation outside the loop and assign it to a variable that is used inside the loops.

sqrt_x = sqrt(x);

for i(1, 1e6, 1);
   out[i] = A[i] * sqrt_x;
endfor;

In the code above, we only need to execute the square root calculation one time instead of the one million times it would be executed in the original code. In this simple, artificial example on one machine, it saves around 20% of the total computing time.

When to apply this technique

Deciding whether to pull a redundant calculation to the outside of a loop depends on a few factors.

Will the resulting code be less clear?
How many times will the line of code be called?
Is the calculation time significant compared to surrounding code?
Will the code be reused often?
Will the code be subject to frequent modification?

As you are creating new code, you will need to balance these factors and make a judgment. Obviously, it is generally preferred to create code that is clear and easy to understand in all cases. However, this is particularly important if the code is likely to be modified often. On the other hand, for code that will be used often, an investment in making the code faster is more likely to pay off—even if it seems to be fast enough for its current usage. When in doubt, always choose the clearest and easy to understand implementation.

Summary

Replace conditional loops (do until, do while) with definite loops (for) when possible.
Pre-allocate matrices to with the ones or zeros function before a loop instead of gradually increasing the size of a matrix through the use of concatenation as often as possible.
When the trade-off seems profitable, take redundant calculations from loops.