GAUSS Threading Tutorial – Part 1

Overview and introduction

GAUSS, version 11 and later, employs parallel computation on a few different levels. The two main levels relevant to the GAUSS user are:

  • Internal threading of GAUSS intrinsic functions.
  • The GAUSS user-defined threading functions.

This first installment of the GAUSS threading tutorial discusses threading of GAUSS intrinsic functions. Subsequent sections will cover the basics of the GAUSS user-defined threading procedures and then move on to advanced concepts and optimization.

Internally threaded functions

Many functions in GAUSS are internally threaded. These functions choose the number of threads for maximum efficiency and performance based on the function called, your available system resources and the size of your data. We will use matrix multiplication as our example for this section. It is a good choice because it requires few inputs and outputs and it scales well to many CPUs as the problem size grows. You can see this at work by first opening the Task Manager on Windows (press CTRL+ALT+DEL, then select Task Manager from the list) or by running the command line utility ‘top’ on Linux.

Once your chosen system profiling tool is open, run the following program in GAUSS:

// Multiply a small 10x10 matrix many times
r = 10;
x = rndn(r,r);
for i(1, 1e7, 1);
   z = x'x;
endfor;

While this simple program is running, watch your CPU usage. You should see GAUSS using about 100% of one CPU. GAUSS is not creating any additional threads here because it has determined that doing so would not speed up the computation. If this example program runs too quickly, you may not get an accurate reading from the system usage utility; if so, increase the number of loop iterations. Once it has been running for a few seconds and the CPU meter has stabilized, feel free to stop the GAUSS program with the stop button.

Now we will make two small changes to the program and run it again. Set ‘r’ to 100 and reduce the number of iterations in the for loop to 1e5. Your new program should look like this:

// Same loop with a 100x100 matrix and fewer iterations
r = 100;
x = rndn(r,r);
for i(1, 1e5, 1);
   z = x'x;
endfor;

This time you should observe GAUSS using approximately 100% of two CPUs. If you have four or more CPUs, try increasing the problem size to 1000×1000 by setting ‘r’ to 1000. A problem of this size should fully use at least four CPUs if the system can provide GAUSS with the necessary resources.
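
If you would like a number to go with the CPU meter, the three problem sizes discussed above can be timed in one program. The following is only a minimal sketch: the variable names and the iteration counts are our own illustrative choices, not part of the original programs, and hsec reports wall-clock time in hundredths of a second since midnight, so a run that crosses midnight will give a meaningless result.

// Illustrative sketch: time the multiply at the three sizes used above.
// The names sizes, iters, t0 and elapsed, and the iteration counts, are assumptions.
sizes = { 10, 100, 1000 };
iters = { 1000000, 10000, 10 };
for k(1, rows(sizes), 1);
   r = sizes[k];
   x = rndn(r,r);
   t0 = hsec;                       // start time, hundredths of a second
   for i(1, iters[k], 1);
      z = x'x;
   endfor;
   elapsed = (hsec - t0) / 100;     // elapsed wall-clock seconds
   print "r = " r " seconds = " elapsed;
endfor;

Elapsed time alone does not show how many threads were used, so it is best read alongside the system usage utility rather than in place of it.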

Matrix multiplication is not the only internally threaded function in GAUSS. Many other linear algebra functions are internally threaded as well.
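
The same experiment can be repeated with another linear algebra function to see how it behaves on your system. The sketch below uses inv purely as an example; this tutorial does not list which functions are internally threaded, so treat the choice of function, the matrix size and the iteration count as assumptions and let the CPU meter tell you what actually happens.

// Illustrative sketch: inv is an example choice only.
// Whether and how it is threaded on your system is not stated in this tutorial.
r = 1000;
x = rndn(r,r);
for i(1, 20, 1);
   xi = inv(x);
endfor;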

Summary:

In this brief section we have learned:

  1. Many linear algebra functions in GAUSS are internally threaded.
  2. GAUSS automatically selects the most efficient number of CPUs or cores to use based upon the function, the system resources and the data size.
  3. No user input is required to take advantage of this level of threading.

Next:

In the next section, we will introduce the GAUSS functions that allow users to specify parallel sections of their GAUSS code so that parts of their program may be run concurrently in different threads.