GAUSS Threading Tutorial – Part 2

User controlled threading in GAUSS

The user-controlled threading tools in GAUSS are designed to give the user a great deal of power and control while remaining easy to use. In this section we will examine the GAUSS threading functions and the rules for their usage.

GAUSS threading functions

  • threadstat – marks a single line of code to be executed as a thread.
  • threadbegin and threadend – mark the beginning and end of a multi-line block of code to be executed as a thread.
  • threadjoin – completes the definition of a set of threads to be executed simultaneously (called “sibling threads”) and waits until they have all finished before execution continues in the calling thread (the “parent thread”). Since users may create threads within threads, the calling thread may or may not be the main GAUSS execution thread. Every group of threadstat and/or threadbegin/threadend statements must be followed by a call to threadjoin.

We will begin by illustrating the most basic usage of these threading commands. Next, we will explain some important concepts required to use GAUSS threads correctly in nontrivial cases, and then illustrate them with a simple Monte Carlo experiment. Try running the following code:

 r = 5;
 x = ones(r,r);
 threadstat z = x + x;
 threadstat z2 = x + x;
 threadjoin;
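The same two threads can also be written with threadbegin and threadend, which is the form to use when a thread needs more than one statement. The sketch below is equivalent to the example above; the intermediate names t1 and t2 are illustrative only, and each is written by exactly one thread.

```gauss
r = 5;
x = ones(r,r);

// First thread: a multi-line block
threadbegin;
   t1 = x + x;      // t1 is written only by this thread
   z = t1;
threadend;

// Second thread: another multi-line block
threadbegin;
   t2 = x + x;      // t2 is written only by this thread
   z2 = t2;
threadend;

// Wait for both sibling threads before continuing
threadjoin;
```

threadstat and threadbegin/threadend blocks may also be mixed in the same group, as long as the group ends with threadjoin.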

What are GAUSS threads?

User-defined GAUSS threads are sections of GAUSS code that can be run simultaneously. Each thread can access and use any previously created global symbols. In the example above, both of the threads created by the threadstat statements can access the global matrix ‘x’. If desired, they could also reference the scalar variable ‘r’. GAUSS threads can create new global symbols or change the value of global symbols that already exist. Either enter the show command at your GAUSS command prompt or examine the Data Page in the GAUSS GUI to see that this example program created two new global variables, z and z2. These variables may be referenced by code that comes after the threadjoin statement in your program or used interactively after the program ends.
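A short sketch of both points in this paragraph: the threads only read the existing globals x and r, and the new globals they create are used as ordinary variables once threadjoin returns. The names a, b, and c are illustrative.

```gauss
r = 5;
x = ones(r,r);

// Both threads only READ the existing globals x and r
threadstat a = r * x;      // new global: 5x5 matrix, every element 5
threadstat b = sumc(x);    // new global: 5x1 column sums, every element 5
threadjoin;

// After threadjoin, a and b are ordinary global variables
c = a[.,1] + b;            // 5x1 vector, every element 10
print c';
```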

Data Integrity

In the example above, the two threads write to different variables. This is essential: never write to the same global variable from threads that could run simultaneously. Further, if one thread writes to a global variable, no sibling thread (a thread executing at the same time) may read or write that variable. This is called the writer-must-isolate rule. The reason for this rule is that we cannot know precisely when any given portion of code within a thread will run. Therefore, if thread 1 reads a variable that thread 2 is writing to, we cannot know whether thread 2 has already written the variable or is still in the process of writing it when thread 1 reads it.
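To make the rule concrete, here is a sketch contrasting patterns that break it (commented out, so the block still runs) with a pattern that obeys it. The variable names y and w are illustrative.

```gauss
x = ones(5,5);

// ILLEGAL: both sibling threads write to the same global z
// threadstat z = x + x;
// threadstat z = x * 2;
// threadjoin;

// ILLEGAL: one thread writes y while its sibling reads y
// threadstat y = x + x;
// threadstat w = y + 1;
// threadjoin;

// LEGAL: each written variable belongs to exactly one thread,
// and no thread reads a variable that a sibling writes
threadstat y = x + x;
threadstat w = x + 1;
threadjoin;

// After threadjoin all siblings have finished, so combining is safe
w = w + y;                 // every element 4
```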

To illustrate one way to deal with this restriction, let us return to our threading example above. Assume that the purpose of this program is to calculate x+x+x+x. Since two sibling threads cannot both write to z, we compute two partial sums: z contains the first partial sum and z2 contains the second. After the threadjoin, we can add the two partial sums together:

 r = 5;
 x = ones(r,r);
 threadstat z = x + x;
 threadstat z2 = x + x;
 threadjoin;
 z = z + z2;

Obviously this is not the simplest method to calculate x+x+x+x. However, the concepts remain the same for more complicated examples. Also, remember that you could use this same process to compute a partial product or any type of intermediate value.
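As an illustration of the partial-product case, the sketch below computes the matrix product x*x*x*x from two partial products. Because matrix multiplication is associative, (x*x)*(x*x) gives the same result; the names p1, p2, and p are illustrative.

```gauss
r = 5;
x = ones(r,r);

threadstat p1 = x * x;     // first partial product, written only here
threadstat p2 = x * x;     // second partial product, written only here
threadjoin;

// Combine the partial products in the parent thread
p = p1 * p2;               // equals x*x*x*x; every element is 125 for this x
```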

Example: Threaded Monte Carlo experiment

// NOTE: This code is meant to illustrate the use of threading in GAUSS.
// It is intended to be easy to understand by a broad base of users,
// not to provide an optimal algorithm for actual modeling.
format /rd 4,0;  //Print numbers 4 characters wide with no digits after the decimal point
numflips = 1000; //number of coin flips per simulation
numex = 1500;    //number of simulations to run
maxheads = 0;

for i(1,numex,1); //Run 1500 simulations
   heads = 0;
   x = rndu(numflips,1); //Create 1000 random uniform numbers
   for j(1,numflips,1);
      if x[j] < 0.5; //If an element is less than 0.5, call it heads
         heads = heads+1;
      endif;
   endfor;
   if heads > maxheads;
      maxheads = heads;
   endif;
endfor;

print "Over " numex " simulations of " numflips " coin flips each,";
print "the greatest number of heads in one simulation was " maxheads;
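As an aside, the example deliberately uses an explicit loop for clarity. GAUSS's element-by-element comparison operator .< returns a matrix of 1s and 0s, so the inner j loop can also be written as a single sumc call. The sketch below checks that the two forms give the same count for one simulation; heads2 is an illustrative name.

```gauss
numflips = 1000;
x = rndu(numflips,1);          // 1000 uniform random numbers

// Vectorized count: (x .< 0.5) is 1 where x < 0.5, else 0
heads = sumc(x .< 0.5);

// Same count using the explicit loop from the example above
heads2 = 0;
for j(1,numflips,1);
   if x[j] < 0.5;
      heads2 = heads2 + 1;
   endif;
endfor;

print heads heads2;            // the two counts are identical
```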

Take a moment to look over the single-threaded version of this program and think about how you might parallelize it (that is, run parts of it simultaneously in different threads). Also consider the potential problems in threading this program. There are two primary issues to resolve. First, we have to decide how to break this problem up to be run in parallel. Second, we need to decide how to avoid writing to the same variables in different threads at the same time.

In response to the first issue, we will create two threads, each running half of the total number of iterations. To deal with the second issue, we will create a GAUSS user-defined procedure that runs our simulation. Because the variables inside a procedure are local rather than global, each call gets its own copies of them, so the two threads can use the same variable names without interfering with one another. The procedure also encapsulates some of our code and allows for code reuse. Our new program file with the use of a procedure will look like this:

format /rd 4,0;
numflips = 1000; // Number of coin flips per simulation
numex = 1500;    // Number of simulations to run

threadbegin;
   mheads1 = coinflips(numflips,numex/2);
   // Running half of the total
   // number of simulations
   // in each thread
threadend;
threadbegin;
   mheads2 = coinflips(numflips,numex/2);
threadend;
threadjoin;

if mheads1 > mheads2;
   maxheads = mheads1;
else;
   maxheads = mheads2;
endif;

print "After " numex " simulations, the most heads per thousand tosses was " maxheads;

proc coinflips(numflips,numex);
   local heads,maxheads,x;
   // Note that heads, maxheads, and x are local to the procedure coinflips,
   // so each call to coinflips will use separate copies of these variables.
   // This means that the procedure can be called from multiple sibling
   // threads without breaking the writer-must-isolate rule.

   maxheads = 0;
   for i(1,numex,1);
      heads = 0;
      x = rndu(numflips,1);
      for j(1,numflips,1);
         if x[j] < 0.5;
            heads = heads+1;
         endif;
      endfor;

      if heads > maxheads;
         maxheads = heads;
      endif;
   endfor;

   retp(maxheads);
endp;

After encapsulating the majority of the code inside of the new procedure, we are left with only two (temporary) global variables that we are writing to inside our two threads (mheads1 and mheads2). Run this sample program a few times. After reading the next section, come back to this program and use it as a starting point to explore the GAUSS threading functions further. Two sample problems you might try are:

  1. Modify the procedure and program so that they report the minimum number of heads from a simulation.
  2. Modify the program to create four threads.
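One possible solution to the second exercise, for checking your own work, is sketched below. It is a complete program that includes its own copy of coinflips; each thread runs numex/4 = 375 simulations and writes its own global (mheads1 to mheads4). Instead of the if/else comparison, it stacks the four results into a column with the | operator and takes the maximum with maxc.

```gauss
format /rd 4,0;
numflips = 1000;   // number of coin flips per simulation
numex = 1500;      // number of simulations; 375 per thread

// Four sibling threads, each writing its own global variable
threadstat mheads1 = coinflips(numflips,numex/4);
threadstat mheads2 = coinflips(numflips,numex/4);
threadstat mheads3 = coinflips(numflips,numex/4);
threadstat mheads4 = coinflips(numflips,numex/4);
threadjoin;

// Stack the four results into a 4x1 column and take its maximum
maxheads = maxc(mheads1|mheads2|mheads3|mheads4);
print "After " numex " simulations, the most heads per thousand tosses was " maxheads;

proc coinflips(numflips,numex);
   local heads,maxheads,x;
   maxheads = 0;
   for i(1,numex,1);
      heads = 0;
      x = rndu(numflips,1);
      for j(1,numflips,1);
         if x[j] < 0.5;
            heads = heads+1;
         endif;
      endfor;
      if heads > maxheads;
         maxheads = heads;
      endif;
   endfor;
   retp(maxheads);
endp;
```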

Summary:

  1. User-defined threads in GAUSS are portions of code that may be executed simultaneously.
  2. These threads can use any previously created global symbols. However, the writer-must-isolate rule says that if a thread writes to a global variable, no other sibling thread may access that variable.
  3. The same procedure may be called from multiple sibling threads. Since the variables in a procedure are local in scope rather than global, each call works on its own copies, so you do not have to worry about sibling calls writing to variables with the same names.

FAQ:

Q: Can I create GAUSS threads inside of other GAUSS threads?

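A: Yes, you can create threads that call threads. GAUSS will allow you to nest threads as deeply as you would like. However, every thread consumes system resources, so keep in mind that deep nesting can quickly create many more threads than your machine can run at once.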
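A sketch of one level of nesting: the first outer thread creates its own pair of child threads and joins them before combining their results, while its sibling runs independently. The writer-must-isolate rule still applies at every level; here a1, a2, and a belong to the first outer thread and b to the second. All names are illustrative.

```gauss
x = ones(4,4);

threadbegin;
   // This thread creates its own pair of child threads
   threadstat a1 = x + x;    // every element 2
   threadstat a2 = x * 3;    // every element 3
   threadjoin;               // waits only for a1 and a2
   a = a1 + a2;              // every element 5
threadend;

threadbegin;
   b = x - 1;                // sibling of the outer thread above; every element 0
threadend;

threadjoin;                  // waits for both outer threads
c = a + b;                   // every element 5
```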

Q: Can I create GAUSS threads inside of procedures?

A: Yes, you can create GAUSS threads inside of procedures. However, the threads created in your procedure must obey the writer-must-isolate rule for local variables in the procedure. For example, the following procedure is illegal:

proc myFunction(x,y);
   local z;
   z = 10;
   threadstat z = x'x;
   threadstat y = y'y+z; /* accessing variable z, which is written to
                           by a sibling thread, is illegal */
   threadjoin;
retp(y);
endp;
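One way to make this procedure legal is to give each thread its own local variable to write, so that z is only read (it is assigned before any thread starts). The sketch below, using the hypothetical name myFunctionLegal, returns both results with retp.

```gauss
proc (2) = myFunctionLegal(x,y);
   local z,xx,yy;
   z = 10;                    // written before any thread starts
   threadstat xx = x'x;       // only this thread writes xx
   threadstat yy = y'y + z;   // only this thread writes yy; z is only read
   threadjoin;
   retp(xx,yy);
endp;

// Example call: x'x = 3 and y'y + z = 12 + 10 = 22
{ a, b } = myFunctionLegal(ones(3,1), 2*ones(3,1));
print a b;
```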

Q: Can I create GAUSS threads inside of for or do loops?

A: Yes, you can create GAUSS threads inside of for loops and inside of do loops.
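A sketch of threads inside a do loop: each iteration starts two sibling threads, joins them, and then accumulates their results in the parent thread. The counter i is a global that both threads only read and that the parent changes only after threadjoin, so the writer-must-isolate rule holds. The names s1, s2, and total are illustrative.

```gauss
x = ones(3,3);
total = zeros(3,3);

i = 1;
do while i <= 4;
   // Two sibling threads per iteration, each writing its own variable
   threadstat s1 = x * i;
   threadstat s2 = x * (i + 1);
   threadjoin;                 // both finish before the parent continues
   total = total + s1 + s2;    // accumulate outside the threads
   i = i + 1;
endo;

// total = (1+2) + (2+3) + (3+4) + (4+5) = 24 in every element
print total;
```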

Q: How many threads can I create in GAUSS?

A: GAUSS allows you to create an unlimited number of threads. You are limited only by your available hardware.

Q: How many threads should I create?

A: This is system and problem dependent. A good starting guideline is to create between 1 and 2 threads per core on your system. In some cases, however, you may be able to profitably create many more threads than that. See the next part in this tutorial series for more details on how to use threading most effectively.

Next section of this tutorial: GAUSS Threading Performance Considerations.