Loops and multithreading
I’ve been reading about multithreading ability in GAUSS and would like to use it to speed up simple loops, but I’m not sure how to go about it. A simple example would be something like:

    n = 100;              // number of times to loop
    y = zeros(n,1);       // holds results
    for j (1,n,1);
        x = rndu(10,1);   // some data to analyze
        y[j,1] = somefunction(x);
    endfor;
Execution could be sped up if the statements in the loop were run as independent threads. This is illegal, however, as all threads would try to write to the same global variable y.
I understand that I could repeat code blocks in the loop, such as:

    for j (1,n,1);
        // 1st thread
        x = rndu(10,1);   // some data to analyze
        y1[j,1] = somefunction(x);
        // 2nd thread
        x = rndu(10,1);   // more data to analyze
        y2[j,1] = somefunction(x);
    endfor;
    y = y1|y2;
3 Answers
If you are accessing a normal matrix (i.e., not a string array or similar), you should be able to write to different elements of the same matrix. For performance, however, you are generally best off keeping the data written by different threads some distance apart in memory and giving each thread more work to do.
This is because each CPU on your computer has separate cache memory. Each CPU reads data into cache in chunks called “cache lines”. When one CPU writes to a cache line, it notifies the other CPUs which cache line(s) it has written to. The other CPUs then consider that cache line “dirty”, which can require the data to be reloaded. Loading data is very slow relative to other CPU operations. This can lead to a phenomenon called “cache thrashing”, in which your threads spend much of their time reloading data written by other threads, and it can make your code very slow.
User-specified GAUSS threads are meant for “coarse parallelization”. GAUSS automatically carries out the finer level of multithreading inside the intrinsic functions.
Since GAUSS automatically threads many functions internally, code that does not use any explicit GAUSS threading statements will still take advantage of multiple cores. For example, a matrix multiplication or linear solve may use 4-8 threads (or more), depending on system resources and the size of the matrix. Therefore, you can use many cores with just a few GAUSS-level threading statements.
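As a minimal sketch of the point above, the following code contains no explicit threading statements, so any parallelism would come from inside the intrinsic operations. The matrix sizes are arbitrary assumptions chosen only for illustration, and how many threads are actually used depends on your system.

```gauss
// No threadBegin/threadStat here: any multithreading happens
// inside the intrinsic operations themselves.
a = rndn(2000, 2000);   // random square matrix (size is an assumption)
b = rndn(2000, 1);      // random right-hand side

c = a'a;                // matrix multiplication
x = b / a;              // linear solve: x satisfies a*x = b
```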
In most cases you will be best off creating a smaller number of blocks, like this:

    n = 100;                      // number of times to loop
    nthreads = 2;
    y1 = zeros(n/nthreads,1);     // holds results of 1st thread
    y2 = zeros(n/nthreads,1);     // holds results of 2nd thread

    threadBegin;
        for j (1,n/nthreads,1);
            x1 = rndu(10,1);      // some data to analyze
            y1[j,1] = somefunction(x1);
        endfor;
    threadEnd;
    threadBegin;
        for j (1,n/nthreads,1);
            x2 = rndu(10,1);      // some data to analyze
            y2[j,1] = somefunction(x2);
        endfor;
    threadEnd;
    threadJoin;                   // wait for both threads to finish

    y = y1|y2;
This example shows just two blocks for the sake of explanation, but scaling to four would not be hard. The copy and paste is, admittedly, not wonderful. However, the code should avoid the memory issues discussed above and, counting the automatic threading inside GAUSS, use many more than two threads.
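One way to scale to four blocks with less copy and paste is to move the loop into a procedure and launch each call with threadStat. This is only a sketch: runBlock is a hypothetical helper name, and meanc stands in for somefunction, which the thread never defines. It follows the example above in calling rndu inside each thread, and each thread writes only to its own result symbol.

```gauss
// Hypothetical helper: runs m iterations and returns an m x 1 result vector.
proc (1) = runBlock(m);
    local y, x;
    y = zeros(m, 1);
    for j (1, m, 1);
        x = rndu(10, 1);     // some data to analyze
        y[j] = meanc(x);     // stand-in for somefunction(x)
    endfor;
    retp(y);
endp;

n = 100;                     // total number of iterations
m = n / 4;                   // iterations per thread

// Each statement runs as its own thread and assigns a different symbol.
threadStat y1 = runBlock(m);
threadStat y2 = runBlock(m);
threadStat y3 = runBlock(m);
threadStat y4 = runBlock(m);
threadJoin;                  // wait for all four threads to finish

y = y1 | y2 | y3 | y4;       // stack the results into one n x 1 vector
```

Because every block builds its results in a local matrix and returns it whole, the threads write to separate memory, which is in line with the advice about keeping written data apart.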
That’s very interesting.
Do you have any practical advice for users seeking to write simple for loops to take advantage of multiple processors? I’m sure problems of this type are frequently encountered by users.
The best practical advice would be this: GAUSS-level threads take some time to create and coordinate. On a recent Linux machine with an Intel quad-core processor, this was timed at about 0.00009 seconds per thread creation. Try to make sure that any code you execute in a separate GAUSS-level thread takes at least 0.01 seconds to run, in order to achieve good thread efficiency.
Threading at a finer level than that is already handled by the internal GAUSS threads.
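To check a block against the 0.01 second guideline, you could time it before threading it. The sketch below assumes the block is wrapped in a hypothetical procedure runBlock, with meanc standing in for the real work. Since hsec only counts hundredths of a second, the block is repeated and the elapsed time averaged. The repetition count and block size are arbitrary assumptions.

```gauss
// Hypothetical helper: the work one thread would do.
proc (1) = runBlock(m);
    local y, x;
    y = zeros(m, 1);
    for j (1, m, 1);
        x = rndu(10, 1);
        y[j] = meanc(x);     // stand-in for somefunction(x)
    endfor;
    retp(y);
endp;

reps = 100;                  // repetitions to average over (assumption)
t0 = hsec;                   // hundredths of a second since midnight
for i (1, reps, 1);
    y = runBlock(50);
endfor;
secPerBlock = (hsec - t0) / 100 / reps;

print "seconds per block: " secPerBlock;
if secPerBlock < 0.01;
    print "Block is below 0.01 s; consider giving each thread more work.";
else;
    print "Block meets the 0.01 s guideline.";
endif;
```

At the 0.01 second threshold, the quoted creation cost of 0.00009 seconds is about 0.9% of the block's run time. A run that crosses midnight would give a wrong elapsed time, which this sketch ignores.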

