I have been experimenting with threadFor and found that it may not use all of the machine's cores. On a 32-core machine, a simulation (of 1 million iterations, say) run with threadFor typically uses at most 20% of the CPU. Admittedly, this is faster than a plain for loop. But if I manually split the work into 32 threads (via threadBegin, threadEnd and threadJoin), it uses 100% of the CPU and runs much faster than threadFor.
I was expecting threadFor to take advantage of all the available cores. By contrast, MATLAB's parfor does use all the cores if I set the preferred number of workers in the parallel pool to the number of physical cores.
The initial release of threadFor has an internal limit of 4 threads, which is why it leaves most of a 32-core machine idle. GAUSS version 15.1, due around the beginning of March 2015, will by default set the maximum number of threads to the number of cores on the machine. It will also add new functions to control the number of threads created, for cases where a different number would be advantageous. This will be a free update for all GAUSS 15 license holders.