<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Aptech &#187; Answers for "Loops and multithreading"</title>
	<atom:link href="http://www.aptech.com/questions/loops-and-multithreading/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.aptech.com</link>
	<description></description>
	<lastBuildDate>Fri, 08 Feb 2013 19:12:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>By: Aptech</title>
		<link>http://www.aptech.com/questions/loops-and-multithreading/#answer-2737</link>
		<comments>http://www.aptech.com/questions/loops-and-multithreading/#answer-2737#comments</comments>
		<pubDate>Fri, 07 Dec 2012 17:51:52 +0000</pubDate>
		<dc:creator>Aptech</dc:creator>
		
		<guid isPermaLink="false">http://www.aptech.com/questions/loops-and-multithreading/#answer-2737</guid>
		<description><![CDATA[The best practical advice would be: GAUSS-level threads take some time to create and coordinate. On a recent Linux machine with an Intel quad-core processor, this was timed at about 0.00009 seconds per thread creation. Try to make sure &#8230; <a href="http://www.aptech.com/questions/loops-and-multithreading/#answer-2737">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The best practical advice would be: GAUSS-level threads take some time to create and coordinate. On a recent Linux machine with an Intel quad-core processor, this was timed at about 0.00009 seconds per thread creation. Try to make sure that any code you execute in a separate GAUSS-level thread takes at least 0.01 seconds to run; at that size the creation cost alone is under 1% of the thread&#8217;s run time, which gives good thread efficiency.</p>
<p>Threading at a finer level than that is already handled by the internal GAUSS threads.</p>
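<p>One rough way to check a piece of work against the 0.01-second guideline is to time many repetitions of it and divide. The sketch below assumes a hypothetical procedure <code>workUnit</code> standing in for your own computation, and uses <code>hsec</code>, which only resolves hundredths of a second, so the repetition count has to be large enough to register.</p>
<pre>
// Hypothetical stand-in for one unit of real work (assumption for illustration)
proc (1) = workUnit(x);
   retp(meanc(x));
endp;

nrep = 10000;   // number of repetitions to time
t0 = hsec;      // hundredths of a second since midnight
for i (1,nrep,1);
   r = workUnit(rndu(10,1));
endfor;
elapsed = (hsec - t0)/100;   // elapsed time in seconds

if elapsed == 0;
   print "Too fast to time; increase nrep and rerun.";
else;
   perCall = elapsed/nrep;           // seconds per call
   minCalls = ceil(0.01/perCall);    // calls needed to fill 0.01 seconds
   print "seconds per call:" perCall;
   print "calls per thread to reach 0.01 seconds:" minCalls;
endif;
</pre>
<p>If the reported number of calls is larger than the number of iterations you planned to give each thread, the loop is probably too small to benefit from a GAUSS-level thread.</p>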
]]></content:encoded>
			<wfw:commentRss>http://www.aptech.com/questions/loops-and-multithreading/#answer-2737/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>By: svannorden</title>
		<link>http://www.aptech.com/questions/loops-and-multithreading/#answer-2730</link>
		<comments>http://www.aptech.com/questions/loops-and-multithreading/#answer-2730#comments</comments>
		<pubDate>Fri, 07 Dec 2012 02:47:28 +0000</pubDate>
		<dc:creator>svannorden</dc:creator>
		
		<guid isPermaLink="false">http://www.aptech.com/questions/loops-and-multithreading/#answer-2730</guid>
		<description><![CDATA[That&#8217;s very interesting. Do you have any practical advice for users seeking to write simple for loops to take advantage of multiple processors? I&#8217;m sure problems of this type are frequently encountered by users.]]></description>
			<content:encoded><![CDATA[<p>That&#8217;s very interesting.</p>
<p>Do you have any practical advice for users seeking to write simple for loops to take advantage of multiple processors? I&#8217;m sure problems of this type are frequently encountered by users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.aptech.com/questions/loops-and-multithreading/#answer-2730/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>By: Aptech</title>
		<link>http://www.aptech.com/questions/loops-and-multithreading/#answer-2717</link>
		<comments>http://www.aptech.com/questions/loops-and-multithreading/#answer-2717#comments</comments>
		<pubDate>Thu, 06 Dec 2012 23:10:45 +0000</pubDate>
		<dc:creator>Aptech</dc:creator>
		
		<guid isPermaLink="false">http://www.aptech.com/questions/loops-and-multithreading/#answer-2717</guid>
		<description><![CDATA[If you are accessing a normal matrix (i.e., not a string array, etc.), you should be able to write to different elements of the same matrix. However, for performance you are generally best off keeping the data that is &#8230; <a href="http://www.aptech.com/questions/loops-and-multithreading/#answer-2717">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>If you are accessing a normal matrix (i.e., not a string array, etc.), you should be able to write to different elements of the same matrix from different threads. However, for performance you are generally best off keeping the data written by different threads some distance apart in memory and giving each thread more work to do.</p>
<p>This is because each CPU on your computer has its own cache memory. Each CPU reads data into its cache in chunks called &#8220;cache lines&#8221;. When one CPU writes to a cache line, it notifies the other CPUs which cache line(s) it has written to. The other CPUs then consider that cache line &#8220;dirty&#8221;, which can force them to reload the data. Loading data is (relative to other CPU operations) very, very slow. This can lead to a phenomenon called &#8220;cache thrashing&#8221;, in which your threads spend much of their time reloading data written by other threads, and it can make your code very slow.</p>
<p>User-specified GAUSS threads are meant for &#8220;coarse parallelization&#8221;. GAUSS automatically carries out the finer level of multithreading inside the intrinsic functions.</p>
<p>Since GAUSS automatically threads many functions internally, code that does not use any explicit GAUSS threading statements will still take advantage of multiple cores. For example, a matrix multiplication or linear solve may use 4-8 threads (or more), depending on system resources and the size of the matrix. Therefore, you can use many cores with just a few GAUSS-level threading statements.</p>
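<p>As a small illustration of the automatic threading, the sketch below contains no threading statements at all. The matrix size is an arbitrary assumption, and how many cores are actually used depends on your machine and GAUSS installation; watching a system monitor while it runs is one way to see the effect.</p>
<pre>
// No threadBegin/threadEnd here: any parallelism comes from GAUSS internals
k = 2000;            // matrix dimension (arbitrary choice for illustration)
a = rndn(k,k);
b = rndn(k,k);

t0 = hsec;           // hundredths of a second since midnight
c = a*b;             // large matrix multiplication
elapsed = (hsec - t0)/100;

print "seconds for the matrix multiplication:" elapsed;
</pre>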
<p>In most cases you will be best off creating a small number of larger blocks, like this:</p>
<pre>
n=100; <span style="color:#006600">// number of times to loop</span>
nthreads = 2;
y1=zeros(n/nthreads,1); <span style="color:#006600">// holds results from thread 1</span>
y2=zeros(n/nthreads,1); <span style="color:#006600">// holds results from thread 2</span>

<span style="color:#000099">threadBegin</span>;
   <span style="color:#000099">for</span> j (1,n/nthreads,1);
      x1 = rndu(10,1); <span style="color:#006600">// some data to analyze; x1 is written only by this thread</span>
      y1[j,1] = somefunction(x1);
   <span style="color:#000099">endfor</span>;
<span style="color:#000099">threadEnd</span>;
<span style="color:#000099">threadBegin</span>;
   <span style="color:#000099">for</span> j (1,n/nthreads,1);
      x2 = rndu(10,1); <span style="color:#006600">// some data to analyze; x2 is written only by this thread</span>
      y2[j,1] = somefunction(x2);
   <span style="color:#000099">endfor</span>;
<span style="color:#000099">threadEnd</span>;
<span style="color:#000099">threadJoin</span>; <span style="color:#006600">// wait for both threads to finish</span>

y = y1|y2; <span style="color:#006600">// stack the two result blocks</span>
</pre>
<p>This example shows just two blocks for the sake of explanation, but scaling to four would not be hard. The copy and paste is, admittedly, not wonderful. However, because each thread writes only to its own result matrix, the code should avoid the memory issues discussed above, and it will use many more than two threads once the automatic threading inside GAUSS is taken into account.</p>
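<p>One possible way to cut down on the copy and paste when going to four blocks is to move the loop into a procedure, so that each thread body is a single call. This is only a sketch: <code>runBlock</code> is a hypothetical helper name, <code>somefunction</code> is given a trivial stand-in body so the example is complete, and <code>n</code> is assumed to be divisible by 4.</p>
<pre>
// Hypothetical stand-in for the real analysis (assumption for illustration)
proc (1) = somefunction(x);
   retp(meanc(x));
endp;

// Runs one block of nrep iterations; y and x are local to each call,
// so no two threads write to the same symbol
proc (1) = runBlock(nrep);
   local y, x;
   y = zeros(nrep,1);
   for j (1,nrep,1);
      x = rndu(10,1);   // some data to analyze
      y[j,1] = somefunction(x);
   endfor;
   retp(y);
endp;

n = 100;       // total number of iterations
nper = n/4;    // iterations per thread; assumes n is divisible by 4

threadBegin;
   y1 = runBlock(nper);
threadEnd;
threadBegin;
   y2 = runBlock(nper);
threadEnd;
threadBegin;
   y3 = runBlock(nper);
threadEnd;
threadBegin;
   y4 = runBlock(nper);
threadEnd;
threadJoin;    // wait for all four threads to finish

y = y1|y2|y3|y4;   // stack the four result blocks
</pre>
<p>Each thread still writes only to its own result matrix, so the blocks stay well separated in memory, and changing the work done per iteration means editing one procedure instead of four loops.</p>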
]]></content:encoded>
			<wfw:commentRss>http://www.aptech.com/questions/loops-and-multithreading/#answer-2717/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
