Re: Multithreading / Scalability
- From: blmblm@xxxxxxxxxxxxx
- Date: 6 Feb 2006 09:25:00 GMT
In article <44nc8qF3230tU2@xxxxxxxxxxxxxx>, <blmblm@xxxxxxxxxxxxx> wrote:
In article <43e678ab$0$337$9b4e6d93@xxxxxxxxxxxxxxxxxxxxxxxxxx>,
Philipp Kayser <p.kayser@xxxxxxxx> wrote:
Hi again,
I think I found the problem. I changed the loop again to:
for (n = 0; n < 10000; n++)
    if (n % number_of_threads == thread_number)
    {
        double a = 2;
        for (int p = 0; p < n; p++)
        {
            a += Math.sin(a);
        }
        result[thread_number] += a;
    }
Now I get satisfying results:
2 CPUs / 1 Thread : 10.7s
2 CPUs / 2 Threads : 5.469s
I think the problem is the read/write access to the result array. If the
result array is changed on one CPU, the corresponding cache line on the
second CPU is invalidated. By pulling the result assignment out of the
inner loop, the number of cache invalidations is greatly reduced.
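A common remedy for this kind of false sharing, different from what Philipp did above, is to space the per-thread slots far enough apart that no two threads write into the same cache line. The sketch below is only an illustration: the class name `PaddedSum` and the constant `PAD` are hypothetical, and `PAD = 8` assumes a 64-byte cache line holding 8 doubles. Java does not guarantee how arrays are aligned against cache lines, so treat `PAD` as a tuning guess, not a guarantee.

```java
// Sketch: keep each thread's accumulator at least one (assumed) cache line
// away from its neighbour's by spacing the slots PAD doubles apart.
public class PaddedSum {
    // Hypothetical: 8 doubles x 8 bytes = 64 bytes, an ASSUMED line size.
    static final int PAD = 8;

    public static double sum(int threads, int limit) throws InterruptedException {
        final double[] slots = new double[threads * PAD];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Same per-element update as the thread's code, but this
                // thread's slot is PAD elements away from the next one.
                for (int n = id; n < limit; n += threads) {
                    slots[id * PAD] += Math.sqrt(n);
                }
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < threads; t++) {
            workers[t].join(); // join() also makes the worker's writes visible
            total += slots[t * PAD];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sum(2, 10_000_000));
    }
}
```

The only change from the unpadded version is the index `id * PAD`; the arithmetic is the same, so the result should match a sequential sum up to floating-point rounding.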
"False sharing"! Wish I'd noticed this post before composing my reply
of a few minutes ago.
Following up a little, after doing some experiments on a four-processor
machine at work:
My idea for getting rid of the "false sharing" cache problem was
to do the original calculation summing into a local variable,
and then add into result[thread_number] at the end. That should
produce similar results to what you're doing above ....
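A minimal sketch of that local-variable idea, assuming a hypothetical helper `LocalSum.sum(threads, limit)`. It keeps the original `n % threads == id` loop shape and changes only where the partial sum lives, so the shared array is written once per thread instead of once per term.

```java
// Sketch: each thread sums into a local variable and stores the result in
// the shared array exactly once, at the end.
public class LocalSum {
    public static double sum(int threads, int limit) throws InterruptedException {
        final double[] result = new double[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                double local = 0.0;
                // Original loop shape: scan every n, keep only this thread's.
                for (int n = 0; n < limit; n++) {
                    if (n % threads == id) {
                        local += Math.sqrt(n);
                    }
                }
                result[id] = local; // single write to the shared array
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < threads; t++) {
            workers[t].join();
            total += result[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sum(2, 10_000_000));
    }
}
```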
And it does help -- performance changes from "the more threads, the
slower the program" to reasonable speedups with 2 and 4 threads,
compared to 1.
But ....
But the comment (in the other post) about improving the loop might
still be worthwhile.
By accident I discovered that making that loop change to the original
calculation (starting at thread_number and stepping by number_of_threads,
instead of testing n % number_of_threads), *instead of* switching to
summing into a local variable, *actually produces a faster program*,
with good speedups for 2 and 4 threads.
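One difference between the two loop shapes that is easy to overlook, offered as a possible partial explanation rather than a diagnosis: with the modulo test, every thread still runs all of the loop iterations and computes a remainder on each one, skipping only the `Math.sqrt` calls that belong to other threads. The strided loop runs only about 1/number_of_threads as many iterations per thread. The hypothetical `LoopCounts` class below just counts both quantities for a single thread.

```java
// Counts loop iterations and Math.sqrt calls for one thread under each
// loop shape. No sqrt is actually computed; this only measures work.
public class LoopCounts {
    /** Returns {loop iterations, sqrt calls} for the "test every n" loop. */
    public static long[] moduloLoop(int threads, int id, int limit) {
        long iterations = 0, sqrtCalls = 0;
        for (int n = 0; n < limit; n++) {
            iterations++;              // body runs (with a remainder) for every n
            if (n % threads == id) {
                sqrtCalls++;           // only these n would do real work
            }
        }
        return new long[] {iterations, sqrtCalls};
    }

    /** Returns {loop iterations, sqrt calls} for the strided loop. */
    public static long[] stridedLoop(int threads, int id, int limit) {
        long iterations = 0, sqrtCalls = 0;
        for (int n = id; n < limit; n += threads) {
            iterations++;              // every iteration is one of this thread's n
            sqrtCalls++;
        }
        return new long[] {iterations, sqrtCalls};
    }

    public static void main(String[] args) {
        int threads = 4, limit = 1_000_000;
        long[] m = moduloLoop(threads, 0, limit);
        long[] s = stridedLoop(threads, 0, limit);
        System.out.println("modulo:  iterations=" + m[0] + " sqrt calls=" + m[1]);
        System.out.println("strided: iterations=" + s[0] + " sqrt calls=" + s[1]);
    }
}
```

With 4 threads and a limit of 1,000,000, thread 0 runs 1,000,000 iterations in the modulo version but 250,000 in the strided version, making 250,000 `Math.sqrt` calls either way. So with the modulo test, total loop overhead grows with the number of threads even though the useful work does not.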
I don't understand this at all. Maybe I've made some stupid
blunder. Otherwise -- hm, I don't know!
Below is my modified version of your code. I made a couple of other
changes: the main thread now uses "join" to wait for the calculation
threads (did you know you could do that?), and all thread activity now
happens inside the timed part of the code, so I don't need the slightly
complicated logic that waits until all threads have started before the
timing begins ...
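public class Test
{
    // Set from the command line in main() before any Test is constructed,
    // so the instance array initializers below see the right value.
    private static int number_of_threads;
    private final Thread[] threads = new Thread[number_of_threads];
    // One slot per thread; neighbouring slots are adjacent in memory.
    private final double[] result = new double[number_of_threads];

    private class CalculationThread implements Runnable
    {
        private final int thread_number;

        CalculationThread(int n)
        {
            thread_number = n;
        }

        public void run()
        {
            /*
            // Version 1: scan every n, keep the ones belonging to this thread,
            // and sum into a local (one write to result[] at the end).
            double local_result = 0.0;
            for (int n = 0; n < 600000000; n++)
                if (n % number_of_threads == thread_number)
                    local_result += Math.sqrt(n);
            result[thread_number] += local_result;
            */

            // Version 2: visit only this thread's n values, but write to the
            // shared array on every iteration.
            // this actually is faster! ????
            for (int n = thread_number; n < 600000000; n += number_of_threads)
                result[thread_number] += Math.sqrt(n);
        }
    }

    private void multithreaded_calculation()
    {
        for (int i = 0; i < number_of_threads; i++)
        {
            threads[i] = new Thread(new CalculationThread(i));
            //threads[i].setPriority(Thread.NORM_PRIORITY);
            //threads[i].setDaemon(true);
            threads[i].start();
        }
        try
        {
            // Wait for every calculation thread to finish.
            for (int i = 0; i < number_of_threads; i++)
                threads[i].join();
        }
        catch (InterruptedException e)
        {
            // Restore the interrupt flag and skip printing a partial sum.
            Thread.currentThread().interrupt();
            return;
        }
        double total_result = 0;
        for (int i = 0; i < number_of_threads; i++)
            total_result += result[i];
        System.out.println(total_result);
    }

    private void test()
    {
        // Times thread creation, the calculation, and the final summation.
        long t0 = System.currentTimeMillis();
        multithreaded_calculation();
        long t1 = System.currentTimeMillis();
        System.out.println(((double)t1 - t0)/1000);   // elapsed seconds
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java Test <number_of_threads>");
            return;
        }
        number_of_threads = Integer.parseInt(args[0]);
        new Test().test();
    }
}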
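The two changes discussed above were tried separately. A sketch of combining them, with each thread visiting only its own indices *and* accumulating into a local, is below. The class name `StridedLocalSum` and the default of 2 threads when no argument is given are hypothetical choices; the 600,000,000-term workload is taken from the code above. Whether this beats the strided version on a particular machine would have to be measured.

```java
// Sketch: strided loop (no modulo test) plus a local accumulator, so each
// thread does only its own share of iterations and writes result[] once.
public class StridedLocalSum {
    public static double sum(int threads, int limit) throws InterruptedException {
        final double[] result = new double[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                double local = 0.0;
                // Visit only id, id + threads, id + 2*threads, ...
                // (assumes limit + threads does not overflow an int)
                for (int n = id; n < limit; n += threads) {
                    local += Math.sqrt(n);
                }
                result[id] = local; // one shared write per thread
            });
            workers[t].start();
        }
        double total = 0.0;
        for (int t = 0; t < threads; t++) {
            workers[t].join();
            total += result[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        long t0 = System.currentTimeMillis();
        double total = sum(threads, 600_000_000);
        long t1 = System.currentTimeMillis();
        System.out.println(total);
        System.out.println((t1 - t0) / 1000.0); // elapsed seconds
    }
}
```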
--
| B. L. Massingill
| ObDisclaimer: I don't speak for my employers; they return the favor.