Re: Poor performance with OpenMP



On 01/14/2011 04:58 PM, nmm1@xxxxxxxxx wrote:
In article<4D307789.40608@xxxxxxxxxxxxxxx>,
Kay Diederichs<kay.diederichs@xxxxxxxxxxxxxxx> wrote:

Generally, 'hyperthreading' should be discounted when deciding how
many cores to use for parallel programs. It is rare to get better
performance by using two threads on the same core.

Hyperthreading has a bad reputation but recent CPUs are pretty good at
it. This is a table of timings I obtained when testing a recently bought
machine with 2 Xeon 5670. HT is enabled in the BIOS, so there are 12
"physical" cores and 12 "virtual" hyperthreading cores.

That doesn't negate my point! The original hyperthreading was so bad
that it managed to knacker performance even if you were using only
half of the possible threads, but that's not what I am talking about.
You could get benefits from even that by choosing suitably trivial,
artificial programs, such as the one you used for a test. And, even
on that, you got only 20% improvement by doubling the number of
cores.

The point is that your program had essentially no communication
or memory access. Almost all real OpenMP programs are limited by
one or the other, and they are the aspects that suffer worst from
hyperthreading. Tuning to minimise those in real programs is
murder.

I've seen slight beneficial effects of HT when using parallel versions
of large crystallographic programs (SHELXL, CNS) but I don't have the
numbers, and that was on older hardware.

So have I, and I have also seen VERY significant detrimental effects,
especially when using OpenMP, on that hardware. I haven't personally
investigated the modern systems in any depth but most of my sources
indicate that you can get anything from moderate benefit to moderate
detriment, and the unpredictability of the time is much larger (and
hence tuning is much harder).

In other words: HT often helps, and rarely hurts.

I am sorry, but what you have said does NOT justify that statement!
I accept that I have provided no evidence, but I assert that your
evidence is of dubious relevance to real programs.


Regards,
Nick Maclaren.

Yeah, we can go back and forth like that for ages. You say it's only a 20% improvement; for me that may be half a day of wallclock time. Is the glass half full or half empty?

Maybe we can agree on the following?
a) depending on the specific situation (software/hardware), HT may or may not help. It is not helpful to insist that "it's always better" or "always bad".
b) one can always try a given code with and without HT, and decide afterwards. That's easy enough.
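
To make (b) concrete, here is a rough sketch of such a comparison. It assumes the program reads the standard OMP_NUM_THREADS environment variable, and the helper names (time_run, compare) are made up for illustration. It does no thread pinning, so where the threads actually land is up to the OS and the OpenMP runtime; treat the numbers as a first indication only.

```python
import os
import subprocess
import sys
import time


def time_run(cmd, num_threads, repeats=3):
    """Best wall-clock time in seconds of cmd run with OMP_NUM_THREADS=num_threads."""
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)  # raises if the program fails
        best = min(best, time.perf_counter() - start)
    return best


def compare(cmd, thread_counts, repeats=3):
    """Map each thread count to its best wall-clock time for cmd."""
    return {n: time_run(cmd, n, repeats) for n in thread_counts}


# Hypothetical usage on the machine described above (12 physical cores,
# 24 hardware threads); "./my_openmp_program" is a placeholder name:
#
#     for n, secs in compare(["./my_openmp_program"], [12, 24]).items():
#         print(f"{n} threads: {secs:.2f} s")
```

Taking the best of a few repeats reduces noise from other activity on the machine. Since run-to-run variation is part of what is being argued about here, it may be worth recording all the timings rather than only the minimum.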

The same is true for OpenMP vs MPI, about which you seem to have strong opinions (negative for OpenMP, IIUC). I find your apodictic words unhelpful. Comparing these methods is a bit like comparing apples to pears. There are use cases for both; it would be unrealistic to deny this. Anybody interested in parallelization should try both; one really has to learn from one's own experience.

My experience is that a significant parallel speedup is easy to obtain only in trivial cases. In real-world cases one has to become well versed in the specific problem and parallelization method, and invest time. This is true for both OpenMP and MPI, but the specifics differ (a lot).

best,

Kay