Re: Poor performance with OpenMP



On 01/14/2011 04:58 PM, nmm1@xxxxxxxxx wrote:
In article <4D307789.40608@xxxxxxxxxxxxxxx>,
Kay Diederichs <kay.diederichs@xxxxxxxxxxxxxxx> wrote:

Generally, 'hyperthreading' should be discounted when deciding how
many cores to use for parallel programs. It is rare to get better
performance by using two threads on the same core.

Hyperthreading has a bad reputation but recent CPUs are pretty good at
it. This is a table of timings I obtained when testing a recently bought
machine with 2 Xeon 5670. HT is enabled in the BIOS, so there are 12
"physical" cores and 12 "virtual" hyperthreading cores.

That doesn't negate my point! The original hyperthreading was so bad
that it managed to knacker performance even if you were using only
half of the possible threads, but that's not what I am talking about.
You could get benefits from even that by choosing suitably trivial,
artificial programs, such as the one you used for a test. And, even
on that, you got only 20% improvement by doubling the number of
cores.

The point is that your program had essentially no communication
or memory access; almost all real OpenMP programs are limited by
one or the other, and those are the aspects that suffer worst from
hyperthreading. Tuning to minimise them in real programs is
murder.

I've seen slight beneficial effects of HT when using parallel versions
of large crystallographic programs (SHELXL, CNS) but I don't have the
numbers, and that was on older hardware.

So have I, and I have also seen VERY significant detrimental effects,
especially when using OpenMP, on that hardware. I haven't personally
investigated the modern systems in any depth but most of my sources
indicate that you can get anything from moderate benefit to moderate
detriment, and the unpredictability of the run time is much larger (and
hence tuning is much harder).

In other words: HT often helps, and rarely hurts.

I am sorry, but what you have said does NOT justify that statement!
I accept that I have provided no evidence, but I assert that your
evidence is of dubious relevance to real programs.


Regards,
Nick Maclaren.

Yeah, we can go back and forth like that for ages. You say it's only a 20% improvement; for me that may mean half a day of wall-clock time. Is the glass half full or half empty?

Maybe we can agree on the following?
a) Depending on the specific situation (software and hardware), HT may or may not help. It is not helpful to insist that it is "always better" or "always worse".
b) One can always try a given code with and without HT, and decide afterwards. That's easy enough; see the sketch below.
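
To illustrate b): the little C/OpenMP sketch below times the same loop at different thread counts with omp_get_wtime(). The kernel, the problem size and the thread counts are only placeholders, not a benchmark recommendation; the counts 12 and 24 simply match the machine discussed above (12 physical cores, 24 hardware threads with HT). Run it once pinned to the physical cores and once across all hardware threads and you have the with/without-HT numbers to compare; the pinning itself is done outside the program, via the affinity facilities of whichever OpenMP runtime is in use.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)   /* placeholder problem size */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* Placeholder thread counts: 12 "real" cores vs. all 24 hardware threads. */
    int counts[] = { 12, 24 };

    for (int c = 0; c < 2; c++) {
        omp_set_num_threads(counts[c]);
        double t0 = omp_get_wtime();

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += a[i] * b[i];          /* stand-in for the real kernel */

        double t1 = omp_get_wtime();
        printf("%2d threads: %.3f s (sum = %g)\n", counts[c], t1 - t0, sum);
    }

    free(a);
    free(b);
    return 0;
}

One run says little, of course; repeating a few times and looking at the spread is part of "deciding afterwards".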

The same is true for OpenMP vs. MPI, about which you seem to have strong opinions (negative for OpenMP, IIUC). I do not find such apodictic statements helpful. Comparing the two is a bit like comparing apples to pears: there are use cases for both, and it would be unrealistic to deny this. Anybody interested in parallelization should try both; one really has to learn from one's own experience.
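
Just to make the apples-and-pears point concrete (not to claim one model is better): the sketch below computes the same kind of reduction with both, an OpenMP loop inside each MPI rank. It is only an illustration; the problem size and the per-element work are placeholders, and all error checking is left out.

#include <mpi.h>
#include <stdio.h>

#define N 10000000L   /* placeholder problem size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI view: each rank owns an explicit slice of the index range,
       and nothing is shared between ranks. */
    long lo = rank * N / size;
    long hi = (rank + 1) * N / size;

    /* OpenMP view: within a rank, one directive splits the loop over
       threads that share the same memory; no explicit decomposition. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += 0.5 * (double)i;        /* stand-in for the real work */

    /* In MPI the communication is explicit. */
    double total = 0.0;
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("result = %g\n", total);

    MPI_Finalize();
    return 0;
}

The contrast is where the work goes: OpenMP lets one directive split a loop over threads that share memory, while MPI makes the data decomposition and the communication explicit.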

My experience is that only in trivial cases is a significant parallel speedup easy to obtain. In real-world cases one has to become well versed in the specific problem and parallelization method, and invest time. This is true for both OpenMP and MPI, but the specifics differ (a lot).
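
One concrete example of what makes the real-world cases hard, and it ties in with Nick's point about memory access: a STREAM-style triad like the sketch below does almost no arithmetic per byte moved, so once the memory buses are saturated, adding more threads, hyperthreaded or not, cannot make it faster. The array size is a placeholder; it only has to be much larger than the caches.

#include <stdio.h>
#include <stdlib.h>

#define N 20000000L   /* placeholder: just has to be far larger than the caches */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    /* STREAM-style triad: one multiply-add per three memory accesses,
       so there is almost no arithmetic to hide behind; the memory
       system, not the cores, sets the pace. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];

    printf("%g\n", a[N - 1]);   /* keep the compiler from dropping the loop */

    free(a); free(b); free(c);
    return 0;
}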

best,

Kay