Re: Benchmarking multiplications and additions




Eric Sosman wrote:
Michel Rouzic wrote On 03/20/06 13:06,:
Eric Sosman wrote:

Michel Rouzic wrote:


Remco Rijnders wrote:


Michel Rouzic wrote:



I need to determine how long does an addition take and how long does a
multiplication takes. So far I've been trying to use the clock()
function in my programs in order to find out how long it took the CPU
to compute it, the only problem is that I get fairly inconsistent
results, inconsistent enough not to know between two codes which code
runs faster without running each about ten times and making the average
of the reported CPU times.

If the result of doing this test once is too irregular to draw conclusions
from, why not do it a million times and time that instead? Make sure to
take random numbers so the compiler it won't optimise the computations out
by replacing it with a constant value.


Yeah but the problem with making the test a million times is that it's
pretty much what I'm trying to avoid. Well that's ok if it's about
multiplications and additions, but when it comes to testing a program
or peice of a program, it makes you have to make the thing run for
several minutes, as if you had a regular measurement of the computation
time, it would have to run for a veyr few seconds (well ok, this is
slightly beyond my original mutliplication vs. addition problem, but
it's still quite the same problem, it's all about measuring how
efficient something is compared to another)

Question: How does the "several minutes" of testing time
compare to the amount of time you've already spent reading
and writing messages in this thread? To put it another way,
how many test runs must you avoid in the future to recoup the
time you've already spent?


Two or three, alright, maybe four. I think convolving a 882 million
sample signal compares well with reading a message and replying to it.
Anyways, it's not like the ultimate time i'll ever meet this
performance measuring problem. I've met it before (but for programs
where performance was more about comfort than something critical), I'm
dealing with it now and I'll undoubtfully meet it again many times in
the future.

Then your worry wasn't about the "several minutes"
after all, was it?

Well yes it was, in a way, and in another way, its more about accuracy.
See, if I had a reliable way of measuring the performance of my stuff,
I would only need to perform short computation to get reliable and
consistant figures. But instead I get meaningless numbers when
performing a short computation, and get reliable numbers only when the
computation lasts in average about 25 minutes. You know, when you're
looking for which settings offer the best performance, you don't even
wanna wait 5 minutes between each tests.

<off-topic>

You're going to have a difficult time getting a useful
answer out of a micro-benchmark. CPUs these days are rather
more complex than they were in the time of Babbage, and
notions like "the time required for a multiplication" are
too simplistic to describe their behavior. For starters,
most CPUs built in the past decade or so can crunch the data
faster than memory can supply operands or consume results.
(Compare your current machine's CPU clock frequency to the
speed of the memory chips, and you'll see what I mean -- for
example, the reciprocal of 50 ns is 0.02 GHz.)

Yeah, I understand that, but the way my problem has evolved since I
posted it, at last it's more about reliably determining the performance
of an algorithm meant to be used as such depending on the settings than
determining something as abstract as multiplication vs. addition
without a clearly defined concept. The nature of the problem still
remains quite the same tho.

The speed disparity means that memory effects dominate
the performance of large-scale numerical calculations. The
question isn't how fast the CPU runs, but how to keep it
busy as opposed to twiddling its thumbs waiting for memory.
The performance of the various caches that stand between the
CPU and the memory will have more to do with overall throughput
than will the choice of what operation to perform on the data
once you've got hold of it. If you want to keep the operands
flowing in and the answers flowing out, you need to pay close
attention to organizing your data in cache-friendly ways --
all of which are system-specific and outside the scope of the
C language, of course. The only semi-portable hint I can
offer is that you might try to do as much work as possible
with each value once you've fetched it to the CPU; pushing an
intermediate result back to main memory and then re-fetching
it to complete the computation later is likely to slow you down.

Well, i don't have the feeling I can do much about organizing my data,
since i'm dealing with one-dimensional arrays (well idk how i could
organize that differently) and mostly that the most CPU hungry part of
the whole thing is based on a library.

This isn't to say that the question of multiplication
speed vs. addition speed is wholly without meaning, just that
in the context of your larger problem it'll be at best a
second-order consideration.

</off-topic>

Do you have an idea of where I should ask about how to measure the
performance reliably? (considered that it can be a Windows-speficic
answer)

.



Relevant Pages

  • Re: for linux, which CPUs are almost always faster than which other CPUs?
    ... Is there any way to find out how to compare two different CPUs for the ... For a *web server*, the main bottleneck is you disk drives / mass storage ... The *second* bottleneck will be memory. ... And then I'll ask someone about comparing CPU A and CPU B, ...
    (comp.os.linux.misc)
  • Next July 27: boot failure(hang) on x86_64 box.
    ... Freeing unused kernel memory: 1360k freed ... ACPI: PM-Timer IO Port: 0x488 ... CPU: L2 Cache: 1024K ... # AX.25 network device drivers ...
    (Linux-Kernel)
  • [PATCH] Document Linuxs memory barriers [try #3]
    ... The attached patch documents the Linux kernel's memory barriers. ... I've tried to get rid of the concept of memory accesses appearing on the bus; ... barring implicit enforcement by the CPU. ...
    (Linux-Kernel)
  • Oops in 2.6.28-rc9 and -rc8 -- mtrr issues / e1000e
    ... Bios 1.04beta did show correct memory sizing in dmidecode, ... I hope this is as simple as me doing something glaringly wrong in the kernel ... DMI present. ... CPU: L2 cache: 6144K ...
    (Linux-Kernel)
  • Re: read vs. mmap (or io vs. page faults)
    ... not fit in main memory, and there are overheads related to the heuristics ... But because the CPU is underutilized, ... reasonably sized user buffer). ... You have to measure the actual overhead to see what the actual cost is. ...
    (freebsd-questions)