Re: Results of the memswap() smackdown from the thread "Sorting" assignment



On Tue, 12 Feb 2008 03:15:43 -0600, spinoza1111 wrote
(in article
<57710a40-730f-4349-9433-0918703b0239@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>):

Two lines, and a small character string, were exchanged repeatedly,
first by "my" code, then by Ben's, then by Richard's, then by
Malcolm's, an even number of times by each version in such a way that
there is no net change.

A few very short test lines don't really offer much chance for an
inline memswap() to be a win over versions making function calls, but
if you're happy with it... Odds are memcpy is actually highly
optimized instead (despite the little game below), so I sort of suspect
you used small strings for a reason. Perhaps the same as with the old
example we shall not name. :P

I used lcc-win32 on a Compaq laptop.

Everybunny's code worked correctly the first time.

Ben kicked *** in this benchmark because the third exchange was only
five characters long, demonstrating the wisdom of handling small
common cases fast.

Unless you happen to know that you're always going to be calling it
with 64KB chunks, in which case it isn't a benefit. Match the solution
to the data set when possible.

Richard's use of a "buffer" is clearly also a win for this benchmark
(but not as good as Ben's optimization) but I remain opposed to it for
the reasons I have given already.

That's pretty amazing to see, given that Ben's "version" is basically
the one you're complaining about with an optimization up front for
small swaps.

Did you really not notice?

It is in a contradictory fashion
relying on the quality of whatever library does memcpy while at the
same time it extends the library;

Hmm, apparently you really didn't. Look at them both again. How does
the one calling memcpy() that you do like not suffer the same way as the
one calling memcpy() that you don't like? That's some interesting bias
you have showing there.

if memcpy starts to suck over the
lifetime of the program (which of course can include changes to the
library) this will suck,

Yeah, it's so incredibly common in practice for memcpy() to be
deoptimized by updates and nobody bothers to notice or fix it. No wait
a minute, it's not common at all.

meaning that the "library" considered as now
including the Heathfield memswap() will show perverse behavior over
time, adding to the cost of software maintenance. My theorem being
that for the same reason each library function should be a black box,
its performance should be either independent of or smoothly and
predictably related to other library functions, so that changes of
platform don't generate research and rework.

If you follow through on this, you wind up reinventing all of the
wheels in the libc, so as not to be "dependent" upon them for this goal
of yours. That means more chance for errors, less code reuse, and less
chance of taking advantage of platform-specific improvements without
diving into unportable assembler tricks. One of the key benefits of
using standard
functions in many cases is that the implementation is highly optimized
for the target hardware, so that you don't have to roll, benchmark and
test a hand-optimized (or inline assembler) version for every platform
you care about. Of course it also simplifies your own code to not
rewrite versions for everything in the library to avoid using the
library. That's pointless.

Trading all of that for some mythical concern over the remote chance of
a future version of the library suddenly getting much worse, with nobody
noticing, seems completely misdirected, imo.

my saw or maxim would be "don't extend the C
library with your tools while expecting the library to remain the same
and to serve you".

Fortunately, it's not difficult to combine small functions in the
standard library to perform more complex actions. In fact, that's the
whole point.

C in my experience makes fools out of those who would create tools

It does seem to do a good job of making the fools obvious, I'll grant
you that one.

it overexposes things globally,

No, it doesn't.

you need to be far more expert in C details than I ever became to be
a C toolsmith.

I agree.

This is not based on personalities apart from Richard's modus operandi
in coding, which is, at times, and in my opinion, to be clever in a
fashion that for me is no longer useful

By your own admission, Ben's improvement on Richard's code was even
more clever, but for some reason you like it anyway, and even forgive
it for the very same traits that you find fault with in Richard's
original. Even a complete noob programmer could recognize the bias
there.

I'd only add that "many operating systems" freeze up and
piss me off when I am trying to get something done.

Get a better one. I'm extremely pleased with the one I'm using atm,
but I have seen some real bad ones. Most came from the same vendor in
the northwestern part of the US. Think that's coincidence, or perhaps
climate related? ;-)

Anyway, if I had to extend the C library, I wouldn't do a lot of
"reuse" solely for the purpose of speed as RH has done, at least not
as my first try.

You miss the point. I don't think he even mentioned speed originally.
That was your addition, iirc.

He could have written his own memcpy() implementation to avoid using
the libc function, but odds are that in a cross-platform test it would
lose out to many, if not all, of them, since they would be tuned for the
target architecture in ways often not possible with pure portable code
that doesn't leverage the standard library implementation.

This certainly shouldn't be news today. Most knew it ages ago.

Here are the CPU rankings

Ben's code took 63 clock() values to do the exchanges 10000 times.

spinoza1111's code took 203 clock() values to do 10000 exchanges

Richard's code took 109 clock() values to do 10000 exchanges

Malcolm's code took 187 clock() values to do 10000 exchanges

A trend is emerging in these drag races you keep encouraging. Do you
see it?

// ***** Ben Bacarisse's code *****
#define BUF_LEN 64
void ben_memswap(void *vleft, void *vright, size_t n)
{
[snip]

                memcpy(buf, left, BUF_LEN);
                memcpy(left, right, BUF_LEN);
                memcpy(right, buf, BUF_LEN);
[snip]
}


// ***** Richard Heathfield's code *****

#define BUF_LEN 64 /* adjust to taste */
void *RH_mem_swap(void *vleft, void *vright, size_t n)
{
[snip]
    memcpy(buf, left, BUF_LEN);
    memcpy(left, right, BUF_LEN);
    memcpy(right, buf, BUF_LEN);
[snip]
}


Isn't it weird how the "good one" and the "bad one" both call memcpy(),
which you tell us is wrong? And they both do it the same way too. How
can this be?
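For anyone who wants to see the shape being argued about, here is a rough sketch of a buffered swap built on memcpy(). It is not Ben's or Richard's actual code (the interesting parts are snipped above); the name sketch_memswap and the constant SKETCH_BUF_LEN are made up for illustration, and the byte-by-byte tail is only my guess at one way of handling short swaps. It assumes the two regions don't overlap.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_LEN 64 /* assumed chunk size, mirroring BUF_LEN above */

/* Hypothetical sketch, not the code from either poster:
   swap n bytes between two non-overlapping regions. */
void sketch_memswap(void *vleft, void *vright, size_t n)
{
    unsigned char *left = vleft;
    unsigned char *right = vright;
    unsigned char buf[SKETCH_BUF_LEN];

    /* Whole chunks: three memcpy() calls per chunk via a temporary buffer. */
    while (n >= SKETCH_BUF_LEN) {
        memcpy(buf, left, SKETCH_BUF_LEN);
        memcpy(left, right, SKETCH_BUF_LEN);
        memcpy(right, buf, SKETCH_BUF_LEN);
        left += SKETCH_BUF_LEN;
        right += SKETCH_BUF_LEN;
        n -= SKETCH_BUF_LEN;
    }

    /* Remainder shorter than one chunk: swap it one byte at a time. */
    while (n-- > 0) {
        unsigned char tmp = *left;
        *left++ = *right;
        *right++ = tmp;
    }
}
```

Either way you slice it, the bulk of the work goes through memcpy(), which is the whole point of the question above.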

#include <time.h>
#define ITERATIONS 100000

int main(int argc,char *argv[])
{
        printArray();
        int intIndex1;
        printf("%d\n", clock());

What type does clock() return?

What type does %d imply for its argument?

All we really know portably is that it is an arithmetic type.

See also the other slew of such calls.

You might sample clock_t values before and after each test, then take
the difference and divide it by CLOCKS_PER_SEC (using double throughout
the calculation) to get an elapsed time in seconds, and display that
with %f. Then when somebody runs the same test on another architecture
(or a system with another timer tick rate), the results can be
reasonably compared.
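Something along these lines would do, as a minimal sketch. The helper names elapsed_seconds and print_elapsed are invented for this example; only clock(), clock_t and CLOCKS_PER_SEC come from <time.h>.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: convert two clock() samples to elapsed seconds,
   doing the arithmetic in double throughout. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return ((double)end - (double)start) / (double)CLOCKS_PER_SEC;
}

/* Hypothetical helper: print a labelled elapsed time with %f, which
   matches the double argument regardless of what clock_t really is. */
void print_elapsed(const char *label, clock_t start, clock_t end)
{
    printf("%s: %f seconds\n", label, elapsed_seconds(start, end));
}
```

You would take one clock() sample before the loop of exchanges and one after it, then hand both to print_elapsed() along with a name for the version under test.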

That, or use some other method to measure it, like Ben did.

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw




