Re: "Sorting" assignment



On Feb 11, 6:04 pm, Randy Howard <randyhow...@xxxxxxxxxxxxxxxxx>
wrote:
On Mon, 11 Feb 2008 01:41:06 -0600,spinoza1111wrote
(in article
<8aec508f-e711-412c-8d8e-a1460e148...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>):





On Feb 11, 3:41 am, Randy Howard <randyhow...@xxxxxxxxxxxxxxxxx>
wrote:
On Sun, 10 Feb 2008 12:13:06 -0600,spinoza1111wrote
(in article
<d74e2997-2abe-455a-bf6c-536935234...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>):

On Feb 11, 12:23 am, Ben Bacarisse <ben.use...@xxxxxxxxx> wrote:
Well, I wasn't discussing the applicability of specific instances,
because this applicability is probably statistically all over the
place depending on implementations, including implementations in which
in the code in which Richard's code snippet is inserted, which have a
redefined byte by byte memcpy because years ago, some clown couldn't
get the library procedure to "work".

Calling Richard's post a problem just because you might run into a QOI
issue on some ancient compiler doesn't make a lot of sense.  I think it
is quite likely that if part of a C implementation is badly done,

...or overridden by macros that redefine library functions, as I said.
This can be deeply buried inside #includes.

It can be, yes.  You could puke on any code example published anywhere
that makes what appears to be a function call on those grounds though,
so it seems rather weak.

I didn't say Richard claim it was a perfect solution for bad

            ^^^ typo, this should have been "see" instead of "say"

implemenations.  It was rather a solution in standard C.  All too often

It is a solution in some situations.

Can you think of any likely examples where it wouldn't work?

#define memcpy ... in some clown's program.


I've used "buffering".

Pretty much anyone that's ever used stdio has, even if they don't
realize it.  That's not a big deal.

But in general I prefer not to trade space for time

You should make up your mind, sometimes you flay solutions for being
too slow, other times for optimizing too much, some times for being too
tricky and not beautiful.  Pick a position and stick with it, it'll
make it a lot easier to follow your comments.

Take a closer look at my reasoning.


Richard made a false ontological claim in response to McLean's true
claim that C doesn't have a memswap. This claim was a code snippet
which may work LESS well than doing it byte by byte.

It might, in some system.  There are no performance guarantees, which
is why when someone asks if method X is faster than method Y, the
correct answer is almost always "it depends, better measure and find
out on the systems and compilers you use".

That's the wrong answer in light of computational complexity theory,
which provides us with efficiency measures that are logically
independent of systems and compilers.

This is the difference between theory and practice.  We know you like
practice, because you harp on it often.  I recall you ranting against
the use of big-O notation in the past, for you know what, because it
didn't reflect the real world and such.  But yes, in general,
complexity analysis makes for good, /course/ comparisons.  Those
invisible constant factors can be annoying unless n is quite large.  

;-)

On most systems, his method is likely to be better than a pure byte by
byte swap, and doesn't suffer from a high likelihood of generating a
bus error doing int-sized XOR swaps on unaligned addresses.  People

I think we better forget about XOR.

A good idea, but it is still useful in some places, just not nearly as
much as it once was.

often incorrectly assume it's okay to do this because of the "it works
on my computer" issue.  On many platforms, it's flat out broken, on

Not unless the makers are frauds. XOR has a fixed meaning.

This comment seems to imply that you still don't follow the hardware
issues that can arise due to alignment issues.

Far from it. As an assembler programmer I dealt with alignment every
day. You even had to worry about alignment in Cobol!


Yes, that is what happened, when you made an assertion that some
compiler has a really horrible memcpy that makes Richard's code wrong
or hopelessly slow.  We haven't been shown any platform where it
doesn't do well as yet, but would you like to try comparing your code
to his on a few commonly used platforms and compilers, and publish the
results?  

I will try it on my Windows system using lccwin-32.

That is a rather bizarre, not-quite-C compiler, but ok.  I'm sure
others have access to better ones on Windows.

...which is why we need a language to talk about computers and
programming far more purified of assumptions: because we cannot always
get what we want, but if we try sometimes, we get what we need.

Dijkstra was paradoxically the most practical computer scientist ever
because he insisted on purity of assumptions, having had to work in
the early days on completely bizarre platforms (his paper on the IBM
1620 is a classic).

This is a team effort. Perhaps you can run it on your toaster?

LOL.  When you post a complete instrumented test app, compilable
portably, with timing and test cases built in for both Richard's
(unmodified) version and yours, I'll be happy to do an apples to apples
on some other platforms and CPU architectures, no pun intended.

It's on the new thread. Get to work.

It is after all, a claim about performance underlying most of this.  If
you post the test fixture, others here with a number of different
implementations can perform the same measurements and we can all
compare notes on how the work "in practice".  Be sure and call the swap
routines with both aligned and unaligned data and measure the two
distinctly for the most informative results.

I've abandoned XOR, but will test a version that uses XOR because I'm
a nice guy.

Recommending XOR in a general solution in 2008 isn't being nice.  In

It has been withdrawn.

fact, it's been enshrined in more than one programming FAQ for a good
reason.  It can be downright dangerously broken in some cases, with
aliasing problems and macro side-effects in some languages.

Consider what happens when you have aliasing, and you call, effectively
but perhaps unexpectedly, xor(a, a);  boom.   You now get to spend time
debugging a hard to reproduce bug where your data vanishes
occasionally.  

Thanks for reminding me that my abstract solution has to handle
overlap!

Regardless, on modern architectures, although it may appear to look
efficient, it is quite often slower than a conventional swap through a
temp variable.  First of all, compilers typically recognize the temp
"idiom" and generate good code for the underlying hardware.  With or
without that, modern CPUs like to execute commands in parallel when
possible.  With XOR, the inputs of each stage depend on the results of
the one before, so they must be executed sequentially.

This is also true for a straight exchange, which is why my "lazy"
swap, which scans for different bytes, saves real time. Not as much as
Richard's hack and not as much as Ben's, but in a way completely
independent of specific versions of C. The only precondition is that
"Moving data must suck more than examining data" and this is probably
true apriori, as a logical tautology, if "moving" data implies
"examining" data.



The good news is, the compiler knows the rules for such things and you
don't have to, unless you run into a bottleneck that has to be removed,
then you worry about them.

A program deserves to be called real if you can safely port it keeping
its performance, not necessarily at some maximum, but predictable and
controllable. It appears here that C programs don't exist in this
sense.

In that sense, no language provides such portability of "performance".

But computational complexity theory does.

It provides a high-level guideline, most applicable to very large
values of "n".  It also doesn't provide any guarantees about the
efficiency of libc function implementations, or any direct knowledge of
CPU hardware feature differences that can be incredibly important to
real world performance on different architectures.  Throw in variations
due to physical memory capacity, VM design, compiler optimizations,
etc., and making predictions based upon O() analysis can be
embarrassingly wrong.

Job One, for the reasons you have given, should be developing a
language that is purified of preconditions. I know that the hardware
and software environment is fearfully dirty and complex. All the more
reason to find algorithms which operate independent of it. For
example, the genius who realized that when you evaluate a && b, or a
|| b on a computer, you should NOT evaluate b in the && when a is
false, or in the || when a is true, deserves a Nobel prize. I first
heard of this optimization before getting access to computers and in a
logic class.

In the 1950s, computer science could have disregarded the empirical
limits of then existing computers because of Moore's law, but under
IBM and American pressure, it did not, and we're living with the chaos
that has resulted, from the intelligence failures of Sep 11 to the
mortgage crisis, imo.


If it was totally satisfactory, benchmarks wouldn't need to exist, and
we wouldn't have people comparing algorithms with actual empirical
testing on ranges of platforms.  

But computational complexity theory gives an approximation.

Yes, it does.  That gives you a guideline when selecting which
algorithms to explore with actual measurement.

No argument here.

I wonder at times just how much "performance" matters

People frequently worry about it far too often, and too early.  For a
concrete example, look only to this thread, where you opined that
Richard's implementation wasn't any good for such reasons, offered up
your XOR version, which won't work on a number of in-use CPU
architectures today, and is likely to actually be slower on many of
them where it will.  All the better reason to only worry about
optimizing code when it has been demonstrated, as you like to say "in
practice" to be insufficiently fast in a real program.  Armchair
optimizing can be fun, to be sure, but shooting at some example code on
performance grounds when it might be more than fast enough where it is
going to be employed is a waste of energy.

At the most abstract level, as opposed to programming and hacking,
forcing oneself to search for the best solution is often the best way
to proceed. Dijkstra often did this, for example in smoothsort. It
helps morale and finds lemmas quickly.

I know the programmer wisdom that you should search for the stoopidest
solution, and in the real world, I have seen an entire company go down
the tubes in honor of this false wisdom. This company used a PhD's own
version of TSO to do market research and in it, the cry was "get it
done today, using the stoopidest solution you can find".




In the most general sense, Ben, I don't think using a "buffer" is a
good way to make code fast because it's a Faustian bargain. You're not
thinking up a new and more clever algorithm, such as my insanely
clever (d'oh) idea that we skip crap exchanges in which a==b, you're
being an engineer trying something out which may backfire at a later
date.

Using the ancient XOR hack isn't thinking up a new algorithm either.  

No. But skipping unnecessary exchanges IS.

No, it isn't new either.  This topic, especially with respect to
memswap() flavors has been beaten to death for years, with no clear
winner, because the best answer depends upon a lot of variables.  I
know of no general, portable solution that doesn't depend upon
platform-specific tricks that is uniformly best in all cases.  If
someone does, I'd love to hear of it.

The problem expressed most abstractly is "I have a device which can
only move a byte at a time. What is the provably fastest way to swap
memory?"

As of today (2-13) I have found that if you assume its more costly to
move bytes than to compare them you save time by lazy swap. That I
admit is a separate assumption but usually valid.

Given the way I have expressed the problem, it makes no sense to use a
"buffer" at all.

C is a device which will never be able to move more than a byte at a
time when we consider it at the bare metal. This ability was simply
not on Ritchie's agenda because the psychology of DEC people at the
time considered variable length operations as something for lower
forms of life, such as MIS programmers.

Which means that only specific instances of C will be able to
specifically exploit the many computers that provide variable length
operations.

Seriously, am I missing something here? Variable length operations at
the hardware level are contrary to the RISC philosophy, I know.

There may be some work on this question already, and I'll research
that. If we are condemned by night to move only the byte (or aligned
multiples of bytes such as int and long) then a clever way to do this
fast would be useful. I've already found, probably reinvented I admit,
a dumbass way to do it faster than brute force: assume that moving is
more costly than scanning.


Try it.  Get your method to compile and actually work properly, write a
"bonehead macro" that demonstrates you can outperform Richard's
version, then all you need to do is find any existing system where
memcpy() performance is similar and you still win.  Easy, right?  
Otherwise, you're just grasping at straws.  

Ask yourself, if someone you had never seen post before had written
Heathfield's code in a separate thread on swapping data, would you have
even gone down this path?  I rather doubt it

That is correct. Heathfield was, I believe, trying to start trouble by
implying that McLean was wrong in claiming that C doesn't have
memswap, and his answer was valid only in a rather weak sense.

He didn't ...

read more »- Hide quoted text -

- Show quoted text -

Thanks for an improved and more collegial tone. Now, go over to the
new thread and try the versions on your toaster and chain saw. I need
to take a sabbatical from this newsgroup but shall return in 48 hours.

I know you will wait for me with bated breath.
.



Relevant Pages

  • Re: Free FAT16 Filesystem
    ... being an honourable reason. ... Dave, I made every effort 3 weeks ago to have you understand that if you ... >>Murray did not mention Keil in his correspondence to you. ... >>both your compiler AND under Keil's, ...
    (comp.arch.embedded)
  • compiler and metadata, request opinions...
    ... a lot of the upper/middle compiler machinery is still lacking (such as ... embed the metadata directly into the object modules (the reason being that ... request for a particular piece of information is embedded in a symbol (sort ... it will be loaded into an in-memory version of the database. ...
    (comp.compilers)
  • misc: compiler and metadata...
    ... a lot of the upper/middle compiler machinery is still lacking (such as ... embed the metadata directly into the object modules (the reason being that ... request for a particular piece of information is embedded in a symbol (sort ... it will be loaded into an in-memory version of the database. ...
    (comp.lang.misc)
  • Re: "Sorting" assignment
    ... too slow, other times for optimizing too much, some times for being too ... That is a rather bizarre, not-quite-C compiler, but ok. ... The overlap issue is a different problem, but performing a "swap" on ... CPU hardware feature differences that can be incredibly important to ...
    (comp.programming)
  • Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size
    ... Right now we have normal memory, swap, and hugetlb pages. ... Is there any way to do unions of bitfields? ... struct of bitfields or vice-versa will probably cause the compiler not ... the x86 specific pmd_to_pptewon't be that x86 specific no ...
    (Linux-Kernel)