Re: "Sorting" assignment
- From: Randy Howard <randyhoward@xxxxxxxxxxxxxxxxx>
- Date: Wed, 13 Feb 2008 10:25:26 GMT
On Wed, 13 Feb 2008 01:14:46 -0600, spinoza1111 wrote
(in article
<9c6e2a26-6d2d-482e-b1ea-a0779ae8f96d@xxxxxxxxxxxxxxxxxxxxxxxxxxx>):
I didn't say Richard claim it was a perfect solution for bad
^^^ typo, this should have been "see" instead of "say"
implemenations. It was rather a solution in standard C. All too often
It is a solution in some situations.
Can you think of any likely examples where it wouldn't work?
#define memcpy ... in some clown's program.
Why are we concerned with clown programs? Why not focus on something
actually likely to occur?
I've used "buffering".
Pretty much anyone that's ever used stdio has, even if they don't
realize it. That's not a big deal.
But in general I prefer not to trade space for time
You should make up your mind, sometimes you flay solutions for being
too slow, other times for optimizing too much, some times for being too
tricky and not beautiful. Pick a position and stick with it, it'll
make it a lot easier to follow your comments.
Take a closer look at my reasoning.
I keep trying to do that, but it is elusive.
On most systems, his method is likely to be better than a pure byte by
byte swap, and doesn't suffer from a high likelihood of generating a
bus error doing int-sized XOR swaps on unaligned addresses. People
I think we better forget about XOR.
A good idea, but it is still useful in some places, just not nearly as
much as it once was.
often incorrectly assume it's okay to do this because of the "it works
on my computer" issue. On many platforms, it's flat out broken, on
Not unless the makers are frauds. XOR has a fixed meaning.
This comment seems to imply that you still don't follow the hardware
issues that can arise due to alignment issues.
Far from it. As an assembler programmer I dealt with alignment every
day. You even had to worry about alignment in Cobol!
Then why are you not able to follow them in C?
I will try it on my Windows system using lccwin-32.
That is a rather bizarre, not-quite-C compiler, but ok. I'm sure
others have access to better ones on Windows.
...which is why we need a language to talk about computers and
programming far more purified of assumptions: because we cannot always
get what we want, but if we try sometimes, we get what we need.
Do you have anything better than song lyrics to support you?
This is a team effort. Perhaps you can run it on your toaster?
LOL. When you post a complete instrumented test app, compilable
portably, with timing and test cases built in for both Richard's
(unmodified) version and yours, I'll be happy to do an apples to apples
on some other platforms and CPU architectures, no pun intended.
It's on the new thread. Get to work.
Does the code you posted verify the results?
fact, it's been enshrined in more than one programming FAQ for a good
reason. It can be downright dangerously broken in some cases, with
aliasing problems and macro side-effects in some languages.
Consider what happens when you have aliasing, and you call, effectively
but perhaps unexpectedly, xor(a, a); boom. You now get to spend time
debugging a hard to reproduce bug where your data vanishes
occasionally.
Thanks for reminding me that my abstract solution has to handle
overlap!
The issue above isn't overlap, it is a mathematical property of XOR.
The overlap issue is a different problem, but performing a "swap" on
overlapping blocks is a bit bizarre at first glance. Maybe there's a
practical application for it? *shrug*
Regardless, on modern architectures, although it may appear to look
efficient, it is quite often slower than a conventional swap through a
temp variable. First of all, compilers typically recognize the temp
"idiom" and generate good code for the underlying hardware. With or
without that, modern CPUs like to execute commands in parallel when
possible. With XOR, the inputs of each stage depend on the results of
the one before, so they must be executed sequentially.
This is also true for a straight exchange, which is why my "lazy"
swap, which scans for different bytes, saves real time.
Have you tried it with the optimizer disabled? My money is on the
compiler recognizing a common construct and dropping in an optimized
replacement.
It provides a high-level guideline, most applicable to very large
values of "n". It also doesn't provide any guarantees about the
efficiency of libc function implementations, or any direct knowledge of
CPU hardware feature differences that can be incredibly important to
real world performance on different architectures. Throw in variations
due to physical memory capacity, VM design, compiler optimizations,
etc., and making predictions based upon O() analysis can be
embarrassingly wrong.
Job One, for the reasons you have given, should be developing a
language that is purified of preconditions. I know that the hardware
and software environment is fearfully dirty and complex. All the more
reason to find algorithms which operate independent of it.
Algorithms are independent of it. That's why they are often described
in pseudo-code, as they aren't even dependent upon a language, they are
merely a description of logical operations.
example, the genius who realized that when you evaluate a && b, or a
false, or in the || when a is true, deserves a Nobel prize.b on a computer, you should NOT evaluate b in the && when a is
A prize for that? It's pretty obvious.
I first heard of this optimization before getting access to computers
and in a logic class.
Indeed. I wouldn't be surprised if some philosopher figured that one
out 5000 years ago.
In the 1950s, computer science could have disregarded the empirical
limits of then existing computers because of Moore's law,
That would have been an interesting trick, since Moore didn't even come
up with it until 1965. ;-)
but under
IBM and American pressure, it did not, and we're living with the chaos
that has resulted, from the intelligence failures of Sep 11 to the
mortgage crisis, imo.
Oh, you're off the rails again... skipping ahead....
I wonder at times just how much "performance" matters
People frequently worry about it far too often, and too early. For a
concrete example, look only to this thread, where you opined that
Richard's implementation wasn't any good for such reasons, offered up
your XOR version, which won't work on a number of in-use CPU
architectures today, and is likely to actually be slower on many of
them where it will. All the better reason to only worry about
optimizing code when it has been demonstrated, as you like to say "in
practice" to be insufficiently fast in a real program. Armchair
optimizing can be fun, to be sure, but shooting at some example code on
performance grounds when it might be more than fast enough where it is
going to be employed is a waste of energy.
At the most abstract level, as opposed to programming and hacking,
forcing oneself to search for the best solution is often the best way
to proceed.
And it is quite often /not/ the best way to proceed. Junior
programmers are famous for always wanting to have these drag races
where they try to eek out 3 clock ticks over their lab partner and then
boast about it. Later on, they learn that removing a few clock ticks
from a routine this is only called once during application start up,
and never again is completely wasted energy. Even worse, spending time
optimizing a routing that is called frequently, but still isn't a
bottleneck to overall calculation throughput (perhaps because of I/O
latency elsewhere) is also wasted energy. But, it's fun, so people do
it anyway, and you wind up with a bunch of mangled code in unexpected
places in an attempt to make it faster, and the result is often
unreadable, and in some cases even unreliable.
The better approach, arrived at over time and due to experience is to
get a /correct/ solution first. Verify it gets the answers right.
Then, if it isn't fast enough, you profile it and find out where the
hotspots are, and you focus on those. If you have a huge performance
issue, then you might diverge into looking for a better algorithm, but
as you grow experience over time, the odds of you having picked a
suitable basic algorithm for the first pass improves. In either case,
you focus on the parts of the program that are bottlenecks, and you
eliminate those. What you don't do is spend hours, days, or weeks hand
optimizing every single function. That way lies madness.
I know the programmer wisdom that you should search for the stoopidest
solution,
Perhaps you do, but most of us don't, and we're thankful for it.
Using the ancient XOR hack isn't thinking up a new algorithm either.
No. But skipping unnecessary exchanges IS.
No, it isn't new either. This topic, especially with respect to
memswap() flavors has been beaten to death for years, with no clear
winner, because the best answer depends upon a lot of variables. I
know of no general, portable solution that doesn't depend upon
platform-specific tricks that is uniformly best in all cases. If
someone does, I'd love to hear of it.
The problem expressed most abstractly is "I have a device which can
only move a byte at a time. What is the provably fastest way to swap
memory?"
Most of us do not have a device that can only move a byte at a time.
Look at a data book on a modern desktop class CPU sometime. It might
be enlightening. If you want to talk about 8-bit microcontrollers
maybe, but that's light years from the start of this whole collection
of threads.
As of today (2-13) I have found that if you assume its more costly to
move bytes than to compare them you save time by lazy swap. That I
admit is a separate assumption but usually valid.
How do you arrive at "usually valid"? Have you tested it /anywhere/
other than a single CPU architecture with a single compiler? Have you
tested it at all even? And when I say "test", I don't mean you picked
a few cute phrases and ran a few thousand iterations of each and called
it done either.
Given the way I have expressed the problem, it makes no sense to use a
"buffer" at all.
You didn't express the problem though, someone else did, and Richard
responded. He responded with a working solution. You butted in and
accused him of a bunch of things, refusing to see that what he had done
was simply provide a working, compilable solution to the original post.
Then you posted several different attempts at a solution, some of
which didn't work at all, and perhaps all of them don't work properly,
I'm still waiting for you to confirm or deny that your latest
incarnation has been tested for accuracy, not just speed.
In the end, even if your version does give the correct results as is,
it runs slower. So when you dove in claiming his solution was broken
and you had a better way, you were in error, weren't you?
I'm sure you think that some point has been made, sort of like when you
were proud that your "beautiful code" was "ONLY 5 times slower!!", and
managed this feat while getting the answers wrong.
Not a lot of people take pride in their work anymore. You have to
acknowledge the rarity of someone not only being proud that their
solution is 5X slower, but that they managed to not get the right
results in the bargain. You don't see that every day.
C is a device which will never be able to move more than a byte at a
time when we consider it at the bare metal.
You are completely and utterly wrong about that.
Seriously, am I missing something here?
Yes.
There may be some work on this question already, and I'll research
that. If we are condemned by night to move only the byte (or aligned
multiples of bytes such as int and long) then a clever way to do this
fast would be useful. I've already found, probably reinvented I admit,
a dumbass way to do it faster than brute force: assume that moving is
more costly than scanning.
Save yourself some time. Crack open memcpy.s on an open source libc
(like gnu's) for x86. Observe the multibyte ops. You may need to look
up "rep" if you're not familiar with Intel assembly. Ta Da!
Or, try this instead:
int x = 0x7D15;
int y = 0;
y = x;
x = 0;
There, multi-byte move. :)
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw
.
- Follow-Ups:
- Re: "Sorting" assignment
- From: Clive D. W. Feather
- Re: "Sorting" assignment
- From: Willem
- Re: "Sorting" assignment
- References:
- "Sorting" assignment
- From: Ivica
- Re: "Sorting" assignment
- From: Bartc
- Re: "Sorting" assignment
- From: Malcolm McLean
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Malcolm McLean
- Re: "Sorting" assignment
- From: Richard Heathfield
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Willem
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Willem
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Ben Bacarisse
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Randy Howard
- Re: "Sorting" assignment
- From: spinoza1111
- Re: "Sorting" assignment
- From: Randy Howard
- Re: "Sorting" assignment
- From: spinoza1111
- "Sorting" assignment
- Prev by Date: Re: India
- Next by Date: Re: Results of the memswap() smackdown from the thread "Sorting" assignment
- Previous by thread: Re: "Sorting" assignment
- Next by thread: Re: "Sorting" assignment
- Index(es):