Re: my assembler is better than your assembler
- From: "Wolfgang Kern" <nowhere@xxxxxxxx>
- Date: Fri, 23 Mar 2007 16:55:00 +0100
rhyde wrote:
(so now I add)
Therefore we can be sure that "Randy is not the Author" of it.
What's your point? Certainly I've never claimed to have cornered the
market on the world's fastest algorithms. The difference between you
and me, however, is that I tend to collect the algorithms and implement
the best ones ("best" depending on the current metric) when needed. I
don't look at an existing algorithm (as you have done) and then turn
around and post a *stupid* implementation (as you have done).
First, this list was addressed to Frank.
My code collection usually starts with the "short but slow" versions.
But you may wonder how fast it is compared with your (admittedly faster)
solution when my routine runs within one prefetched cache line and
yours suffers three fetch penalties (+105 cycles on AMD K7).
My 'incredibly slow' loop does at most 31 iterations (2..3 cycles each),
and it has an 'early out' as well.
But you can be sure that whenever I need bit_count, I will find 'my' best.
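For readers following the argument, a shift-and-test loop with an early out looks roughly like the following C sketch. It is a hypothetical reconstruction: the actual routine is not shown in this post, and the name `popcount_loop` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical shift-and-test bit count (not the original routine):
   add the lowest bit, shift right, and stop early as soon as no set
   bits remain in the operand. */
unsigned popcount_loop(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {          /* early out: remaining bits are all zero */
        count += x & 1u;      /* add the current lowest bit */
        x >>= 1;              /* move the next bit into position 0 */
    }
    return count;
}
```

This sketch runs once per bit up to the highest set bit, so its cost depends on the input (at most 32 passes for a 32-bit operand); the figure of 31 iterations mentioned in the post presumably reflects details of the original code that are not shown.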
[] Why should this be 'maintainable' by any means?
Please answer the question instead of generalising.
This code should be maintainable, like *all* code should be
maintainable, so someone else can come along and improve the terrible
algorithm you've chosen. Now you might argue that "this routine is so
short they can just rewrite it from scratch", but it's pretty clear
that this attitude of yours is pervasive and you write *all* your
code this way, so someone faced with modifying one of your programs
has a daunting task ahead of them.
I still don't have a clue 'what' can be maintained in this piece of code.
Cycle count always matters to me, even when the prime target is
short code.
That much is obvious. And, once again, you've demonstrated why the
world at large has little regard for low-level assembly language
programmers such as yourself -- you're so busy worrying about the
cycles of the instructions you've used that you've missed the big
picture. And the big picture is that this loop, which executes 32
times, consumes a large number of cycles that could be cut in half, or
more, by using a better algorithm. Such as the one that AMD has been
publishing for years.
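The kind of "better algorithm" meant here is the branch-free, divide-and-conquer (SWAR) population count. The C sketch assumes a 32-bit operand; it is the widely published form of the technique and is only assumed to resemble the variant in AMD's optimization guide, not copied from it. The name `popcount_swar` is hypothetical.

```c
#include <stdint.h>

/* Branch-free SWAR bit count: add bits in parallel inside ever-wider
   fields of the 32-bit word, then sum the four byte counts at once. */
unsigned popcount_swar(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit fields: bits per pair */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit fields: sum of two pairs */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit fields: bits per byte */
    return (v * 0x01010101u) >> 24;                   /* top byte = sum of all bytes */
}
```

It executes a small, fixed number of shifts, masks, adds and one multiply regardless of the input, instead of up to 32 loop passes; the trade-off raised in the reply is that it encodes to more bytes than a tight loop.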
...(as you are) completely miss the big picture.
I leave it up to you, in your 10e78-particle universe, to see
"big pictures" :)
Perhaps your borders are so close that you can see all of it.
Aren't you one of those people telling us all that we should read the
manufacturer's literature? Perhaps you should try that some time. Then
you'd know a better way to write a bit count routine.
I always recommend reading how a CPU works; the code examples in
these books are just explanatory hints, not final code solutions.
The theoretically fastest code often turns out to be slower if
its size results in penalties from fetching or, worse, from paging.
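Whether a routine fits in one cache line, or spans several and may pay a fetch penalty for each, is simple arithmetic on its start address and length. A sketch in C; the helper name `cache_lines_spanned` is hypothetical, and the 64-byte line size used in the example is an assumption (it matches the AMD K7).

```c
#include <stdint.h>

/* Count the cache lines touched by the byte range [start, start + size)
   for a given line size in bytes. */
unsigned cache_lines_spanned(uintptr_t start, uintptr_t size, uintptr_t line)
{
    if (size == 0)
        return 0;
    uintptr_t first = start / line;              /* line holding the first byte */
    uintptr_t last  = (start + size - 1) / line; /* line holding the last byte */
    return (unsigned)(last - first + 1);
}
```

For example, a 20-byte routine starting at 0x1000 touches one 64-byte line, while the same routine starting at 0x1030 straddles two, so placement and alignment matter as well as raw size.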
Sorry, the bit count routine provided by AMD is *not* that much
larger. And even if you consider a *huge* variant, such as my integer-
to-ascii conversion code, it would be *very* rare that such a
degenerate case would occur. Were it to occur, it wouldn't happen just
because the itoa routine is large, but because the whole program
(working set) is large.
I didn't mean this method is my preferred one,
Then why did you post it?
I answered Frank.
it is just the first item on my list.
Obviously. You post the first thing that comes to mind. That's my
point. If you're trying to argue about writing fast code, you should
carefully consider what you post if you want to make some sort of
point.
'bit count' is rarely used, so it's not in my system's core anyway.
If it's so rarely used and the implementation doesn't matter, why,
then, do you care that my original implementation created a stack
frame? Even with that stack frame my code was considerably *faster*
than the implementation you've posted. If code is so rarely used, then
wouldn't you agree that having a stack frame, so you can easily and
safely access parameters and local variables, is a good thing?
Ask Frank why he took bit_count as an example;
I assume it was the first item in his folder.
But even here you had to change your mind and finally got rid of
the redundant stack frame.
My list ended with 'and so on' and '...'.
You can assume it is quite long.
No, I won't assume that.
If you had such a long list, I'm sure you would have picked a better
example to prove your point.
I have no 'points' to make with Frank;
he just supported me in my search for HLA-created code.
A decent (ASM-syntax-related) disassembler will show the NOPs as
alignment filler anyway.
Why?
First of all, not all processors have zero-time nops.
Which ones? 286, 386?
Second, someone may *want* to use something besides NOPs there.
Why? And why can't it be replaced by any three-byte NOP code?
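On processors that support them, padding does not have to be a run of one-byte 90h NOPs: Intel's SDM lists single-instruction NOP encodings of 1 to 9 bytes, so a three-byte gap can be filled by one three-byte NOP (0F 1F 00). The following C sketch fills a gap greedily from that table; `fill_nops` is a hypothetical helper, and the 0F 1F forms assume a P6-class or later CPU (a 286 or 386 would raise an invalid-opcode fault on them).

```c
#include <stddef.h>
#include <string.h>

/* Multi-byte NOP encodings from Intel's SDM NOP entry, indexed by
   length 1..9 (row 0 is unused). The 0F 1F forms need a P6-class CPU. */
static const unsigned char nop_seq[10][9] = {
    {0},
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Fill buf with n bytes of padding using as few NOP instructions as
   possible; returns the number of instructions emitted. */
size_t fill_nops(unsigned char *buf, size_t n)
{
    size_t count = 0;
    while (n > 0) {
        size_t len = n > 9 ? 9 : n;     /* longest sequence that still fits */
        memcpy(buf, nop_seq[len], len);
        buf += len;
        n -= len;
        count++;
    }
    return count;
}
```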
Whatever, a decent assembler *won't* require you to count the opcode
bytes to determine how many NOPs you need.
I never had to use my fingers to count it,
but even if I didn't know, my programming tools show me the code size
of an instruction immediately after I type it in.
You *still* have to add up the bytes.
No. My tools also show the address (just the effective low part if I want).
You *still* have to figure out how many NOPs you need.
I'm able to read '0,4,8,c' at the end of the address field.
I can figure this out w/o the help of a calculator.
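Reading '0,4,8,c' in the last hex digit of the address amounts to checking that the address is a multiple of 4, and the padding an ALIGN directive inserts is the distance to the next such multiple. A minimal C sketch, assuming a power-of-two alignment; `align_padding` is a hypothetical name.

```c
#include <stdint.h>

/* Bytes of padding needed to advance addr to the next multiple of
   align; align must be a power of two. */
uintptr_t align_padding(uintptr_t addr, uintptr_t align)
{
    uintptr_t mask = align - 1;           /* low bits that must be zero */
    return (align - (addr & mask)) & mask; /* 0 if already aligned */
}
```

For example, an address of 0x1001 needs 3 bytes of padding to reach 4-byte alignment, and 0x100C needs 4 bytes to reach 16-byte alignment; this is the count that either the tool or the programmer has to get right.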
[about ALIGN]
This syntax-related ALIGN is only needed with tools which
don't show the address and opcode fields while typing.
__
wolfgang