Re: from elsewhere, an assembler
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Tue, 10 Apr 2007 21:44:37 GMT
Wolfgang Kern wrote:
Frank wrote:
...
[convert nibble to hex-ascii]
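cmp al,10
jc is_digit      ; 0..9: skip the letter adjustment
add al,7         ; 10..15: hop over the 7 characters between '9' and 'A'
is_digit:
add al,48        ; 48 = '0'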
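For anyone who wants to check the logic off-line, here is a rough C model of the branchy version above. The function name nibble_to_hex_branch is made up for illustration, and it assumes AL already holds a value 0..15. The jump is taken when AL < 10, so only 10..15 receive the extra 7 that bridges the gap between '9' (39h) and 'A' (41h).

```c
#include <stdint.h>

/* Models: cmp al,10 / jc is_digit / add al,7 / is_digit: add al,48
   Assumes al is already 0..15 (hypothetical helper, not from the thread). */
char nibble_to_hex_branch(uint8_t al) {
    if (al >= 10)              /* jc is NOT taken when al >= 10 */
        al = (uint8_t)(al + 7);
    return (char)(al + 48);    /* 48 == '0' */
}
```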
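cmp al, 10       ; CF = 1 for 0..9, 0 for 10..15
sbb al, 69h      ; AL = AL - 69h - CF
das              ; decimal adjust turns that into '0'..'9' or 'A'..'F'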
Shorter, and eliminates the conditional jump... but "das" is so slow
(how slow *is* it?), I don't think it's a "win"...
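Why CMP/SBB/DAS works is easier to see with the flags spelled out. The C sketch below is a hypothetical emulation (the struct and function names are invented here) of just the parts of CMP, SBB and DAS that matter, with DAS following the pseudocode in Intel's manual for 32-bit mode. For 0..9, CMP leaves CF=1, SBB gives 96h..9Fh with AF=1, and DAS subtracts 66h in total; for 10..15, CF=0, SBB gives A1h..A6h with AF=0, and DAS subtracts only 60h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the pieces of machine state this sequence touches. */
typedef struct { uint8_t al; bool cf; bool af; } cpu_bits;

/* SBB AL,imm8: AL = AL - imm - CF; CF = borrow out of bit 7, AF = borrow out of bit 3. */
static void sbb_al(cpu_bits *s, uint8_t imm) {
    unsigned borrow = s->cf ? 1u : 0u;
    s->af = (unsigned)(s->al & 0x0F) < (unsigned)(imm & 0x0F) + borrow;
    s->cf = (unsigned)s->al < (unsigned)imm + borrow;
    s->al = (uint8_t)(s->al - imm - borrow);
}

/* DAS as described in Intel's pseudocode (not valid in 64-bit mode). */
static void das(cpu_bits *s) {
    uint8_t old_al = s->al;
    bool old_cf = s->cf;
    s->cf = false;
    if ((s->al & 0x0F) > 9 || s->af) {
        bool borrow = s->al < 6;
        s->al = (uint8_t)(s->al - 6);
        s->cf = old_cf || borrow;
        s->af = true;
    } else {
        s->af = false;
    }
    if (old_al > 0x99 || old_cf) {
        s->al = (uint8_t)(s->al - 0x60);
        s->cf = true;
    } else {
        s->cf = false;
    }
}

/* cmp al,10 / sbb al,69h / das  for a nibble 0..15 */
char nibble_to_hex_das(uint8_t nibble) {
    cpu_bits s = { nibble, false, false };
    s.cf = s.al < 10;     /* cmp al,10: only CF matters, SBB rewrites AF */
    sbb_al(&s, 0x69);     /* sbb al,69h */
    das(&s);              /* das */
    return (char)s.al;
}
```

Tracing 0 by hand: SBB gives 96h with AF=1, the first DAS step gives 90h, and the CF-driven second step gives 30h, i.e. '0'.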
DAS latency is reported as 8 cycles in AMD's docs.
Okay. I thought it was even worse than that. Maybe on Intel, it is...
Intel docs describe what it does:
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
    AL <- AL - 6;
    AH <- AH - 1;
    AF <- 1;
    CF <- 1;
ELSE
    CF <- 0;
    AF <- 0;
FI;
AL <- AL AND 0FH;
Apparently this is "aas". Strong evidence that these instructions are rarely used!
You see it may alter AH as well, which may spoil the game.
Maybe. Seems likely that ah is "don't care". But that isn't the case with "das" anyway... I don't think... I really haven't got these instructions figured out. My understanding is that the "d" forms work on "pBCD", and the "a" forms on "uBCD" (I think that's what they're called). I don't know if any of 'em work on actual "ascii characters"... This might suggest a scheme for "better names", I dunno...
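To make the AH point concrete, here is a small C model that follows the pseudocode quoted above (which describes AAS). The struct, aas() and unpacked_sub() are hypothetical helpers for illustration. unpacked_sub() models `sub al,d` followed by `aas` on an unpacked-BCD pair in AX (AH = tens digit, AL = ones digit, one decimal digit per byte). Subtracting 3 from 12 borrows out of the ones digit, and that borrow is exactly when AAS decrements AH.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t al, ah; bool af, cf; } aas_state;

/* AAS per the quoted pseudocode: adjusts AL and may decrement AH. */
void aas(aas_state *s) {
    if ((s->al & 0x0F) > 9 || s->af) {
        s->al = (uint8_t)(s->al - 6);
        s->ah = (uint8_t)(s->ah - 1);   /* this is the AH side effect */
        s->af = true;
        s->cf = true;
    } else {
        s->cf = false;
        s->af = false;
    }
    s->al &= 0x0F;
}

/* Models "sub al,d / aas" on unpacked BCD in AX; returns the new AX. */
uint16_t unpacked_sub(uint16_t ax, uint8_t d) {
    aas_state s = { (uint8_t)ax, (uint8_t)(ax >> 8), false, false };
    s.af = (s.al & 0x0F) < (d & 0x0F);  /* SUB: borrow out of bit 3 */
    s.cf = s.al < d;                    /* SUB: borrow out of bit 7 */
    s.al = (uint8_t)(s.al - d);
    aas(&s);
    return (uint16_t)((s.ah << 8) | s.al);
}
```

So unpacked_sub(0x0102, 3) should give 0x0009 ("12" - 3 = "09"), while 0x0105 - 3 leaves AH alone and gives 0x0102.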
DAS is an invalid instruction in 64-bit mode.
Perhaps a(nother) reason to avoid it.
What would be your idea of a "fast" way to do it?
IIRC we've seen many variants in the fastest/shortest discussion
some time ago in CLAX.
Yeah... Stupid, open-ended question, I guess...
My 8-byte solution (3.5 cycles) wins in the sense that it uses
no other registers and no memory. The conditional branch will cause a
penalty if used in a loop (every 9th iteration IIRC).
9th, eh? Okay...
The short five-byte way takes 10 cycles and uses AH.
Or maybe not... But still a mess of cycles.
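Unfortunately CMOVcc has neither an immediate nor an 8-bit form, so
mov edx,3007h    ; dh = 30h ('0'), dl = 07h
mov ebx,0        ; default adjustment: none
cmp al,0ah
cmovae ebx,edx   ; replaces jc: ebx = 3007h only when al >= 10
add al,bl        ; +7 for 10..15, +0 for 0..9
add al,dh        ; +30h
may not suffer from branch penalties, but you see how awful...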
Mmmmm... not *that* awful... Wouldn't have run on my recently-deceased K6... (but it's "dead" - in a meaningful sense! :)
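If the point of the CMOV is only to avoid the branch, the 7-or-0 adjustment can also be computed arithmetically. A sketch in C (the function name is invented here): for x in 0..15, x + 6 reaches 16, setting bit 4, exactly when x >= 10, so shifting right by 4 yields 1 or 0. This is not Wolfgang's CMOV code; it swaps the select for an add and a shift, which is the same add-6 test the dword routine below applies per byte.

```c
#include <stdint.h>

/* Branch-free nibble -> hex digit (hypothetical helper for illustration). */
char nibble_to_hex_branchless(uint8_t n) {
    unsigned x = n & 0x0Fu;              /* keep the low nibble only */
    unsigned is_letter = (x + 6u) >> 4;  /* 1 if x >= 10, else 0 (x + 6 <= 21) */
    return (char)('0' + x + 7u * is_letter);
}
```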
But single-nibble conversion loops will always be slower than
fixed-size 32- or 64-bit solutions like the dword conversion I use:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
;copy:
mov ecx,ebx
mov eax,edx
;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,[eax+edx+30303030h]
lea edx,[ecx+ebx+30303030h]
;done, but each register holds its four digits in reverse memory order, so I add:
bswap eax
bswap edx
_________;end
This needs about 45 cycles (including BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes on an Intel.
P4, in particular, has a reputation of being "really bad" on shifts. I think of myself as an "AMD guy", but I'm running a P4 right now. I haven't done any "timing" on it - haven't even confirmed the weird results Herbert reported. I'll try to "get to it" (if the spirit moves me). I have an idea it won't be good. May need a conditional jump - "if Intel, call the other function"...
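Before timing anything on the P4, it may help to have a portable reference to check an asm version against. The C sketch below models the dword routine above; it is not a tuned implementation, and spread4, to_ascii4, bswap32 and hex32_swar are names made up for illustration. spread4 does in one expression what the SHLD/SHL ladder does (one nibble per byte, most significant nibble in the top byte), to_ascii4 is the add 06060606h / and 10101010h / shr 4 / imul 7 / lea sequence, and the final loop writes the bytes low-first, which is what storing EAX and then EDX would do on little-endian x86.

```c
#include <stdint.h>

/* Four nibbles (low 16 bits) -> one nibble per byte, MS nibble in the top byte. */
static uint32_t spread4(uint32_t four_nibbles) {
    return ((four_nibbles & 0xF000u) << 12) |
           ((four_nibbles & 0x0F00u) << 8)  |
           ((four_nibbles & 0x00F0u) << 4)  |
            (four_nibbles & 0x000Fu);
}

/* Per byte: n + 7*(n >= 10) + '0'. No byte ever carries into its neighbour. */
static uint32_t to_ascii4(uint32_t b) {
    uint32_t m = ((b + 0x06060606u) & 0x10101010u) >> 4;  /* 1 where n >= 10 */
    return b + m * 7u + 0x30303030u;                      /* imul 7, then lea */
}

static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
}

/* Writes the 8 hex digits of v, most significant first (no terminator). */
void hex32_swar(uint32_t v, char out[8]) {
    uint32_t hi = bswap32(to_ascii4(spread4(v >> 16)));     /* like EAX */
    uint32_t lo = bswap32(to_ascii4(spread4(v & 0xFFFFu))); /* like EDX */
    for (int i = 0; i < 4; i++) {
        out[i]     = (char)((hi >> (8 * i)) & 0xFFu);
        out[4 + i] = (char)((lo >> (8 * i)) & 0xFFu);
    }
}
```

With v = 1234ABCDh the high half spreads to 01020304h and becomes 31323334h before the swap, so the output reads "1234ABCD".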
I played around with xmm code, but I found that the load/store overhead
eats up all the advantage of PUNPCKLBW,...,POR.
Xmm is still on my "learn someday" list. This seems a common story, though. Apparently, xmm is a (big?) win in certain situations where it's "appropriate", but if you need to "force" your application into it, (much?) worse.
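For the xmm route Wolfgang mentions, a sketch using SSE2 intrinsics (x86 only; hex32_sse2 is a hypothetical name) might look like this. It separates high and low nibbles, interleaves them with PUNPCKLBW (_mm_unpacklo_epi8), then uses PCMPGTB (_mm_cmpgt_epi8) to build the 7-or-0 mask instead of the add-6 trick. No speed claim is intended: as Wolfgang notes, getting the value into and out of an xmm register may cost more than the arithmetic saves.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Writes the 8 hex digits of v, most significant first (no terminator). */
void hex32_sse2(uint32_t v, char out[8]) {
    /* Byte-swap so the most significant byte lands in the lowest lane. */
    uint32_t be = (v >> 24) | ((v >> 8) & 0xFF00u) |
                  ((v << 8) & 0xFF0000u) | (v << 24);
    __m128i x    = _mm_cvtsi32_si128((int)be);
    __m128i mask = _mm_set1_epi8(0x0F);
    __m128i hi   = _mm_and_si128(_mm_srli_epi16(x, 4), mask); /* high nibbles */
    __m128i lo   = _mm_and_si128(x, mask);                    /* low nibbles */
    __m128i n    = _mm_unpacklo_epi8(hi, lo);  /* PUNPCKLBW: hi0,lo0,hi1,lo1,... */
    /* PCMPGTB: 0xFF in lanes holding 10..15, 0x00 elsewhere. */
    __m128i letter = _mm_cmpgt_epi8(n, _mm_set1_epi8(9));
    __m128i adj    = _mm_and_si128(letter, _mm_set1_epi8(7));
    __m128i ascii  = _mm_add_epi8(_mm_add_epi8(n, adj), _mm_set1_epi8('0'));
    char tmp[16];
    _mm_storeu_si128((__m128i *)tmp, ascii);  /* unaligned store is fine */
    memcpy(out, tmp, 8);                      /* only 8 lanes hold digits */
}
```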
Of course, the most likely reason to convert nibbles to hex ascii is "human convenience", and the human can't read 'em nearly as fast as our *slowest* method, so... Still, maybe some other process could use the cycles...
Randy values "clarity". In some cases, I don't think the "clarity" is worth the "detour". In *this* case, the "clarity" of the "obvious" method is probably well worth the three bytes! Sorry I even mentioned "das" (no I'm not - it's fun to discuss this stuff! :)
Best,
Frank