Re: from elsewhere, an assembler




Frank wrote:

> ...
> [convert nibble to hex-ascii]
> cmp al,10
> jc +2
> add al,7
> add al,48
>
> cmp al, 10
> sbb al, 69h
> das
>
> Shorter, and eliminates the conditional jump... but "das" is so slow
> (how slow *is* it?), I don't think it's a "win"...

The AMD docs report a DAS latency of 8 cycles.

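Intel docs describe what it does:

old_AL <- AL;
old_CF <- CF;
CF <- 0;
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL <- AL - 6;
CF <- old_CF OR (borrow from AL <- AL - 6);
AF <- 1;
ELSE
AF <- 0;
FI;
IF (old_AL > 99H) OR (old_CF = 1)
THEN
AL <- AL - 60H;
CF <- 1;
FI;

Note that it works on AL only. The listing that decrements AH and
masks AL with 0FH is the one for AAS, so AH survives here.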

DAS is an invalid instruction in 64-bit mode.
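
Since DAS can't be executed in 64-bit mode, here is a small C model for checking the cmp/sbb/das trick on any machine. It is only a sketch: the struct cpu8 and all function names are made up for illustration, and das() follows the DAS operation as I read it in the Intel manual (AL, CF and AF only).

```c
#include <stdint.h>

/* Hypothetical model of the state the sequence touches. */
typedef struct { uint8_t al; int cf; int af; } cpu8;

/* cmp al,imm8: only the carry (borrow) matters for what follows. */
void cmp_al(cpu8 *s, uint8_t imm) { s->cf = s->al < imm; }

/* sbb al,imm8: AL <- AL - imm - CF; CF and AF come from the borrows. */
void sbb_al(cpu8 *s, uint8_t imm) {
    int full = (int)s->al - imm - s->cf;
    int low  = (s->al & 0x0F) - (imm & 0x0F) - s->cf;
    s->al = (uint8_t)full;
    s->cf = full < 0;   /* borrow out of bit 7 */
    s->af = low < 0;    /* borrow out of bit 3 */
}

/* das: decimal adjust AL after subtraction. */
void das(cpu8 *s) {
    uint8_t old_al = s->al;
    int old_cf = s->cf;
    s->cf = 0;
    if ((s->al & 0x0F) > 9 || s->af) {
        s->cf = old_cf || s->al < 6;      /* borrow from AL - 6 */
        s->al = (uint8_t)(s->al - 6);
        s->af = 1;
    } else {
        s->af = 0;
    }
    if (old_al > 0x99 || old_cf) {
        s->al = (uint8_t)(s->al - 0x60);
        s->cf = 1;
    }
}

/* cmp al,10 / sbb al,69h / das applied to a nibble 0..15. */
char nibble_to_hex_das(uint8_t nibble) {
    cpu8 s = { nibble, 0, 0 };
    cmp_al(&s, 10);
    sbb_al(&s, 0x69);
    das(&s);
    return (char)s.al;
}
```

For 0..9 the sbb leaves 96h+n with AF set, so das subtracts 6 and then 60h; for 10..15 it leaves A1h..A6h with AF clear, so only the 60h is subtracted.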

> What would be your idea of a "fast" way to do it?

IIRC we've seen many variants in the fastest/shortest discussion
some time ago in CLAX.

My 8-byte solution (3.5 cycles) wins in that it uses no other
registers and no memory. The conditional branch will cost a
penalty if used in a loop (every 9th iteration, IIRC).

The short five-byte way (cmp/sbb/das) takes about 10 cycles.

Unfortunately CMOVcc has neither an immediate nor an 8-bit form, so

mov edx,3007h ;dh = 30h ('0'), dl = 07h (letter adjust)
mov ebx,0
cmp al,0ah
cmovnc ebx,edx ;replaces jc: take the 7 only if al >= 10
add al,bl
add al,dh

may not suffer from branch penalties, but you see how awful...
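
For comparison, the same select-instead-of-jump idea written in C. This is a sketch, not a claim about what a compiler will emit: whether it turns into CMOV, SETcc or something else depends on compiler and flags, and the function name is made up.

```c
#include <stdint.h>

/* Branch-free nibble (0..15) to uppercase hex digit.
   The comparison yields 0 or 1; negating it gives an all-zero or
   all-one mask that selects either 0 or the letter adjust 7. */
char nibble_to_hex_branchless(uint8_t nibble) {
    uint8_t adjust = (uint8_t)(-(nibble >= 10) & 7);
    return (char)(nibble + adjust + 0x30);
}
```

The mask plays the role of the CMOV: both candidate values exist up front and the flag only picks one of them.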

But single-nibble conversion loops will always be slower than
fixed-size 32- or 64-bit solutions like the dword conversion I use:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; eax gets the four leading digits, edx the four trailing ones
; it uses only four registers and no memory

xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4

shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4

;copy:
mov ecx,ebx
mov eax,edx

;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,[eax+edx+30303030h]
lea edx,[ecx+ebx+30303030h]

;done, but the byte order may be wrong for you, so I add:
bswap eax
bswap edx
_________;end

This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
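
To check the arithmetic of "the algo" part without an assembler, here is a C model of it. It is a sketch under my own assumptions: the function names are made up, and the nibble expansion is done with masks and shifts instead of the SHLD chain, so it says nothing about timing.

```c
#include <stdint.h>

/* Spread the four nibbles of a 16-bit value into four bytes,
   most significant nibble in the most significant byte. */
uint32_t spread_nibbles(uint16_t v) {
    return ((uint32_t)(v & 0xF000) << 12) |
           ((uint32_t)(v & 0x0F00) << 8)  |
           ((uint32_t)(v & 0x00F0) << 4)  |
            (uint32_t)(v & 0x000F);
}

/* Four nibble-bytes to four ASCII hex digits at once. */
uint32_t hex_digits4(uint32_t x) {
    uint32_t t = x + 0x06060606u; /* bit 4 of a byte is set iff nibble >= 10 */
    t &= 0x10101010u;             /* keep only that bit */
    t >>= 4;                      /* 01h in every byte that needs a letter */
    t *= 7;                       /* 07h there, 00h elsewhere */
    return t + x + 0x30303030u;   /* add the nibble and '0' */
}

/* Writes the 8 uppercase hex digits of value plus a NUL to out. */
void u32_to_hex(uint32_t value, char out[9]) {
    uint32_t hi = hex_digits4(spread_nibbles((uint16_t)(value >> 16)));
    uint32_t lo = hex_digits4(spread_nibbles((uint16_t)value));
    for (int i = 0; i < 4; i++) {
        out[i]     = (char)((hi >> (24 - 8 * i)) & 0xFF);
        out[4 + i] = (char)((lo >> (24 - 8 * i)) & 0xFF);
    }
    out[8] = '\0';
}
```

No byte can carry into its neighbour here: the largest intermediate value per byte is 15+6 = 21, and the final sum is at most 15+7+30h = 46h.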

I'm curious how long it takes on an Intel CPU.

I played around with xmm code, but I found that the overhead of
loading and storing through memory eats all the advantage of
PUNPCKLBW,...,POR.
__
wolfgang


