Re: from elsewhere, an assembler
- From: /\\\\o//\\annabee <Wannabee@xxxxxxxxxxxxxxx>
- Date: Wed, 11 Apr 2007 02:34:23 +0200
On Tue, 10 Apr 2007 23:44:37 +0200, Frank Kotler <fbkotler@xxxxxxxxxxx> wrote:
> > This needs about 45 cycles (incl BSWAP) on AMD,
> > but is quite long (128 bytes).
> > I'm curious how long it takes an Intel for it.
>
> P4, in particular, has a reputation of being "really bad" on shifts. I think of myself as an "AMD guy", but I'm running a P4 right now. I haven't done any "timing" on it - haven't even confirmed the weird results Herbert reported. I'll try to "get to it" (if the spirit moves me). I have an idea it won't be good. May need a conditional jump - "if Intel, call the other function"...
RosAsm's hexprint is 5 times faster than Wolfgang's code :) ?
I clocked Wolfgang's code at between 666 and 777 cycles, with some variation, earlier today,
and at more than 800 now.
Hexprint comes in somewhat above 100 cycles, 145 or thereabouts.
I called hexprint like this:
Betov_Hex:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret
Note that this one addresses memory, etc.
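For readers who don't read RosAsm syntax, here is roughly what the loop above does, as a minimal C sketch. The function name hex8_loop and the caller-supplied 9-byte buffer are assumptions made for illustration (the original writes into HexPrintString and adds no terminator); this is not Betov's code, just the same digit-by-digit idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the nibble loop: fills out[0..7] with the
   uppercase hex digits of value, most significant digit first. The
   trailing '\0' is an addition for C; the assembly does not write one. */
void hex8_loop(uint32_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 7; i >= 0; i--) {               /* walk backwards, like std + stosb */
        char c = (char)('0' + (value & 0x0Fu));  /* low nibble mapped to '0'..'?' */
        if (c > '9')
            c += 7;                              /* skip ':'..'@' to land on 'A'..'F' */
        out[i] = c;
        value >>= 4;                             /* next nibble, like shr ebx 4 */
    }
    out[8] = '\0';
}
```

Calling hex8_loop(0x1A2B3C4D, buf) with a 9-byte buf should leave the string "1A2B3C4D" in it.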
Now for the disturbing news. (to me at least)
If I put Wolfgang's code in front of my test code, a few bytes ahead, it clocks 272 cycles, but if I place it in another TITLE, at a much, much lower address, then it clocks in at 800+ cycles.
If I do the same with Betov_Hex, I get 598 cycles if I place it at the very much lower address, and 145 cycles if it is immediately ahead in the code.
I guess this is because of cache?
Anyways, the Betov hexprint is :) faster.
And that one I can read and understand and reuse in two seconds,
whereas Wolfgang's I had to step through in the debugger several times,
and I am not sure I get it anyway.
The same thing happens when I place it at lower addresses, just before the test code (code posted below), but to a lesser degree. I now get 602 cycles for Wolfgang's code
and 374 for Betov's hexprint.
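If the cache guess is right, a single call mostly measures how cold the code and data are, not the routine itself. One way to check is to time the routine repeatedly and compare the first run with the fastest one. A minimal C sketch under stated assumptions: the names measure, ticks_ns and sample_work are hypothetical, and the tick source is the C11 timespec_get wall clock rather than rdtsc, so it counts nanoseconds, not cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*tick_fn)(void);  /* hypothetical: returns an increasing tick count */
typedef void (*work_fn)(void);      /* hypothetical: the routine under test */

struct timing {
    uint64_t first;  /* first (cold) run */
    uint64_t best;   /* fastest of all runs */
};

/* Assumed tick source: wall-clock nanoseconds. On x86 a wrapper around
   rdtsc could be substituted to count cycles instead. */
uint64_t ticks_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Stand-in workload so the sketch is self-contained. */
static volatile uint32_t sink;
void sample_work(void)
{
    uint32_t x = 0;
    for (uint32_t i = 0; i < 1000; i++)
        x += i * 2654435761u;
    sink = x;
}

/* Times `work` `runs` times, keeping the first and the fastest result. */
struct timing measure(work_fn work, tick_fn now, int runs)
{
    assert(work != NULL && now != NULL && runs > 0);
    struct timing t = { 0, UINT64_MAX };
    for (int i = 0; i < runs; i++) {
        uint64_t start = now();
        work();
        uint64_t elapsed = now() - start;
        if (i == 0)
            t.first = elapsed;
        if (elapsed < t.best)
            t.best = elapsed;
    }
    return t;
}
```

With something like measure(sample_work, ticks_ns, 1000), a large gap between first and best would point at warm-up effects (cache, TLB, branch prediction) rather than at the routine's own cost.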
Below is the complete code used in the timings, except for the GUI code.
This code runs in user mode at realtime priority, as the result of
clicking a menu item.
First listed are the two routines at lower addresses,
then the test routine,
then the same two routines at higher addresses.
For the 800+ cycle figures, remember that the routines sat at _much_ lower addresses.
Betov_Hex2:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret
WolfGang_BinToAscci2:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
;copy:
mov ecx,ebx
mov eax,edx
;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,D$eax+edx+30303030h
lea edx,D$ecx+ebx+30303030h
;done but for you perhaps wrong ordered yet, so I add:
bswap eax
bswap edx
ret
;;
This is the test/timing code
;;
[TestVariable: ? ? ?]
TestCode:
push edi
CPUID | rdtsc | push eax edx
mov eax 0-1
;call WolfGang_BinToAscci
;call WolfGang_BinToAscci2
call Betov_Hex
;call Betov_Hex2
rdtsc | pop ecx ebx
sub eax ebx
sbb edx ecx
int 3
pop edi
ret
Betov_Hex:
mov ebx eax
mov ecx 8, edi HexPrintString | add edi 7
std
Do
mov al bl | and al 0F | add al '0'
On al > '9', add al 7
stosb | shr ebx 4
Do_Loop
cld
ret
WolfGang_BinToAscci:
______________________________
;eax [bin] to edx:eax [HEX-ascii]:
; it uses only four registers and no memory
xor edx,edx
xor ebx,ebx
; expand nibbles to bytes:
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shl edx,4
shld edx,eax,4
shl eax,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
shl ebx,4
shld ebx,eax,4
shl eax,4
;copy:
mov ecx,ebx
mov eax,edx
;the algo:
add eax,06060606h
add ecx,06060606h
and eax,10101010h
and ecx,10101010h
shr eax,4
shr ecx,4
imul eax,07h
imul ecx,07h
lea eax,D$eax+edx+30303030h
lea edx,D$ecx+ebx+30303030h
;done but for you perhaps wrong ordered yet, so I add:
bswap eax
bswap edx
ret
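Since Wolfgang's routine took several debugger passes to follow, here is the same idea written out as a minimal C sketch. The names expand_nibbles, ascii_from_nibbles and hex8_swar are hypothetical, the nibble expansion uses a plain loop in place of the unrolled shld/shl chain, and the digits are stored byte by byte instead of relying on bswap. Only the add/and/shr/multiply step mirrors the assembly directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Moves the top four nibbles of *value into the four bytes of the result
   (most significant nibble in the top byte) and shifts them out of *value. */
static uint32_t expand_nibbles(uint32_t *value)
{
    uint32_t bytes = 0;
    for (int i = 0; i < 4; i++) {
        bytes = (bytes << 8) | (*value >> 28);  /* like shl 4 followed by shld ..,4 */
        *value <<= 4;                           /* like shl eax,4 */
    }
    return bytes;
}

/* Converts four bytes holding 0..15 each into four ASCII hex digits at once. */
static uint32_t ascii_from_nibbles(uint32_t bytes)
{
    /* n + 6 sets bit 4 only when n >= 10, so each byte of mask is 0 or 1 */
    uint32_t mask = ((bytes + 0x06060606u) & 0x10101010u) >> 4;
    /* add 7 where needed to jump from '9' to 'A', then add '0' everywhere */
    return bytes + mask * 7u + 0x30303030u;
}

/* Writes the 8 uppercase hex digits of value, plus a terminator, to out. */
void hex8_swar(uint32_t value, char out[9])
{
    assert(out != NULL);
    uint32_t hi = ascii_from_nibbles(expand_nibbles(&value));
    uint32_t lo = ascii_from_nibbles(expand_nibbles(&value));
    for (int i = 0; i < 4; i++) {
        int shift = 24 - 8 * i;                 /* most significant digit first */
        out[i]     = (char)((hi >> shift) & 0xFFu);
        out[4 + i] = (char)((lo >> shift) & 0xFFu);
    }
    out[8] = '\0';
}
```

No byte ever carries into its neighbour: the largest intermediate value per byte is 15 + 7 + 30h = 46h, which is 'F'. That is what lets four digits be processed per 32-bit operation.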
> Best,
> Frank