Re: from elsewhere, an assembler
- From: "Wolfgang Kern" <nowhere@xxxxxxxxxxx>
- Date: Wed, 11 Apr 2007 17:31:07 +0200
"/\\o//\annabee" <Wannabee@xxxxxxxxxxxxxxx> wrote in message
news:op.tqlfvllpmjj8u8@xxxxxxxxxxxxxxx
On Tue, 10 Apr 2007 23:44:37 +0200, Frank Kotler
<fbkotler@xxxxxxxxxxx> wrote:
This needs about 45 cycles (incl BSWAP) on AMD,
but is quite long (128 bytes).
I'm curious how long it takes an Intel for it.
P4, in particular, has a reputation of being "really bad" on shifts. I
think of myself as an "AMD guy", but I'm running a P4 right now. I
haven't done any "timing" on it - haven't even confirmed the weird
results Herbert reported. I'll try to "get to it" (if the spirit moves
me). I have an idea it won't be good. May need a conditional jump - "if
Intel, call the other function"...
RosAsm hexprint is 5 times faster than Wolfgang's code :) ?
I clocked Wolfgang's code at between 666 and 777 cycles, with variations
(earlier today), and at >800 now.
Hexprint came in somewhat above 100 cycles, 145 or thereabouts.
I called hexprint like this:
And I tried this a few minutes ago:
___________________________________
[STDH: 0]
[Time: 0 0]
[HexPrintString: B$ '        ']
main:
_____
;cli ;wont do any good on NT
CPUID |RDTSC |mov D$time eax |mov D$time+4 edx
____________;TEST-AREA insert your code under test here:
;best avoid calls in here or
Betov_Hex2:
mov eax 012345678
mov ebx eax
mov ecx 8 |mov edi HexPrintString | add edi 7
std
L0: mov al bl | and al 0F | add al 030
cmp al 03a | jc L1> |add al 7
L1: stosb | shr ebx 4
Loop L0<
cld
___________
push edx |push eax
RDTSC |sub eax D$time |sbb edx D$time+4 |mov D$time eax |mov D$time+4 edx
pop eax |pop edx
;sti
___________
int3
push 0 |jmp 'KERNEL32.ExitProcess'
______________________________________
This reproducibly needs 124 cycles here.
Looks like you are just measuring windoze background noise.
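For anyone who does not read RosAsm syntax, the STOSB loop under test does roughly the following. This is only a C sketch of my reading of the listing; the name hex8_loop and the trailing NUL terminator are my own additions (the assembly buffer is just the 8 digit bytes).

```c
#include <stdint.h>

/* Per-nibble conversion, written back to front like the STD/STOSB loop:
   start at the last digit (EDI = buffer + 7) and walk downwards. */
static void hex8_loop(uint32_t x, char out[9])
{
    char *p = out + 7;
    for (int i = 0; i < 8; i++) {        /* ECX = 8, LOOP L0< */
        unsigned char d = (unsigned char)(x & 0x0F);  /* and al 0F */
        d += 0x30;                       /* '0'..'9' for 0..9 */
        if (d >= 0x3A)                   /* cmp al 03a | jc L1> */
            d += 7;                      /* skip to 'A'..'F' */
        *p-- = (char)d;                  /* stosb with DF set */
        x >>= 4;                         /* shr ebx 4 */
    }
    out[8] = '\0';                       /* not in the assembly: C string terminator */
}
```

Called with 0x12345678 (written 012345678 in the listing) it fills the buffer with "12345678". One shift, one compare and one conditional jump per digit is what the cycle count is paying for.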
Now for the disturbing news (to me at least).
If I put Wolfgang's code in front of my test code, a few bytes ahead, it
clocks 272 cycles, but if I place it in another TITLE, at a much lower
address, it clocks in at 800+ cycles.
First, cold caches and misalignment may spoil the test.
I also tested your way, 'calling' the routines,
and surprise, surprise: I also got weird results, from 250 to 10000 cycles.
These are typical stack-fetch penalties (and/or page-fault recovery),
so I added in front of the first RDTSC:
_________
[StdH: 0]
push 0-11 |call 'KERNEL32.GetStdHandle' |mov D$StdH eax
pushad
popad
_________
just to have some stack already 'in use'.
A more reliable comparison is always the direct check of
code parts, reducing windoze noise to a minimum.
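One common way to push the noise down further (my suggestion, not something the test code in this post does) is to run the code once as a warm-up, then time it several times and keep the smallest reading. Below is a minimal C sketch under these assumptions: min_ticks, replay_now and do_nothing are hypothetical names, and the counter is passed in as a function so the example stays deterministic. In a real test the counter would wrap RDTSC.

```c
#include <stdint.h>

typedef uint64_t (*tick_fn)(void);  /* returns an increasing counter, e.g. an RDTSC wrapper */
typedef void (*code_fn)(void);      /* the code under test */

/* Run `code` once to warm caches and stack, then time it `runs` times
   and return the smallest elapsed count seen. */
static uint64_t min_ticks(code_fn code, tick_fn now, int runs)
{
    uint64_t best = UINT64_MAX;
    code();                              /* warm-up, not timed */
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now();
        code();
        uint64_t t1 = now();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Stand-in counter for demonstration only: replays fixed readings,
   giving elapsed counts of 500, 124 and 300 for three runs. */
static uint64_t replay_now(void)
{
    static const uint64_t samples[6] = { 0, 500, 1000, 1124, 2000, 2300 };
    static int next = 0;
    uint64_t v = samples[next];
    next = (next + 1) % 6;
    return v;
}

static void do_nothing(void) { }
```

With the replayed readings, min_ticks(do_nothing, replay_now, 3) returns 124: the two slower runs are treated as disturbed and discarded. The minimum is used instead of the average because interrupts and cache misses only ever add cycles.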
If I do the same with Betov_Hex I get 598 cycles if I place it at the
much lower address, and 145 cycles if immediately ahead in the code.
I guess this is because of cache?
Yes.
Anyways, the Betov hexprint is :) faster.
No, this STOSB loop takes 124 cycles (136 with call).
My solution needs 45 cycles (58 with call).
And that one I can read and understand and reuse in two seconds,
whereas Wolfgang's I had to step through in the debugger several times,
and I am not sure I get it anyway.
:)
the algo is easy (done for all 8 bytes):
add 06 ;the upper four bits are clear after the expansion anyway
and 010 ;this bit is set "if >9"
shr 4 ;make this bit to bit0
mul 7 ;now we get either zero or seven
add ;previous saved + "0 or 7" + '30'
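The five steps can be tried out in C on all eight bytes at once with 64-bit arithmetic. This is a sketch of the algorithm as described in the list above, not the actual 128-byte routine (which is not shown in this post). The names expand_nibbles and hex8_swar, the way the nibbles are spread out, and the most-significant-first store that stands in for BSWAP are all my assumptions.

```c
#include <stdint.h>

/* "Expansion": spread the 8 nibbles of x over the 8 bytes of a 64-bit
   word, so every byte holds 0..15 and its upper four bits are clear. */
static uint64_t expand_nibbles(uint32_t x)
{
    uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    return v;
}

static void hex8_swar(uint32_t x, char out[9])
{
    uint64_t v = expand_nibbles(x);
    uint64_t t = v + 0x0606060606060606ULL;  /* add 06: bit 4 gets set if digit > 9 */
    t &= 0x1010101010101010ULL;              /* and 010: keep only that bit */
    t >>= 4;                                 /* shr 4: each byte is now 0 or 1 */
    t *= 7;                                  /* mul 7: each byte is now 0 or 7 */
    v += t + 0x3030303030303030ULL;          /* add: saved digit + (0 or 7) + 030 */

    /* Store most significant byte first; this plays the role of BSWAP
       and works regardless of host byte order. */
    for (int i = 0; i < 8; i++)
        out[i] = (char)(v >> (56 - 8 * i));
    out[8] = '\0';                           /* C string terminator, my addition */
}
```

No byte can carry into its neighbour: the largest intermediate value is 15 + 6 = 21, and the final sum is at most 15 + 7 + 0x30 = 0x46 ('F'). That is why one wide add or multiply handles all eight digits with no branch.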
The same thing happens when I place it at lower addresses, just before the
test code (post code below), but to a lesser degree. I now get 602 cycles
for Wolfgang's code
and 374 for Betov's hexprint.
As above. Avoid measuring noise ;)
__
wolfgang