Re: No difference on my machine
- From: "Wolfgang Kern" <nowhere@xxxxxxxx>
- Date: Tue, 13 Nov 2007 02:23:07 +0100
Frank Kotler wrote:
    push edx      ; these two add to the timing value
    push eax
Certainly. Got a way to avoid it?
The newly used stack space may not be in the cache yet.
In my experience, variables accessed beforehand [before CPUID/RDTSC]
show more reproducible timing values (perhaps just a paging issue).
    xor eax, eax  ; should be removed: only needed prior to the
    cpuid         ; first RDTSC (~190..300? cycles)
Okay, I'll play with that... I've seen code, from folks who may or may
not know what they're doing, which uses *three* cpuid's before each
rdtsc. My understanding is that one should do it(?). But I *would* have
expected I'd need another one here(?). Are you saying the CPU "stays
serialized" for 190 - 300 cycles?
No, the serialising itself may take 190 or more cycles;
that's why you may have measured ~380 cycles on an empty test.
Why in hell Intel didn't make rdtsc "serializing" is a mystery to me.
Useless without it, isn't it?
Not at all, sometimes we want to measure the time for serializing itself,
or for something like WBINVD.
    rdtsc         ; the RDTSC itself needs time,
e.g. 11 cycles on a K7, 13 on K8 and AMD64, P4 ??
Okay... three rdtsc's to time rdtsc... :)
I think only two are required, but find the machine constant in TFM ;)
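If you want to try the sequence above from C rather than pure asm, here is a minimal sketch. It assumes GCC or Clang extended inline assembly on x86/x86-64. `tsc_start()` serializes with `xor eax,eax` / `cpuid` before its `rdtsc`. `tsc_stop()` is a plain `rdtsc`, so CPUID appears only in front of the first read, as suggested in this thread. `tsc_overhead()` estimates the machine constant as the smallest delta of two back-to-back reads. All three names are hypothetical. The non-x86 fallback is also an assumption: it uses a monotonic nanosecond clock, which is not the TSC.

```c
#define _POSIX_C_SOURCE 199309L   /* only needed for the non-x86 fallback */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* Start read: xor eax,eax / cpuid serializes, then rdtsc returns EDX:EAX. */
static inline uint64_t tsc_start(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("xor %%eax, %%eax\n\t"
                         "cpuid\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi)
                         :
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Stop read: a plain rdtsc, no cpuid in front of it. */
static inline uint64_t tsc_stop(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}
#else
#include <time.h>
/* Assumption: non-x86 fallback, monotonic nanoseconds rather than TSC ticks. */
static inline uint64_t tsc_start(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
static inline uint64_t tsc_stop(void) { return tsc_start(); }
#endif

/* Machine constant: the smallest start/stop delta over `tries` empty runs. */
static uint64_t tsc_overhead(unsigned tries)
{
    uint64_t best = UINT64_MAX;
    assert(tries > 0);
    for (unsigned i = 0; i < tries; i++) {
        uint64_t t0 = tsc_start();
        uint64_t t1 = tsc_stop();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Usage might look like `uint64_t t0 = tsc_start(); /* code to time */ uint64_t t1 = tsc_stop();`. Then subtract `tsc_overhead(1000)` from `t1 - t0` in a signed type, because noise can push a very short section below the constant.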
At which point in these 11 or so cycles do we actually "read" the count?
I haven't checked, but isn't this a 'don't care' anyway?
We measure the difference between two TSC reads ... :)
Also subtract (SUB/SBB on EDX:EAX) your machine constant for RDTSC,
and the time for the two PUSHes as well.
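Here is the SUB/SBB step in plain C, as a sketch. RDTSC leaves the count split across EDX (high) and EAX (low). A 64-bit difference is therefore a SUB on the low halves, followed by an SBB that takes the borrow out of the high halves. The `tsc_pair` type and the function names are hypothetical. The overhead value is whatever you measured yourself, not a known constant.

```c
#include <assert.h>
#include <stdint.h>

/* A TSC reading as RDTSC leaves it: high half in EDX, low half in EAX. */
typedef struct { uint32_t edx, eax; } tsc_pair;

static uint64_t to_u64(tsc_pair p) { return ((uint64_t)p.edx << 32) | p.eax; }

/* Same effect as "sub eax, b.eax" followed by "sbb edx, b.edx":
   the borrow (carry flag) out of the low half comes off the high half. */
static tsc_pair sub_sbb(tsc_pair a, tsc_pair b)
{
    tsc_pair r;
    uint32_t borrow = a.eax < b.eax;   /* CF after SUB */
    r.eax = a.eax - b.eax;             /* wraps modulo 2^32, like SUB */
    r.edx = a.edx - b.edx - borrow;    /* SBB */
    return r;
}

/* (stop - start) - overhead, where overhead is your measured RDTSC machine
   constant plus the cost of the two PUSHes (hypothetical input). */
static tsc_pair net_cycles(tsc_pair start, tsc_pair stop, tsc_pair overhead)
{
    assert(to_u64(stop) >= to_u64(start));
    return sub_sbb(sub_sbb(stop, start), overhead);
}
```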
Okay... if it *is* constant. I thought of running "duplicate loops"
(shouldn't call 'em "loops"... "duplicate timed sections"), one "empty"
and one with "code to time", and subtracting one result from the other,
to get a "zero based" number.
I always use the same (4K-aligned, page-present) test field for timing
code parts, so comparisons become more reproducible and reliable.
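Under a hosted OS, the data side of this can at least be approximated. The sketch below assumes POSIX `posix_memalign` and a 4 KiB page. It returns a page-aligned buffer that has already been written once, so the page is mapped before a timed run touches it. It does not pin the page or control where the timed code itself lives. `make_test_field` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_SIZE 4096u  /* assumption: one 4 KiB page */

/* Returns a page-aligned test field, already written once so the page is
   mapped ("present") before any timed run touches it. Free with free(). */
static unsigned char *make_test_field(void)
{
    void *p = NULL;
    if (posix_memalign(&p, FIELD_SIZE, FIELD_SIZE) != 0)
        return NULL;
    memset(p, 0, FIELD_SIZE);          /* first write faults the page in */
    assert(((uintptr_t)p % FIELD_SIZE) == 0);
    return p;
}
```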
Now 'edx:eax' should contain zero for an empty test,
Right... unless...
(run it twice, because the first run will include the code fetch)
Which would give us a less-than-zero result for an empty test. Okay, run
it three times...
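The "empty section vs. section with code" idea can be sketched as follows. It assumes the raw deltas of each section have already been collected into arrays. The sketch skips the first runs (cold code fetch), keeps the smallest remaining sample of each section, and subtracts the empty one. The result is signed because, as noted above, an empty test can come out below zero. Taking the minimum is one common choice, not the only one, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest sample after skipping the first `warmup` runs (cold code fetch). */
static uint64_t min_after_warmup(const uint64_t *s, size_t n, size_t warmup)
{
    uint64_t best;
    assert(n > warmup);
    best = s[warmup];
    for (size_t i = warmup + 1; i < n; i++)
        if (s[i] < best)
            best = s[i];
    return best;
}

/* "Zero-based" cost: timed section minus an identical but empty section.
   Signed, because noise can push the result below zero. */
static int64_t zero_based(const uint64_t *timed, const uint64_t *empty,
                          size_t n, size_t warmup)
{
    return (int64_t)min_after_warmup(timed, n, warmup)
         - (int64_t)min_after_warmup(empty, n, warmup);
}
```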
a deviation of +/-1 cannot be avoided due to the micro-steps,
This brings us back to a question that was raised here some time ago:
"Have you ever seen an odd value from rdtsc?" (I have not, I don't think).
Yes, I've seen all possible values.
to avoid measuring background noise (IRQ activity), I'd disable
interrupts for the whole test.
Yeah, perhaps Windows will allow you that option... or allow you to
believe you have that option. :) Probably KESYS... But Linux, no - not
from userland. Really should be doing this on "bare metal"...
In that case, a rough estimate of the code's duration may help to
figure out whether it was delayed by an interrupt or not.
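The rough-estimate check can be automated. Given an upper bound on how long the code should take, samples above it are treated as interrupted and discarded. This is a minimal sketch. The bound is something you estimate yourself, and `drop_interrupted` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compacts s[0..n) in place, keeping only samples <= limit, and returns how
   many were kept. `limit` is your own rough upper estimate of the code's
   duration; anything slower is assumed to have been hit by an interrupt. */
static size_t drop_interrupted(uint64_t *s, size_t n, uint64_t limit)
{
    size_t kept = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i] <= limit)
            s[kept++] = s[i];
    return kept;
}
```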
I don't know if your environment has a single 'run until breakpoint' key,
like the ones KESYS-Hexedit and RosAsm have in their integrated debuggers.
There it's easy to run the same piece of code as often as we hit the key
and immediately see all registers and the timing variables.
We should say, "lest the newbies be misled", that it's pretty pointless
to be doing it at all! The performance of an instruction or sequence of
instructions in *this* particular context gives us almost *no*
information about how it'll perform in its "native habitat". We just
wanna see what we *do* see.
Sure, with a "guess where the code will be at runtime" OS,
it's hard to check the true speed in advance :)
But in general, the RDTSC method helps a lot in evaluating code parts
and in comparing algorithms.
RDTSC is accurate in itself, but the OS may spoil the measurement,
and compilers may spoil the performance by placing the code somewhere else.
Not to forget paging and cache issues, which shouldn't be involved
in an RDTSC comparison.
Santosh asked if rdtsc values would be more accurate. My impression is
that rdtsc is "flukey". Herbert posted that example that showed rdtsc
"deltas" going down, the more "nop"s we added... I'll have to dig that
up and look at it again - I don't think I've tried it on my current
hardware.
Oh yeah, a few inserted NOPs may sometimes speed up the whole thing.
I think that's because NOPs can free CPU resources; you know there is more
than just one EAX on the chip (register renaming) :)
My results so far (really haven't played with it too much) indicate that
two xor's take no cycles at all, but three take four cycles. Ooookay...
Latency and throughput ... depend on what's busy or free.
For "real world" purposes, "gettimeofday" (or equivalent) may be a more
"meaningful" measure...
I hope my code parts never need 'Seconds' to perform :)
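Here is a sketch of the coarser wall-clock approach, assuming a POSIX system. `clock_gettime(CLOCK_MONOTONIC, ...)` is a modern equivalent of `gettimeofday`. It does not jump when the system time is set (e.g. by `settimeofday`). Averaging over many calls makes even very short code measurable this way. `now_ns` and `ns_per_call` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from a clock that doesn't jump when the system time is set. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average wall-clock nanoseconds per call of fn over `iters` calls. */
double ns_per_call(void (*fn)(void), unsigned long iters)
{
    uint64_t t0;
    assert(iters > 0);
    t0 = now_ns();
    for (unsigned long i = 0; i < iters; i++)
        fn();
    return (double)(now_ns() - t0) / (double)iters;
}
```

For a routine `void my_routine(void)`, a call would be `ns_per_call(my_routine, 1000000)`.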
call u64toda[...]
Is there no function in L'unix which can display a 64-bit unsigned
integer value as decimal?
Depends on what you call a "function in L'unix". A system call? ***,
no! Of course "printf" is sitting in memory, waiting for us to call it.
I thought you'd "approve" of getting along without printf! :)
Yes, I really like NoLib-solutions best ;)
Oh, I remember fprint/printf from PowerBasic times ...
KESYS has system calls for all supported variable types. My idea
of someday emulating Linux is still alive, but I have no clue yet how
to detect library calls and translate them into KESYS system functions;
it looks like a horrible job with all the library variants around.
And what is the reason for a comma in there?
Oh, I see, it's not a German decimal point ('DP') ... ;)
Just to piss off you Europeans. :)
:) be aware of the wind ..
Seriously, that's an example of "code reuse". I originally stole that
from one of the first assembly examples I came across. It was a DOS
program which purported to show drive free-space. The program itself was
buggy - supposed to show "default" drive, if no parameter was given, but
it showed drive C, not default. I was tickled to be able to find and fix
this error - I like to feel smart. But I thought the "comma delimited
big number" (originally did just dx:ax, of course) was "cool". So I've
kept it around and "massaged" it, over the years. Haven't touched it for
a long time... but replaced a couple "or"s with "test"s for "this
version" (and stuck that LF in there - which has got to go). Not a
particularly virtuous routine, but I thought it would be "appropriate"
here...
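For comparison, here is the comma-delimited idea in C without printf. This is not Frank's u64toda (its code is elided above). It is a minimal sketch that divides by 10 repeatedly, fills a buffer from the right, and inserts a comma after every third digit. The name and the 27-byte buffer requirement (20 digits, 6 commas, NUL) belong to this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Longest result: 20 digits + 6 commas for UINT64_MAX, plus the NUL. */
#define U64_DEC_COMMAS_MAX 27

/* Writes v in decimal with a comma every three digits, NUL-terminated,
   without printf. Returns the length (not counting the NUL).
   The caller must supply at least U64_DEC_COMMAS_MAX bytes. */
static size_t u64_to_dec_commas(uint64_t v, char *buf, size_t cap)
{
    char tmp[U64_DEC_COMMAS_MAX - 1];
    size_t pos = sizeof tmp;           /* fill from the right */
    unsigned digits = 0;
    size_t len;

    assert(cap >= U64_DEC_COMMAS_MAX);
    do {
        if (digits > 0 && digits % 3 == 0)
            tmp[--pos] = ',';
        tmp[--pos] = (char)('0' + v % 10);
        v /= 10;
        digits++;
    } while (v != 0);

    len = sizeof tmp - pos;
    memcpy(buf, tmp + pos, len);
    buf[len] = '\0';
    return len;
}
```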
I see.
"Or something..."
Mmh, a new nick to expect? "FK44x86" ?
__
wolfgang