Re: No difference on my machine
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Sun, 11 Nov 2007 15:36:53 GMT
Wolfgang Kern wrote:
Nathan and Frank in discussion about:
[Comparing 'xor eax, eax' with 'mov eax, 0' from the CLAX thread:]
Good one, Nathan! No difference on my P4, that I can notice. Replacing
"dec" with "sub" gives a big speedup, though...
Just tried with "sub" and it took a second longer in all but one run.
There shouldn't be a difference on any x86 except that xor eax,eax
is shorter and this will affect the code alignment after it.
An "align" directive at the top of the loop seems to make little, if any difference... Gotta test more on this...
So a replacement may sometimes (like the example in CLAX) slow things down.
I cannot confirm that add/sub-1 is any faster than inc/dec on AMDs.
P4 is weird. I think that's the "point" here - "it depends". One of the weird things about P4 is that cryptic "pause" instruction. I had the impression that a "pause" was supposed to be helpful in a tight loop like this. So I threw one in. Runs the time up to 201 seconds (from 7 - sometimes 8). I guess *that's* not what it's for!!!
It again is of different size and may therefore affect timing like above.
Right. My thinking is that, sooner or later, "small" is going to avoid a cache reload, in any "real world" program, so is generally worth doing. In the case of "dec" vs "sub" - on *this* particular processor - "larger" seems to be "worth it". (about 10 - 11 seconds for "dec", 7 or 8 for "sub")
Curiously, killing Xwindows doesn't seem to make any difference - it frequently does, on these "timing tests". As you've pointed out, unless we're running on "bare metal", we're probably measuring a certain amount of OS "housekeeping" along with our "test code".
I should point out that "gettimeofday" returns microseconds, as well as seconds, so we could easily get greater resolution. And maybe point out that the two dwords are a "structure" - they *must* be kept contiguous. Same for the two dwords in the "tz" structure. Illustrates that we don't need to use the "struc[t]" (or "record") keyword, but we *do* need to know the size of the "clump of bytes" that's being written.
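For anyone who wants the finer resolution, here is a rough sketch in C of what using both fields might look like. It is not the code from the thread: "elapsed_us", "busy_loop" and "time_busy_loop" are made-up names, and the loop is only a stand-in for whatever is being timed. The point is that the call fills in one contiguous "struct timeval" (seconds, then microseconds), and the caller has to hand it a buffer of the right size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday() samples, in microseconds.
   tv_sec and tv_usec sit next to each other in one struct timeval. */
static int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}

/* Hypothetical stand-in for the code under test: counts n down to zero.
   volatile keeps the compiler from deleting the loop. */
static void busy_loop(unsigned long n)
{
    volatile unsigned long i = n;
    while (i != 0)
        i = i - 1;
}

/* Times busy_loop(n) and returns the elapsed microseconds. */
static int64_t time_busy_loop(unsigned long n)
{
    struct timeval t0, t1;
    int rc;

    rc = gettimeofday(&t0, NULL);   /* second argument (timezone) unused here */
    assert(rc == 0);
    busy_loop(n);
    rc = gettimeofday(&t1, NULL);
    assert(rc == 0);
    (void)rc;                       /* quiet the warning if built with NDEBUG */

    /* The size of the "clump of bytes" the kernel writes; it depends on the ABI. */
    printf("sizeof(struct timeval) = %zu bytes\n", sizeof(struct timeval));
    return elapsed_us(&t0, &t1);
}
```

Calling "time_busy_loop(100000000UL)" from main and printing the result gives a figure in microseconds rather than whole seconds. The usual caveat applies: this is wall-clock time, so OS "housekeeping" is still being measured along with the loop.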
The experimentation continues...
Best,
Frank