Re: which way is faster?




Wannabee wrote:
....
ADD [mem],... is a good example of a READ-MODIFY-WRITE sequence,
and it is faster than its discrete replacement:

MOV eax,[var1]
MOV ebx,[var2]
ADD eax,ebx
MOV [var1],eax

is much slower than:

MOV eax,[var2]
ADD [var1],eax

I mean, on a single-core AMD64.
I read 8 cycles, inclusive, using your previously posted technique.
However, there is a difference when changing
variables such as the length of the timed code, the weather outside,
whether the window is open, or having taken a leak before timing,
e.g. doing it twice (code just doubled), and the last variant wins by
the magnificent 1 cycle.

A gigantic win :D

And now try it again with the two variables (let's say 4KB) apart
from each other ...

Let's say the 8 cycles are overhead?
:D
(K7 reads 0F cycles - for both, inclusive)

Sure, but only if the two vars share one cache line.
My K7 shows only 8 cycles for the first and 4 cycles for the shorter.
I haven't compared it on AMD64 yet, but it should be equal here.
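
Whether two variables share a cache line depends only on their addresses and the line size. A minimal C sketch of that check, assuming 64-byte lines (which, as far as I know, is the L1 line size on K7 and AMD64 parts; other CPUs may differ). The helper name same_cache_line is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; adjust for other CPUs. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: true if both addresses fall into the same
   cache line, i.e. they have the same line index. */
static bool same_cache_line(const void *a, const void *b)
{
    uintptr_t line_a = (uintptr_t)a / CACHE_LINE_SIZE;
    uintptr_t line_b = (uintptr_t)b / CACHE_LINE_SIZE;
    return line_a == line_b;
}
```

Two dword variables declared next to each other will usually pass this check (unless they happen to straddle a line boundary), while two variables placed 4KB apart never will.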

Why do you give advice on code optimization,

Because of the question in the topic ? :)

when it appears
to me so utterly useless in real life? If your strategy is not
100% correct at the start of coding, you will have to repeat your rigorous
coding when you rewrite, and you have to TIME EACH CHANGE.

No, I mainly use the info from the manuals, besides experience, when
I modify or write my code, and once finished I compare the speed of
the whole function with what I previously noted for it.
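
Comparing a whole function against an earlier note only works if the number is repeatable. A rough C sketch of one common way to get there: run the function several times and keep the fastest run. The names best_time and sample_work are hypothetical, and clock() is only a portable stand-in here; it is far coarser than the RDTSC-based timing discussed in this thread.

```c
#include <time.h>

/* Dummy workload standing in for the function under test. */
static volatile unsigned long sink;

static void sample_work(void)
{
    for (int i = 0; i < 1000; i++)
        sink += 1;
}

/* Hypothetical helper: call fn() `runs` times and return the fastest
   run in clock() ticks. Taking the minimum filters out runs that were
   disturbed by interrupts or cold caches. Returns 0 if runs <= 0. */
static clock_t best_time(void (*fn)(void), int runs)
{
    clock_t best = 0;
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        fn();
        clock_t elapsed = clock() - start;
        if (i == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Calling best_time(sample_work, 10) before and after a change gives two figures that can be compared directly, as long as the function runs long enough for clock() to resolve it.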

Given the errors I see I make each time I do that, and then I need to
re-verify my assumptions at least three times,
and even then I am not sure I got everything right.....

I think learning ASM instructions together with timing issues
helps a lot in later work.

Would it not be more fruitful to post more on the strategy of things?

My strategy is 'small/fast/smart', so often just a compromise.

Does anyone really need a fast hexstring to binary routine?
Very fast typists perhaps?

Application code contains many small code parts and 'a few wasted cycles'
here and there may not look 'that' relevant.
But the effect is of a multiplying nature ...

Why not post something interesting about AI? You once said that
the main reason that AI was not advanced more, was that people were
"not that bright". Would love to read something you wrote that
concerns AI.....

I just played around with several ideas, but there is no AI project
on my table yet.
So even if some of my OS features may look like AI, these are all just
automated configuration adjustments that keep track of the user's typing
speed, or count how often he hits BS or DEL in a text session and respond
with a funny message if this exceeds his average count per page.

Code optimizations are, as Betov said a hundred times, one of the
bad arguments that work against the assembly community, because
assembly is mostly viewed as having a purpose only for this kind of thing.
(Which must be viewed as a very weak argument, I figure)

What are your thoughts on that?

As above, the multiplying effect ...
An ASM-programmer who is aware of timing and instruction size
will always write fast and short code.

For the 100th time I'd like to repeat that all the problems I have come
from finding solutions to problems, and not from asm itself.

IF problem CAUSE problem ITERATE IF ??

if you can't 'find' a solution then create one ;)
__
wolfgang


