Re: which way is faster?



On Thu, 10 Jan 2008 12:35:06 +0100, Wolfgang Kern <nowhere@xxxxxxxx> wrote:


shikamuk asked:

Hello
I wonder which way is faster when evaluating arithmetic expressions.
For example, I can add directly to allocated variables, or I can move
them to registers and then add.
Addition in registers is probably faster.
But what about the time it takes to move them to the registers?

Every memory access takes time, even if the data is already cached or on the stack.

ADD [mem],... is a good example of a READ-MODIFY-WRITE sequence,
and it is faster than its discrete replacement:

MOV eax,[var1]
MOV ebx,[var2]
ADD eax,ebx
MOV [var1],eax

is much slower than:

MOV eax,[var2]
ADD [var1],eax

I mean, on a single-core AMD64.
I read 8 cycles, inclusive, using the technique you posted previously.
However, there is a difference when changing
variables such as the length of the timed code, the weather outside,
whether the window is open, or having taken a leak before timing,
e.g. when doing it twice (the code just doubled), the last variant wins by
a magnificent 1 cycle.

A gigantic win :D

Let's say the 8 cycles are overhead?

:D

(K7 reads 0F cycles - for both, inclusive)
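
To make the point about overhead and noise concrete, here is a rough sketch of a timing harness in C. It is not the technique posted earlier in the thread; it assumes the standard clock() function instead of a cycle counter, and the names min_ticks, net_ticks, empty_snippet, add_to_mem and add_via_regs are made up for illustration. It repeats each measurement, keeps the smallest reading, and subtracts what an empty snippet costs. The compiler decides which instructions it emits for the two variants, so check the disassembly before reading anything into the numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* volatile makes the compiler perform every load and store */
static volatile uint32_t var1 = 1, var2 = 2;

typedef void (*snippet_fn)(void);

/* Baseline: costs only the call and loop overhead. */
void empty_snippet(void) { }

/* Corresponds to the "ADD [var1],eax" form: add straight to the variable. */
void add_to_mem(void) { var1 += var2; }

/* Corresponds to the discrete form: load both, add, store back. */
void add_via_regs(void)
{
    uint32_t a = var1;
    uint32_t b = var2;
    a += b;
    var1 = a;
}

/* Time `iters` calls of fn, `repeats` times over; return the smallest
   reading in clock() ticks. The minimum is the run least disturbed by
   whatever else the machine was doing. */
double min_ticks(snippet_fn fn, int repeats, long iters)
{
    assert(repeats > 0 && iters > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; r++) {
        clock_t t0 = clock();
        for (long i = 0; i < iters; i++)
            fn();
        double dt = (double)(clock() - t0);
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Reading for fn minus the reading for the empty baseline.
   Can come out slightly negative when the difference is within the noise. */
double net_ticks(snippet_fn fn, int repeats, long iters)
{
    return min_ticks(fn, repeats, iters)
         - min_ticks(empty_snippet, repeats, iters);
}
```

Comparing net_ticks(add_to_mem, 20, 1000000) with net_ticks(add_via_regs, 20, 1000000) over several runs shows how much of a reading is plain overhead and how much the result moves from run to run.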

Why do you give advice on code optimization when it appears
to me so utterly useless in real life? If your strategy is not
100% correct at the start of coding, you will have to repeat your rigorous
coding when you rewrite, and you have to TIME EACH CHANGE.
Given the errors I see myself make each time I do that, I then need to
re-verify my assumptions at least three times,
and even then I am not sure I got everything right.....

Would it not be more fruitful to post more on the strategy of things?
Does anyone really need a fast hex-string-to-binary routine?
Very fast typists, perhaps?

Why not post something interesting about AI? You once said that
the main reason AI had not advanced further was that people were
"not that bright". I would love to read something you wrote that
concerns AI.....

Code optimizations are, as Betov has said a hundred times, one of the
ill arguments that work against the assembly community, because
assembly is mostly viewed as having a purpose for this kind of thing.
(Which must be viewed as a very weak argument, I figure.)

What are your thoughts on that?

For the 100th time, I would like to repeat that all the problems I have come
from finding solutions to problems, and not from asm itself.

But it also depends on where you want the result,
i.e.:

MOV eax,[var2]
ADD eax,[var1]

is a few micro-cycles faster than with [mem] as result destination.

__
wolfgang






--
http://www.youtube.com/watch?v=pZ6zzE8JUGY