Re: pushf v rcl, save, restore carry flag



On Feb 24, 2:25 pm, stork <Todd.Bandrow...@xxxxxxxxx> wrote:
I'm writing my own large integer library in x86-64 assembly, and, to
start with, I'm working on addition. My basic approach is to loop
through each pair of 64 bit longs and adc them. The one thing I've
noticed is that for this to work I need to save and restore the carry
flag, as, my loop counter sets it too. What's the fastest way to do
that these days? I'm looking at pushf/popf, but some sites that
I've looked at claim rcr/rcl ought to be faster as of the 486 and
Pentium. Is this still true?


You should review your loop implementation, since you can usually
implement that sort of loop without stomping on the carry flag. For
example, inc/dec (which update the other arithmetic flags but leave
CF alone) and lea (which changes no flags at all) can be used to
adjust counters and pointers. You'll almost certainly want to unroll
the loop some too, which will probably be a bigger win than anything
else.
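
As a rough sketch of what that might look like (not a tuned routine): the register assignments and the label are my own assumptions, with rsi pointing at the source words, rdi at the destination words (added in place), rcx holding a nonzero word count, and the least significant word first. Memory operands are written NASM-style.

```asm
        clc                     ; start with no carry in
addloop:                        ; hypothetical label
        mov     rax, [rsi]      ; mov does not touch any flags
        adc     [rdi], rax      ; dest word += source word + CF, sets CF
        lea     rsi, [rsi+8]    ; advance pointers; lea changes no flags
        lea     rdi, [rdi+8]
        dec     rcx             ; updates ZF for the branch, leaves CF alone
        jnz     addloop
        ; on exit, CF is the carry out of the most significant word
```

One caveat: on some microarchitectures, reading CF with adc right after a dec that wrote only the other flags can carry a partial-flags penalty, so this is worth timing on the CPUs you care about rather than assuming it wins.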

Anyway, assuming you have to handle one pair of words per loop pass
(no unrolling), and you can't avoid trashing the carry flag, your best
bet is probably not to save and restore the flag itself at all.
Instead, keep the carry for the next pass in a register. Let's say you
use rdx: before the loop you'd load rdx with zero, and in the loop
you'd do something like:

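mov rax,0           ;rax collects the carry out
add rdx,num1[rsi]   ;rdx = prior carry + word from num1
adc rax,0           ;capture the carry from that add
add num2[rdi],rdx   ;num2 word += rdx
adc rax,0           ;capture the carry from this add (at most one of the two is set)
mov rdx,rax         ;new carry for next round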

That can be optimized a bit too, of course.
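
For illustration, here is one way the fragment might sit inside a complete loop whose counter is free to clobber the carry flag. This is a sketch under my own assumptions, not a benchmarked routine: rsi and rdi point at the two arrays, rcx is a word index, r8 holds a nonzero word count, and the label is hypothetical. It also swaps mov rax,0 for the shorter xor, which is safe here because the flags are dead at the top of the loop.

```asm
        xor     edx, edx            ; rdx = carry in, initially 0
        xor     ecx, ecx            ; rcx = word index (assumed)
addloop:                            ; hypothetical label
        xor     eax, eax            ; rax = 0, collects the carry out
        add     rdx, [rsi+rcx*8]    ; rdx = carry in + source word
        adc     rax, 0              ; record the carry from that add
        add     [rdi+rcx*8], rdx    ; dest word += carry in + source word
        adc     rax, 0              ; record the carry from this add
        mov     rdx, rax            ; carry out becomes the next carry in
        add     rcx, 1              ; clobbers CF, which no longer matters
        cmp     rcx, r8             ; r8 = word count (assumed)
        jb      addloop
        ; on exit, rdx (0 or 1) is the carry out of the last word
```

The two adc instructions can never both add 1: if the first add wraps, rdx is zero, so the second add cannot carry.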

OTOH, this sort of optimization always needs to be tested; it's far
too easy to get it wrong, given how complex the execution of
instructions is in modern CPUs.