pushf vs. rcl: saving and restoring the carry flag



I'm writing my own large-integer library in x86-64 assembly and, to
start with, I'm working on addition. My basic approach is to loop
through each pair of 64-bit quadwords and adc them. The one thing I've
noticed is that for this to work I need to save and restore the carry
flag, because the instruction that updates my loop counter sets it too.
What's the fastest way to do that these days? I'm looking at
pushf/popf, but some sites I've looked at claim that rcr/rcl ought to
be faster as of the 486 and Pentium. Is this still true?
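For reference, here is a minimal sketch of the adc loop described above, written as C with GNU inline assembly. It assumes GCC or Clang on x86-64 with the default AT&T syntax; the function names `bignum_add` and `bignum_add_ref` are hypothetical. The asm version relies on two documented flag behaviours: `lea` does not modify any flags, and `dec` updates ZF but leaves CF unchanged. Used together for the loop bookkeeping, they let the carry from one `adc` flow into the next with no pushf/popf or rcl/rcr at all. The portable `bignum_add_ref` propagates the carry explicitly in a variable and serves both as a cross-check and as the fallback on other targets. Whether this beats a pushf/popf version on a particular CPU is something to measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable reference: r = a + b over n little-endian 64-bit limbs.
   The variable `carry` plays the role of CF between iterations.
   Returns the carry out of the most significant limb (0 or 1). */
uint64_t bignum_add_ref(uint64_t *r, const uint64_t *a,
                        const uint64_t *b, size_t n)
{
    assert(n == 0 || (r != NULL && a != NULL && b != NULL));
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* wraps only if a[i] == UINT64_MAX and carry == 1 */
        uint64_t c1 = (s < carry);   /* carry out of a[i] + carry */
        uint64_t t = s + b[i];
        uint64_t c2 = (t < s);       /* carry out of s + b[i] */
        r[i] = t;
        carry = c1 | c2;             /* c1 and c2 are never both 1 */
    }
    return carry;
}

/* Same operation as an explicit adc loop (x86-64, GCC/Clang, AT&T syntax).
   The index and counter are updated with lea and dec, neither of which
   writes CF, so CF survives from one adc to the next. */
uint64_t bignum_add(uint64_t *r, const uint64_t *a,
                    const uint64_t *b, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (n == 0)
        return 0;
    uint64_t carry = 0;
    size_t i = 0, cnt = n;
    __asm__ volatile(
        "clc\n\t"                        /* CF = 0 before the first adc */
        "1:\n\t"
        "movq (%[a],%[i],8), %%rax\n\t"  /* mov leaves flags alone */
        "adcq (%[b],%[i],8), %%rax\n\t"  /* rax = a[i] + b[i] + CF; sets CF */
        "movq %%rax, (%[r],%[i],8)\n\t"
        "leaq 1(%[i]), %[i]\n\t"         /* i++ without touching flags */
        "decq %[cnt]\n\t"                /* sets ZF, leaves CF unchanged */
        "jnz 1b\n\t"
        "setc %b[carry]\n\t"             /* low byte of carry = final CF */
        : [carry] "+r"(carry), [i] "+r"(i), [cnt] "+r"(cnt)
        : [a] "r"(a), [b] "r"(b), [r] "r"(r)
        : "rax", "cc", "memory");
    return carry;
#else
    return bignum_add_ref(r, a, b, n);
#endif
}
```

For example, with a = {UINT64_MAX, 0} and b = {1, 0}, `bignum_add(r, a, b, 2)` leaves r = {0, 1} and returns 0: the carry out of the low limb is picked up by the second `adc` even though `lea`, `dec` and `jnz` ran in between.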

I'm looking at the AMD64 Architecture Programmer's Manual, Vol. 3, and
it says very little about timings, although it describes the
instructions reasonably well. Is there a document out there that gives
some idea of clock ticks per instruction (like in the old days), or are
today's processors so deeply pipelined that counting ticks per
instruction won't cut it, and you really need to think in terms of
everything else you have going on?
