Re: filling big array of double



> Test result on P4 2.8 GHz HT:
> - Pascal: 1469 ms
> - Aleksandr's last IA32: 1468 ms
> - Eric's FPU: 4282 ms
>
> The version using FPU is nearly 3 times slower...

I have now tried the MMX version, and it is quite fast for me (three runs):
891
832
862

Aleksandr's IA32 function performs rather poorly for me:
1542
1552
1552

That is slower than both Eric's ASM FPU version and Aleksandr's own MMX version.
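For anyone who wants to experiment outside Delphi/BASM, here is a rough C sketch of the three store strategies being compared. It is an illustration under assumptions, not Eric's or Aleksandr's actual code, and the function names are hypothetical. Which instructions the compiler really emits (x87 fld/fstp, SSE2 movsd, or integer mov/movq) depends on the target and optimization flags, so timings from it will not line up with the figures above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

/* "FPU-style" (hypothetical name): plain double assignment. A 32-bit x87
   build typically goes through the FPU; an SSE2 build uses movsd instead. */
void fill_fpu_style(double *a, size_t n, double v)
{
    for (size_t i = 0; i < n; i++)
        a[i] = v;
}

/* "IA32-style" (hypothetical name): write the 8-byte bit pattern of v as
   two 32-bit integer stores per element. memcpy avoids strict-aliasing
   problems and normally compiles to plain mov instructions. */
void fill_int32_style(double *a, size_t n, double v)
{
    uint32_t half[2];
    memcpy(half, &v, sizeof v);               /* split the bit pattern */
    unsigned char *p = (unsigned char *)a;
    for (size_t i = 0; i < n; i++, p += sizeof(double)) {
        memcpy(p, &half[0], sizeof half[0]);      /* first 4 bytes */
        memcpy(p + 4, &half[1], sizeof half[1]);  /* last 4 bytes */
    }
}

/* "MMX-style" (hypothetical name): one 64-bit store per element, the way a
   movq from an MMX register writes a whole double at once. */
void fill_int64_style(double *a, size_t n, double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);           /* whole 8-byte pattern */
    unsigned char *p = (unsigned char *)a;
    for (size_t i = 0; i < n; i++, p += sizeof(double))
        memcpy(p, &bits, sizeof bits);
}
```

All three leave the array holding the same values; only the width and kind of store differ, which is what the benchmark numbers in this thread are really measuring.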

I think the FPU and MMX versions could be improved further by making sure the 8-byte stores go to 8-byte-aligned addresses. That may make them faster still.
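One common way to guarantee 8-byte alignment, regardless of what the allocator hands back, is to over-allocate by 7 bytes and skip forward to the next 8-byte boundary. The C sketch below assumes nothing about Delphi's memory manager; `is_aligned8` and `alloc_doubles_aligned8` are hypothetical helper names. Note that a conforming C `malloc` usually already returns memory aligned well enough for `double`, so this trick matters mainly for allocators that only promise 4-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p sits on an 8-byte boundary. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: storage for n doubles whose first element is 8-byte
   aligned even if the underlying block were only 4-byte aligned.
   Over-allocates by 7 bytes and skips 0..7 bytes to the next boundary.
   *raw receives the original block, which is what must be passed to free(). */
double *alloc_doubles_aligned8(size_t n, void **raw)
{
    unsigned char *base = malloc(n * sizeof(double) + 7);
    if (base == NULL)
        return NULL;
    size_t skip = (8 - ((uintptr_t)base & 7u)) & 7u;  /* 0 if already aligned */
    *raw = base;
    return (double *)(base + skip);
}
```

The caller fills the returned array as usual and later calls `free(raw)`, never `free` on the adjusted pointer.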