SSE2 half as fast as it should be?



I haven't tried it yet, but from what I've read, simple integer SSE2
instructions on my 3.8 GHz Prescott CPU have latency 2 and throughput
2. But... the Prescott is supposed to have 2 simple ALUs that are
double-pumped, yielding 2x2x32 bits = 128 bits per clock cycle of simple
integer instructions. So is the SSE2 manual wrong, or is Intel's
architecture at fault (since the hardware should be able to go twice as fast)?
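Taking the figures above at face value, and assuming "throughput 2" is read the way Intel's instruction tables define it (the number of clock cycles before the same instruction can issue again), the mismatch works out like this:

```latex
\begin{aligned}
\text{expected (double-pumped ALUs)} &= 2\ \text{ALUs} \times 2\ \tfrac{\text{ops}}{\text{ALU}\cdot\text{clock}} \times 32\ \tfrac{\text{bits}}{\text{op}} = 128\ \tfrac{\text{bits}}{\text{clock}} \\
\text{documented (one 128-bit SSE2 op)} &= \frac{128\ \text{bits per instruction}}{2\ \text{clocks per instruction}} = 64\ \tfrac{\text{bits}}{\text{clock}} \\
\text{ratio} &= \frac{64}{128} = \tfrac{1}{2}
\end{aligned}
```

That factor of one half is the "half as fast" in the subject line.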

Does anybody know of a workaround? I am multiplying bit matrices, which
relies on these simple ALU instructions.
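As a rough illustration of why bit-matrix multiplication leans on these instructions: assuming the arithmetic is over GF(2), multiplication becomes AND and addition becomes XOR, so each row of the product is the XOR of the rows of B selected by the set bits in the corresponding row of A. The sketch below assumes 128x128 matrices with one row per XMM register; the names `row128`, `get_bit`, `gf2_identity` and `gf2_matmul` are hypothetical, not from any library. Each (i, k) step ends in one PAND (`_mm_and_si128`) and one PXOR (`_mm_xor_si128`), which is exactly the simple-integer SSE2 traffic whose throughput is in question.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_and_si128 (PAND), _mm_xor_si128 (PXOR) */
#include <stdint.h>
#include <string.h>

#define N 128  /* assumed size: 128x128 bit matrix, one 128-bit XMM register per row */

/* One matrix row: bits 0..63 in w[0], bits 64..127 in w[1]. */
typedef struct { uint64_t w[2]; } row128;
static_assert(sizeof(row128) == 16, "a row must fill exactly one XMM register");

/* Return bit k (0..N-1) of row r as 0 or 1. */
static int get_bit(const row128 *r, int k) {
    return (int)((r->w[k >> 6] >> (k & 63)) & 1u);
}

/* Fill M with the N x N identity matrix. */
static void gf2_identity(row128 *M) {
    memset(M, 0, N * sizeof *M);
    for (int i = 0; i < N; i++)
        M[i].w[i >> 6] = (uint64_t)1 << (i & 63);
}

/* C = A * B over GF(2): row i of C is the XOR of every row B[k] with A[i][k] == 1.
   C must not alias A or B. */
static void gf2_matmul(const row128 *A, const row128 *B, row128 *C) {
    for (int i = 0; i < N; i++) {
        __m128i acc = _mm_setzero_si128();
        for (int k = 0; k < N; k++) {
            /* All-ones if A[i][k] is set, all-zeros otherwise: avoids a branch. */
            __m128i mask = _mm_set1_epi32(-get_bit(&A[i], k));
            __m128i bk = _mm_loadu_si128((const __m128i *)&B[k]);
            acc = _mm_xor_si128(acc, _mm_and_si128(mask, bk));  /* PAND, then PXOR */
        }
        _mm_storeu_si128((__m128i *)&C[i], acc);
    }
}
```

Multiplying by the matrix from `gf2_identity` on either side should return the other operand unchanged, which makes a quick sanity check. A serious kernel would more likely use a table-driven approach such as the Method of Four Russians to cut down the number of XORs, but the per-instruction cost discussed above applies either way.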

Thanks,
AndrewF
