Re: MMX speedup for Floyd Steinberg error diffusion



I figured it out...

I do all my processing as four signed 16-bit words per MMX register,
and at the end I use packuswb, which packs the signed words into
unsigned bytes with saturation.
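
For anyone following along, here is a rough scalar C++ sketch of the idea,
not my actual MMX code. It only models what packuswb does to each word
(clamp to 0..255) and why the intermediate values are kept as signed words.
The names pack_us_lane and add_error_and_pack are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Scalar model of one packuswb lane: a signed 16-bit word is clamped
// to the unsigned byte range 0..255 (unsigned saturation).
inline std::uint8_t pack_us_lane(std::int16_t w) {
    if (w < 0)   return 0;
    if (w > 255) return 255;
    return static_cast<std::uint8_t>(w);
}

// Hypothetical helper: add a diffused error term to four pixels held as
// signed words, then pack them back to bytes the way packuswb would.
inline std::array<std::uint8_t, 4> add_error_and_pack(
        const std::array<std::uint8_t, 4>& pixels,
        const std::array<std::int16_t, 4>& error) {
    std::array<std::uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        // Widen to signed 16 bits so the sum may temporarily leave 0..255.
        std::int16_t w = static_cast<std::int16_t>(pixels[i] + error[i]);
        out[i] = pack_us_lane(w);
    }
    return out;
}
```

Note that the real instruction takes two operands of four words each and
produces eight bytes at once; the sketch handles four values only to keep
it short.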

The code works exactly as my C++ version did, and I have a speedup of
more than 2x! :)

Excellent!

Thanks all
