Re: MMX speedup for Floyd Steinberg error diffusion

"Maarten Kronenburg" wrote in message

The four 32-bit words can be stored in a single 128-bit XMM register.
Then you copy, shift left and add/subtract, because
7 = 8 - 1 = 2^3 - 1, 5 = 4 + 1 = 2^2 + 1 and 3 = 2 + 1 = 2^1 + 1.
Then you shift right by 4 to divide by 16 = 2^4.
The shifts are PSLLD/PSRLD and the add/subtract is PADDD/PSUBD.
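A minimal sketch of the shift-and-add idea above, in C with SSE2 intrinsics. It assumes x86 with SSE2, and the helper name fs_scale_epi32 is hypothetical. Each of the four 32-bit lanes is scaled by the Floyd-Steinberg weights 7/16, 3/16, 5/16 and 1/16 without any multiply. The lanes are assumed non-negative because PSRLD is a logical shift. Signed error terms would need the arithmetic shift PSRAD instead.

```c
#include <emmintrin.h>  /* SSE2: _mm_slli_epi32 = PSLLD, _mm_srli_epi32 = PSRLD, ... */
#include <stdint.h>

/* Hypothetical helper: scale each non-negative 32-bit lane of x by
   7/16, 3/16, 5/16 and 1/16 using only shifts and adds/subtracts. */
static void fs_scale_epi32(__m128i x, __m128i *w7, __m128i *w3,
                           __m128i *w5, __m128i *w1)
{
    __m128i x2 = _mm_slli_epi32(x, 1);   /* 2x (PSLLD) */
    __m128i x4 = _mm_slli_epi32(x, 2);   /* 4x */
    __m128i x8 = _mm_slli_epi32(x, 3);   /* 8x */

    *w7 = _mm_srli_epi32(_mm_sub_epi32(x8, x), 4);  /* (8x - x) / 16  (PSUBD, PSRLD) */
    *w3 = _mm_srli_epi32(_mm_add_epi32(x2, x), 4);  /* (2x + x) / 16  (PADDD, PSRLD) */
    *w5 = _mm_srli_epi32(_mm_add_epi32(x4, x), 4);  /* (4x + x) / 16 */
    *w1 = _mm_srli_epi32(x, 4);                     /*  x       / 16 */
}
```

For example, a lane holding 255 comes out as 111, 47, 79 and 15 for the four weights (integer division, so the remainders are dropped).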

Now I see the data is in bytes. In that case it seems better to load 16 bytes
into a 128-bit XMM register, then widen 8 bytes at a time into eight 16-bit
words by shifting and ANDing, and do the above with PSLLW/PSRLW and PADDW/PSUBW
on the 16-bit words. Then the scaling mentioned is not needed, because the upper
8 bits of each 16-bit word are zero, which leaves room for the largest product
(7 * 255 = 1785) without overflow.
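A minimal sketch of the byte version, again assuming x86 with SSE2 and unsigned pixel bytes. The helper name fs_scale7_epu8 is hypothetical, and only the 7/16 weight is shown. ANDing with 0x00FF keeps the even-indexed bytes as 16-bit words, and a PSRLW by 8 brings the odd-indexed bytes down. This splits even and odd bytes rather than the first and last 8; PUNPCKLBW/PUNPCKHBW against a zero register is the usual alternative that keeps byte order.

```c
#include <emmintrin.h>  /* SSE2: PAND, PSLLW, PSRLW, PSUBW, POR */
#include <stdint.h>

/* Hypothetical helper: out[i] = (7 * in[i]) >> 4 for 16 unsigned bytes. */
static void fs_scale7_epu8(const uint8_t in[16], uint8_t out[16])
{
    const __m128i lo_mask = _mm_set1_epi16(0x00FF);
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Widen to eight 16-bit words twice: the AND keeps the even-indexed
       bytes (low byte of each word on little-endian x86), the shift right
       by 8 brings the odd-indexed bytes down. The upper 8 bits are now zero. */
    __m128i even = _mm_and_si128(v, lo_mask);
    __m128i odd  = _mm_srli_epi16(v, 8);

    /* 7x = 8x - x, then divide by 16 with a right shift by 4.
       The largest intermediate, 8 * 255 = 2040, fits easily in 16 bits. */
    even = _mm_srli_epi16(_mm_sub_epi16(_mm_slli_epi16(even, 3), even), 4);
    odd  = _mm_srli_epi16(_mm_sub_epi16(_mm_slli_epi16(odd, 3), odd), 4);

    /* Each result is at most 111, so it fits in a byte: move the odd
       results back into the high byte of each word and merge. */
    __m128i packed = _mm_or_si128(even, _mm_slli_epi16(odd, 8));
    _mm_storeu_si128((__m128i *)out, packed);
}
```

The other weights follow the same pattern, with (4x + x) for 5/16, (2x + x) for 3/16 and a plain shift for 1/16.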
Maarten.
