Re: MMX speedup for Floyd Steinberg error diffusion
- From: "Maarten Kronenburg" <spamtrap@xxxxxxxxxx>
- Date: Wed, 7 May 2008 22:35:41 +0200
"Maarten Kronenburg" wrote in message
The four 32-bit words can be stored in a single 128-bit xmm register.
To multiply by the weights, you copy the register, shift the copy left,
and add or subtract the original, because
7 = 8 - 1 = 2^3 - 1, 5 = 4 + 1 = 2^2 + 1 and 3 = 2 + 1 = 2^1 + 1.
Then you shift right by 4 to divide, because 16 = 2^4.
The shifts are PSLLD/PSRLD and the add/subtract is PADDD/PSUBD.
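
The shift-and-add steps above can be sketched with SSE2 intrinsics in C. This
is only an illustration, not code from the thread: the function name
fs_shares_epi32 and its array parameters are made up for the example, and it
assumes the four values are non-negative so that the logical right shift
(PSRLD) divides correctly. Signed error terms would need the arithmetic shift
PSRAD (_mm_srai_epi32) instead.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: compute the 7/16, 5/16, 3/16 and 1/16 shares of four
   non-negative 32-bit values at once, using only shifts and add/subtract. */
static void fs_shares_epi32(const uint32_t err[4],
                            uint32_t s7[4], uint32_t s5[4],
                            uint32_t s3[4], uint32_t s1[4])
{
    __m128i e  = _mm_loadu_si128((const __m128i *)err);

    /* 7*e = (e << 3) - e : PSLLD then PSUBD */
    __m128i e7 = _mm_sub_epi32(_mm_slli_epi32(e, 3), e);
    /* 5*e = (e << 2) + e : PSLLD then PADDD */
    __m128i e5 = _mm_add_epi32(_mm_slli_epi32(e, 2), e);
    /* 3*e = (e << 1) + e : PSLLD then PADDD */
    __m128i e3 = _mm_add_epi32(_mm_slli_epi32(e, 1), e);

    /* Divide by 16 = 2^4 with a logical right shift (PSRLD). */
    _mm_storeu_si128((__m128i *)s7, _mm_srli_epi32(e7, 4));
    _mm_storeu_si128((__m128i *)s5, _mm_srli_epi32(e5, 4));
    _mm_storeu_si128((__m128i *)s3, _mm_srli_epi32(e3, 4));
    _mm_storeu_si128((__m128i *)s1, _mm_srli_epi32(e,  4));
}
```

Each share is truncated separately here, so the four shares need not sum back
to the input exactly; a real dither loop may want to handle the remainder.
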
Now I see the data is in bytes. In that case it seems better to load 16 bytes
into a 128-bit xmm register, then spread them 8 at a time into eight 16-bit
words by shifting and ANDing, and do the above with PSLLW/PSRLW and
PADDW/PSUBW on the 16-bit words. Then the scaling mentioned is not needed,
because the upper 8 bits of each 16-bit word start out as zero.
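
A rough sketch of the byte variant, again with SSE2 intrinsics in C. The
function name fs_seven_sixteenths_u8 and the even/odd output layout are
assumptions made for this example. It splits the 16 bytes with one AND (for
the even-indexed bytes) and one PSRLW by 8 (for the odd-indexed bytes), and
shows only the 7/16 weight; the other weights follow the same pattern. Since
255 * 7 = 1785 fits easily in 16 bits, nothing overflows for unsigned input.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: widen 16 unsigned bytes into two sets of eight 16-bit
   words and compute 7/16 of each. even[k] comes from px[2k], odd[k] from
   px[2k+1]. */
static void fs_seven_sixteenths_u8(const uint8_t px[16],
                                   uint16_t even[8], uint16_t odd[8])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)px);
    __m128i mask = _mm_set1_epi16(0x00FF);

    /* AND keeps the low byte of every word: bytes 0, 2, 4, ... (PAND) */
    __m128i ev = _mm_and_si128(v, mask);
    /* Shifting each word right by 8 keeps the high byte: bytes 1, 3, 5, ... */
    __m128i od = _mm_srli_epi16(v, 8);              /* PSRLW */

    /* 7*x = (x << 3) - x, then divide by 16 with a right shift by 4. */
    ev = _mm_srli_epi16(_mm_sub_epi16(_mm_slli_epi16(ev, 3), ev), 4);
    od = _mm_srli_epi16(_mm_sub_epi16(_mm_slli_epi16(od, 3), od), 4);

    _mm_storeu_si128((__m128i *)even, ev);
    _mm_storeu_si128((__m128i *)odd,  od);
}
```

The results come out de-interleaved (even bytes in one register, odd bytes in
the other), so they have to be recombined in the right order when written back.
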
Maarten.