Re: MMX speedup for Floyd Steinberg error diffusion
- From: "Maarten Kronenburg" <spamtrap@xxxxxxxxxx>
- Date: Wed, 7 May 2008 20:49:56 +0200
"Maarten Kronenburg" wrote in message
The 4 32-bit words can be stored in a single 128-bit xmm register.
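Then you copy, shift left, and add/subtract, because
7 = 8 - 1 = 2^3 - 1, 5 = 4 + 1 = 2^2 + 1, 3 = 2 + 1 = 2^1 + 1, and 1 needs no multiply.
Then you shift right by 4 to divide by 16 = 2^4.
The shifts are PSLLD/PSRLD (use the arithmetic PSRAD for the right shift if the
errors are signed) and the add/subtract is PADDD/PSUBD.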
The reference for these instructions is in:
http://developer.intel.com/products/processor/manuals/index.htm
see the instruction set reference.
The timings can be found in:
http://agner.org/optimize/
In addition to this:
In order not to lose the 3 highest bits, the process must be scaled down 3
bits. That is: shift right 3 bits, do the shifts and adds/subtracts, and then
shift left 3 bits again. Then of course some shifts cancel out.
Maarten.