Re: MMX speedup for Floyd Steinberg error diffusion



rep_movsd wrote:
The thing is Floyd-Steinberg dithering works serially one pixel at a
time... Each pixel processed affects the pixel to its right and the
pixels on the next scanline.

Interesting algorithm, very serially dependent. :-(

How does it handle the corner/edge cases?

I.e. the last pixel on a scan line has no down-right neighbor, the first pixel has no down-left and the last scanline has nothing below at all!

So the best I can hope for is to handle one pixels R, G and B byte
values in one go.

Let me clarify with actual code ( highly unoptimal C++ , for clarities
sake )

It seems to me like the getNearestPalColor() function could easily dominate the processing time here!

I guess a 3D space partition would be a good way to go, using an initial lookup table based on something like 3:3:3 bits to start in the proper subvolume, or simply use that table to point to lists of possible palColor candidates for each starting point.

(Due to truncation effects, several pal entries might belong to multiple lists, depending upon the value of the rounded/truncated RGB bits, but with a good set of palette colors, the number of values to test against should be (on average) pretty low.

Anyway, to speed up the dithering stage I would use MMX/SSE and work on multiple lines in parallel, keeping the intermediate results in registers until we're finished with them.

Terje

--
- <Terje.Mathisen@xxxxxxxxxxxxx>
"almost all programming can be viewed as an exercise in caching"

.



Relevant Pages

  • Re: AntiAlias Line Optimization
    ... suggest that is BGRA is the correct byte order for a given pixel. ... to modify the R,G,B values of a DIB given a known x,y point. ... Within the BMData byte array (created by your ModifyDIBArrayHack ... Where ScanLine is the DWord aligned scan-line width of the DIB, ...
    (microsoft.public.vb.winapi.graphics)
  • Re: help with triangle coding
    ... only single fragment per pixel what you propose is more expensive than ... single addition per edge per scanline. ... I look anyone recommending bresenham for any serious scanconversion ...
    (comp.graphics.algorithms)
  • Re: So frustrating... (cross-platform development)
    ... The #1 showstopper IME was the lack of LCL bitmap ScanLine ... So for all other graphics and platforms we would have to mimic the ... querying them pixel by pixel. ... Ppl use a scanline because they have ...
    (borland.public.delphi.non-technical)
  • Re: Attempting to access 1 bit bitmap raw pixel data
    ... For your 2x2 bitmap that's 8 bytes. ... You loop through each scanline ... > After looping through and checking each pixel of the 4 pixel bitmap, ... > PBITMAPFILEHEADER fileHeader = pDataPtr; ...
    (microsoft.public.win32.programmer.gdi)
  • Re: MMX speedup for Floyd Steinberg error diffusion
    ... Each pixel processed affects the ... pixel to its right and the pixels on the next scanline. ... Hmm ... ...
    (comp.lang.asm.x86)