Re: Rotating bitmaps, fast..



Johan Stäck wrote:
Hello all!

I have written a routine in C++ that rotates bitmaps 90 degrees.
(I call it from VB6)
Optionally, it will also perform a "flip" (mirroring).

Now, I want to squeeze as much performance as possible out of this code.
(I rotate a number of live video streams)

At http://www.lightweightvideo.com/rotate90/Rotate90_code.htm I have put
up the C++ code, and also a disassembly listing.

Is there anything obvious in the disassembly that indicates that the
compiler has done a bad job? (and that could be corrected in some way...)

Or (perhaps more likely...) is there anything obvious in the C++ code
that indicates that *I* have done a bad job?

Is there perhaps even someone, well experienced in Intel assembler, who
would like to write the necessary code to perform the rotation loop in
the most efficient way?
(Given of course that the disassembly listing shows that it *can* be
done more efficient...) Myself, I can' determine that..


The compiler has done a reasonable job of optimizing the code with most
of the complex calculations hoisted out of the inner loop and/or
reduced to simple adds.

The major flaw in the code (from a performance perspective), is that
you've left a conditional test inside the inner loop. Rewrite this so
that you test "flip" first and then select one of two complete copy
routines. IOW:

if (flip)
for( int line_in = 0; line_in < height_in; line_in++ )
{
line_pitch_in = line_in*Pitch_in;
for( int pixel_in = 0; pixel_in < width_in; pixel_in ++ )
{
p_in = pBuf_in + line_pitch_in + (pixel_in * 3);
line_ut = pixel_in;
pixel_ut = height_in - line_in; // flip
p_ut = pBuf_ut + (line_ut *Pitch_ut) +(pixel_ut * 3);
p_ut[0] = p_in[0];
p_ut[1] = p_in[1];
p_ut[2] = p_in[2];
}
}
else
for( int line_in = 0; line_in < height_in; line_in++ )
{
line_pitch_in = line_in*Pitch_in;
for( int pixel_in = 0; pixel_in < width_in; pixel_in ++ )
{
p_in = pBuf_in + line_pitch_in + (pixel_in * 3);
line_ut = pixel_in;
pixel_ut = line_in; // no flip
p_ut = pBuf_ut + (line_ut *Pitch_ut) +(pixel_ut * 3);
p_ut[0] = p_in[0];
p_ut[1] = p_in[1];
p_ut[2] = p_in[2];
}
}

Once you've done that, you might think about some cache blocking, and
doing the rotation in smaller chunks than the whole bitmap. Rotating
and moving square subblocks sized to cache well would likely help.
Perhaps 16x16, which would make both source and target subblocks three
cache lines wide (assuming a 16b cache line), which would hopefully
allow you to move and rotate the entire subblock with only 16*3*2 cache
lines read from memory, and half that number written back, and without
ever having to go back and reaccess a cache line you've touched on a
prior subblock.

.



Relevant Pages

  • Re: Rotating bitmaps, fast..
    ... it will also perform a "flip". ... and moving square subblocks sized to cache well would likely help. ... which would make both source and target subblocks three ... allow you to move and rotate the entire subblock with only 16*3*2 cache ...
    (alt.lang.asm)
  • Re: Rotating bitmaps, fast..
    ... it will also perform a "flip". ... doing the rotation in smaller chunks than the whole bitmap. ... and moving square subblocks sized to cache well would likely help. ... which would make both source and target subblocks three ...
    (alt.lang.asm)
  • Re: GC in Jons raytracing benchmark
    ... >> doing it like this puts a lot of strain on the memory subsystem. ... 128kB L1 data cache. ... if people like they can think of the RAM chips being ... > and so delay the flip as long as you can without spilling out to the ...
    (comp.lang.lisp)
  • Re: Balanced Binary Tree
    ... > I want this cache to run fast, so i've used, in my C program, a bb-tree. ... % Right rotate ... Right, Level), Level1),!. ... bbtreefind(Key, Left, Flag). ...
    (comp.lang.prolog)
  • clearing browser cache
    ... advice that people need to routinely go and clear the ... Certainly the browser software knows enough ... to rotate out old files to make room for new ones. ... that about the cache being full and needing emptying. ...
    (microsoft.public.windows.inetexplorer.ie6.browser)