Re: Fast mip-map generation



Nordlöw wrote:
> Does anyone have any SSE/SSE2-optimized code for generation of an
> image mip-map (image resolution pyramid)? I need this to do the
> following with high performance (line-oriented, preserving image locality):
> 1. Image Load from file to memory
> 2. Mip-Map Generation of this memory
> 3. Upload mip-map into an OpenGL texture

This is easy; it just depends on how fancy you want your subsampling/filtering to be!

Point sampling is out of the question, right?

Do you need to handle gamma correction, or are your source textures linear?

The next step is linear averaging, where you simply take 0.25*sum(four_source_pixels), possibly after converting each of the source pixels from gamma to linear space.
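A minimal sketch of that 2x2 average for 8-bit RGBA, under some loud assumptions: gamma is approximated as 2.0 (so decoding is a square and encoding a square root, where real sRGB uses a piecewise curve), the image is tightly packed, and width and height are even. The names `decode`, `encode` and `box_halve_rgba8` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Assumption: gamma approximated as 2.0. Decode = square, encode = sqrt.
   Real sRGB uses a piecewise curve, so this is only a stand-in. */
static float decode(uint8_t v)
{
    float f = v / 255.0f;
    return f * f;
}

static uint8_t encode(float lin)
{
    float f = _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(lin)));
    return (uint8_t)(f * 255.0f + 0.5f);   /* round to nearest */
}

/* Halve a tightly packed RGBA8 image (w and h even) with a 2x2 box filter.
   dst must hold (w/2) * (h/2) pixels. */
void box_halve_rgba8(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int ow = w / 2, oh = h / 2;
    for (int y = 0; y < oh; y++) {
        const uint8_t *r0 = src + (size_t)(2 * y) * w * 4; /* upper source row */
        const uint8_t *r1 = r0 + (size_t)w * 4;            /* lower source row */
        uint8_t *out = dst + (size_t)y * ow * 4;
        for (int x = 0; x < ow; x++) {
            const uint8_t *a = r0 + 8 * x;  /* two pixels = 8 bytes per step */
            const uint8_t *b = r1 + 8 * x;
            for (int c = 0; c < 3; c++) {
                float sum = decode(a[c]) + decode(a[c + 4])
                          + decode(b[c]) + decode(b[c + 4]);
                out[4 * x + c] = encode(0.25f * sum);
            }
            /* Alpha is not gamma-encoded: average it directly, with rounding. */
            out[4 * x + 3] = (uint8_t)((a[3] + a[7] + b[3] + b[7] + 2) / 4);
        }
    }
}
```

With this stand-in curve, a 2x2 checker of black and white texels averages to 180 rather than the 128 you get from averaging the raw byte values, which is exactly the difference the gamma question is about.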

Next is a fancier sub-sampler, something like (-1,3,3,-1). If you do go this route, then you should probably first convert the source texture to linear (32-bit fp) format, then generate all the reduced-size textures before converting back to whatever format you want your texture to be in.
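To make the kernel concrete, here is a scalar sketch of the (-1,3,3,-1) filter on a single row of linear float samples. The weights sum to 4, so the result is scaled by 0.25. Edge samples are clamped here, which gives the same result as replicating the border value; the function name `downsample_row` and the clamping choice are assumptions for illustration only.

```c
/* Downsample one row of n linear float samples (n even) to n/2 samples
   with the kernel (-1, 3, 3, -1) / 4. Out-of-range taps are clamped to
   the first/last sample, i.e. the border value is replicated. */
void downsample_row(const float *src, int n, float *dst)
{
    for (int i = 0; i < n / 2; i++) {
        int c = 2 * i;                              /* left centre tap */
        float a = src[c > 0 ? c - 1 : 0];           /* weight -1 */
        float b = src[c];                           /* weight  3 */
        float d = src[c + 1];                       /* weight  3 */
        float e = src[c + 2 < n ? c + 2 : n - 1];   /* weight -1 */
        dst[i] = (-a + 3.0f * b + 3.0f * d - e) * 0.25f;
    }
}
```

Because of the negative lobes the output can overshoot the input range, so clamp when converting back to an integer format. Running this once along rows and once along columns gives the 2D version of the filter.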

Let's assume a 4x4 sub-sampler, with edge handling done via padding (border value replicated to one extra pixel to the left/right/above/below), and all lines starting on a 16-byte boundary:

First version:

load four source pixels (in RGBA format) into xmm0..xmm3
multiply each register by the required sampling weight (if different from 1)
for the next three lines:
    load the next four pixels into xmm4..xmm7, scale and
    accumulate into xmm0..xmm3
scale the result (or pre-scale the weights!)
write output pixel
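The steps above could look roughly like this with SSE intrinsics. This is a sketch, not tuned code: it assumes float RGBA pixels (one pixel per XMM register), takes the 2D weights as the outer product of (-1,3,3,-1) with itself, and multiplies every tap even where the weight is 1. The function name `mip_pixel_4x4` and its parameters are made up. Note that the four accumulators still have to be added together before the final scale.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Compute one output pixel from a 4x4 block of float RGBA pixels.
   src    : top-left pixel of the block in the padded source image
   stride : row pitch of the source, in floats
   dst    : receives 4 floats (R, G, B, A) */
void mip_pixel_4x4(const float *src, size_t stride, float *dst)
{
    static const float k[4] = { -1.0f, 3.0f, 3.0f, -1.0f };
    __m128 acc[4];

    for (int row = 0; row < 4; row++) {
        const float *p = src + row * stride;
        for (int col = 0; col < 4; col++) {
            /* The 2D weight is the product of the two 1D weights. */
            __m128 w = _mm_set1_ps(k[row] * k[col]);
            __m128 v = _mm_mul_ps(_mm_loadu_ps(p + 4 * col), w);
            acc[col] = (row == 0) ? v : _mm_add_ps(acc[col], v);
        }
    }
    /* Fold the four column accumulators into one pixel, then normalize:
       the 16 weights sum to 16. */
    __m128 sum = _mm_add_ps(_mm_add_ps(acc[0], acc[1]),
                            _mm_add_ps(acc[2], acc[3]));
    _mm_storeu_ps(dst, _mm_mul_ps(sum, _mm_set1_ps(1.0f / 16.0f)));
}
```

For output pixel (x, y) the caller would pass the address of padded-image pixel (2x, 2y), since the padding shifts everything by one pixel. Since every pixel is 16 bytes and lines start on a 16-byte boundary, the unaligned loads could be swapped for aligned ones.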

This is 16 MOVUPS, 8 MULPS, 12 ADDPS and a final MULPS to scale:
about 40 cycles per output pixel on a Core 2 Duo, or about 7 ms for a full 1Kx1K texture with 10 MIP levels on a 2 GHz CPU.
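The 7 ms figure can be checked with a few lines of arithmetic: the 10 levels below a 1024x1024 base hold 512^2 + 256^2 + ... + 1 = 349525 pixels, and at 40 cycles each that is about 14 million cycles. The helper below (`mip_chain_ms`, a made-up name) just does that sum, assuming a square power-of-two texture and a constant per-pixel cost.

```c
/* Estimated time in milliseconds to generate every mip level below a
   size x size base texture, given a constant cost per output pixel. */
double mip_chain_ms(unsigned size, double cycles_per_pixel, double hz)
{
    double pixels = 0.0;
    for (unsigned s = size / 2; s >= 1; s /= 2)
        pixels += (double)s * s;           /* pixels in this level */
    return pixels * cycles_per_pixel / hz * 1e3;
}
```

Calling `mip_chain_ms(1024, 40.0, 2e9)` gives just under 7 ms, and memory traffic is not included in that estimate.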

Converting to/from linear fp format is going to take comparable time!

Second version:

This optimization depends on re-using the scaled input texels: since the sampling kernel is symmetrical, each weighted sample is used twice. That saves half the loads and half the muls, at the cost of increased register pressure.
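A sketch of one way the re-use could look, assuming the kernel is the separable (-1,3,3,-1): filter each source column vertically once, then share that column sum between the two neighbouring output pixels that overlap it. The symmetry also lets the two outer taps and the two inner taps be added before multiplying. This is an interpretation of the idea, not necessarily the exact scheme meant above; `column_sum` and `mip_row_reuse` are made-up names.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Vertical pass for one column of float RGBA pixels:
   -row0 + 3*row1 + 3*row2 - row3 (not yet normalized). */
static __m128 column_sum(const float *p, size_t stride)
{
    __m128 outer = _mm_add_ps(_mm_loadu_ps(p), _mm_loadu_ps(p + 3 * stride));
    __m128 inner = _mm_add_ps(_mm_loadu_ps(p + stride),
                              _mm_loadu_ps(p + 2 * stride));
    return _mm_sub_ps(_mm_mul_ps(inner, _mm_set1_ps(3.0f)), outer);
}

/* Produce out_w output pixels for one output row.
   src points at the top-left pixel of a padded band that is 4 rows high
   and (2*out_w + 2) pixels wide; stride is the row pitch in floats. */
void mip_row_reuse(const float *src, size_t stride, float *dst, int out_w)
{
    const __m128 three = _mm_set1_ps(3.0f);
    const __m128 scale = _mm_set1_ps(1.0f / 16.0f);
    __m128 c0 = column_sum(src, stride);       /* leftmost tap  */
    __m128 c1 = column_sum(src + 4, stride);   /* second tap    */

    for (int x = 0; x < out_w; x++) {
        const float *p = src + (size_t)(2 * x + 2) * 4;
        __m128 c2 = column_sum(p, stride);     /* two new columns per pixel */
        __m128 c3 = column_sum(p + 4, stride);
        __m128 inner = _mm_mul_ps(_mm_add_ps(c1, c2), three);
        __m128 outer = _mm_add_ps(c0, c3);
        _mm_storeu_ps(dst + 4 * x,
                      _mm_mul_ps(_mm_sub_ps(inner, outer), scale));
        c0 = c2;   /* the right half of this window is the left half */
        c1 = c3;   /* of the next one, so keep it in registers       */
    }
}
```

Each output pixel now loads 8 source pixels instead of 16, but the two carried column sums stay live across iterations, which is where the extra register pressure comes from.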

Terje
--
- <Terje.Mathisen@xxxxxxxxxxxxx>
"almost all programming can be viewed as an exercise in caching"
