Re: Fast mip-map generation
- From: Terje Mathisen <spamtrap@xxxxxxxxxx>
- Date: Mon, 22 Oct 2007 15:33:23 +0200
Nordlöw wrote:
Does anyone have any SSE/SSE2-optimized code for generation of an
image mip-map (image resolution pyramid)? I need this to do the
following with high performance (line-oriented, preserving image locality):
1. Image Load from file to memory
2. Mip-Map Generation of this memory
3. Upload mip-map into an OpenGL texture
This is easy; it just depends on how fancy you want your subsampling/filtering to be!
Point sampling is out of the question, right?
Do you need to handle gamma correction, or are your source textures linear?
The next step is linear averaging, where you simply take 0.25*sum(four_source_pixels), possibly after converting each of the source pixels from gamma-encoded to linear values.
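A minimal sketch of the 2x2 averaging step in plain C, assuming the image has already been converted to linear float RGBA, is tightly packed, and has even width and height. The name box_downsample_rgba is made up for illustration; any gamma decoding would happen before this call.

```c
#include <assert.h>
#include <stddef.h>

/* Halve a linear-light float RGBA image with a 2x2 box filter.
   src is src_w x src_h pixels (both even), 4 floats per pixel, tightly packed.
   dst must hold (src_w / 2) x (src_h / 2) pixels.
   Hypothetical helper, not taken from the original post. */
void box_downsample_rgba(const float *src, size_t src_w, size_t src_h,
                         float *dst)
{
    assert(src_w % 2 == 0 && src_h % 2 == 0);
    size_t dst_w = src_w / 2, dst_h = src_h / 2;

    for (size_t y = 0; y < dst_h; ++y) {
        const float *row0 = src + (2 * y) * src_w * 4;  /* upper source line */
        const float *row1 = row0 + src_w * 4;           /* lower source line */
        for (size_t x = 0; x < dst_w; ++x) {
            const float *p0 = row0 + (2 * x) * 4;       /* top-left texel */
            const float *p1 = row1 + (2 * x) * 4;       /* bottom-left texel */
            float *out = dst + (y * dst_w + x) * 4;
            for (int c = 0; c < 4; ++c) {
                /* p0[c + 4] and p1[c + 4] are the right-hand neighbours. */
                out[c] = 0.25f * (p0[c] + p0[c + 4] + p1[c] + p1[c + 4]);
            }
        }
    }
}
```

Calling it repeatedly, each time on the previous output, yields the successive levels until one dimension reaches 1.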
Next is a fancier sub-sampler, something like (-1,3,3,-1). If you do go this route, then you should probably first convert the source texture to linear (32-bit fp) format, then generate all the reduced-size textures before converting back to whatever format you want your texture to be in.
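To make the (-1,3,3,-1) filter concrete: applied in both directions it becomes a 4x4 table of weights, the outer product of the taps with themselves. The taps sum to 4, so the 16 products sum to 16, and dividing by 16 pre-scales the table so that a flat image stays unchanged. A small sketch, with make_kernel_4x4 as an illustrative name:

```c
/* Fill w[4][4] with the 2-D kernel formed from the 1-D taps (-1, 3, 3, -1),
   pre-scaled so that all 16 weights sum to 1.
   Hypothetical helper, not taken from the original post. */
void make_kernel_4x4(float w[4][4])
{
    static const float tap[4] = { -1.0f, 3.0f, 3.0f, -1.0f };
    const float norm = 1.0f / 16.0f;   /* (-1 + 3 + 3 - 1)^2 = 16 */

    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            w[y][x] = tap[y] * tap[x] * norm;
}
```

The corners come out as 1/16, the edges as -3/16 and the four centre weights as 9/16. Because some weights are negative, results can fall slightly outside the input range, so a clamp is usually needed when converting back to an integer format.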
Let's assume a 4x4 sub-sampler, with edge handling done via padding (border value replicated to one extra pixel to the left/right/above/below), and all lines starting on a 16-byte boundary:
First version:
Load four source pixels (in RGBA format) into xmm0..xmm3.
Multiply each register by the required sampling weight (if different from 1).
For each of the next three lines:
    Load the next four pixels into xmm4..xmm7, scale and
    accumulate into xmm0..xmm3.
Scale the result (or pre-scale the weights!).
Write the output pixel.
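The steps above can be sketched in portable C rather than SSE: each inner loop over the four channels stands in for one packed instruction on an xmm register. The sketch assumes float RGBA texels, a window pointer that already accounts for the border padding, and 16 pre-scaled weights stored row by row. It also spells out the final addition of the four accumulators into a single pixel. The name filter_4x4_pixel and its parameters are illustrative only.

```c
#include <stddef.h>

/* Compute one output pixel from a 4x4 window of float RGBA texels.
   win    points at the top-left texel of the window,
   stride is the number of floats per (padded) source line,
   w      holds the 16 pre-scaled weights, row by row.
   Hypothetical helper, not taken from the original post. */
void filter_4x4_pixel(const float *win, size_t stride,
                      const float *w, float out[4])
{
    float acc[4][4];                    /* stand-ins for xmm0..xmm3 */

    /* First line: load four texels and scale them. */
    for (int x = 0; x < 4; ++x)
        for (int c = 0; c < 4; ++c)
            acc[x][c] = win[x * 4 + c] * w[x];

    /* Next three lines: load, scale and accumulate. */
    for (int y = 1; y < 4; ++y) {
        const float *line = win + (size_t)y * stride;
        for (int x = 0; x < 4; ++x)
            for (int c = 0; c < 4; ++c)
                acc[x][c] += line[x * 4 + c] * w[y * 4 + x];
    }

    /* Add the four accumulators together to get the output pixel.
       No final scaling is needed because the weights are pre-scaled. */
    for (int c = 0; c < 4; ++c)
        out[c] = (acc[0][c] + acc[1][c]) + (acc[2][c] + acc[3][c]);
}
```

For output pixel (ox, oy) and one texel of padding on every side, the window would start at padded column 2*ox and padded line 2*oy.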
Per output pixel this is 16 MOVUPS, 8 MULPS, 12 ADDPS and a final MULPS to scale:
About 40 cycles on a Core 2 Duo, or about 7 ms for a full 1Kx1K texture with 10 MIP levels on a 2 GHz CPU.
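The 7 ms figure follows from counting output pixels: the ten levels below a 1024x1024 base contain 512^2 + 256^2 + ... + 1^2 = 349525 pixels, and at 40 cycles each that is roughly 14 million cycles. A short sketch of the arithmetic, with illustrative function names:

```c
/* Number of output pixels in all mip levels below a square base image
   whose side is (1 << log2_size) pixels. */
unsigned long mip_output_pixels(unsigned log2_size)
{
    unsigned long total = 0;
    for (unsigned level = 1; level <= log2_size; ++level) {
        unsigned long side = 1ul << (log2_size - level);
        total += side * side;
    }
    return total;
}

/* Estimated generation time in milliseconds, given a cost in cycles per
   output pixel and a clock rate in Hz. */
double mip_time_ms(unsigned log2_size, double cycles_per_pixel, double clock_hz)
{
    return (double)mip_output_pixels(log2_size) * cycles_per_pixel
           / clock_hz * 1e3;
}
```

With mip_time_ms(10, 40.0, 2e9) this gives just under 7 ms. The estimate ignores memory stalls and the format conversions.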
Converting to/from linear fp format is going to take comparable time!
Second version:
This optimization depends on re-using the scaled input texels: since the sampling kernel is symmetrical, each weighted sample is used twice, so we can save half the loads and half the multiplies this way, but it will increase the register pressure.
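One possible reading of this re-use, sketched in portable C: adjacent output pixels have 4x4 windows that overlap by two columns, so the vertically filtered sum of each source column can be computed once and shared by both outputs. This is an interpretation and not necessarily the scheme intended above. It assumes float RGBA texels and four padded source lines of 2*dst_w + 2 texels each. The names column_sum and filter_row_shared are made up.

```c
#include <stddef.h>

/* Vertical pass for one column: -r0 + 3*r1 + 3*r2 - r3, per channel. */
static void column_sum(const float *const rows[4], size_t col, float out[4])
{
    for (int c = 0; c < 4; ++c) {
        size_t i = col * 4 + c;
        out[c] = 3.0f * (rows[1][i] + rows[2][i]) - (rows[0][i] + rows[3][i]);
    }
}

/* Produce dst_w output pixels from four padded source lines.
   Each column sum is computed once and used by two neighbouring outputs.
   Hypothetical helper, not taken from the original post. */
void filter_row_shared(const float *const rows[4], size_t dst_w, float *dst)
{
    float a[4], b[4], n0[4], n1[4];

    column_sum(rows, 0, a);               /* first two columns of window 0 */
    column_sum(rows, 1, b);
    for (size_t ox = 0; ox < dst_w; ++ox) {
        column_sum(rows, 2 * ox + 2, n0); /* the only new work per output */
        column_sum(rows, 2 * ox + 3, n1);
        for (int c = 0; c < 4; ++c) {
            /* Horizontal pass (-1, 3, 3, -1), then scale by 1/16. */
            dst[ox * 4 + c] =
                (3.0f * (b[c] + n0[c]) - (a[c] + n1[c])) * (1.0f / 16.0f);
            a[c] = n0[c];                 /* carry over to the next window */
            b[c] = n1[c];
        }
    }
}
```

The carried values a and b are the extra state that costs registers in an SSE version.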
Terje
--
- <Terje.Mathisen@xxxxxxxxxxxxx>
"almost all programming can be viewed as an exercise in caching"