Cache questions
- From: spamtrap@xxxxxxxxxx
- Date: Fri, 13 May 2005 21:23:11 +0000 (UTC)
Hi. I have a question about CPU data caching. (This is for the IA-32
architecture.)
I'm writing some assembly code for a graphics application where
execution speed is the top priority. I'm using the SIMD instruction
sets and the MMX and XMM registers where possible.
I'm writing a routine which will need to read from a source bitmap
(ESI) and write to a destination bitmap (EDI). I'm trying to keep
memory accesses to the minimum necessary, since I know they can be a
bottleneck, especially when they cause cache misses.
But my knowledge of how the L1/L2 caches work is a little shaky.
If I run out of registers while writing my function and need to spill
some of my local variables to memory, what's the best way to do that? I
have some stack space, and I could store a local variable with, for
example, "movaps [esp+0x10], xmm0" (assuming the slot is 16-byte
aligned, which movaps requires). But will this compete in the cache
with the data being read at ESI and written at EDI?
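To make the "will it compete" question concrete, here is a minimal
sketch in C of how a set-associative cache maps addresses to sets. The
geometry (64-byte lines, 8 ways, 32 KB) is an assumption for
illustration, not a property of any particular IA-32 CPU, and the names
cache_set and same_set are hypothetical helpers. Two addresses can only
evict each other if they map to the same set, and each set holds
NUM_WAYS lines before anything has to be evicted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical L1 data cache geometry (assumed for illustration only;
   check the actual CPU's documentation or CPUID for real values). */
#define LINE_BYTES  64u              /* bytes per cache line */
#define NUM_WAYS    8u               /* lines per set (associativity) */
#define CACHE_BYTES (32u * 1024u)    /* total L1 data cache size */
#define NUM_SETS    (CACHE_BYTES / (LINE_BYTES * NUM_WAYS))  /* 64 sets */

/* Index of the cache set that the byte at addr maps to. */
static unsigned cache_set(uintptr_t addr)
{
    /* The assumed geometry must divide evenly into whole sets. */
    assert(CACHE_BYTES % (LINE_BYTES * NUM_WAYS) == 0);
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Nonzero if two addresses land in the same set, i.e. their lines
   compete for the same NUM_WAYS slots. */
static int same_set(uintptr_t a, uintptr_t b)
{
    return cache_set(a) == cache_set(b);
}
```

Under these assumed numbers, a stack spill slot shares a set with the
source or destination pointer only when their addresses agree in the
set-index bits, and even then the set has room for several lines.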
How can I keep a little L1 cache space free for reading and writing my
local variables in this case? In each loop iteration (one per pixel) I
probably need to read one or more 32-bit values from the ESI location,
write a 32-bit value to the EDI location, and use a few 128-bit memory
locations to hold local variables that don't fit in the registers.
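The per-iteration footprint described above can be estimated with a
little arithmetic. The following is a rough sketch in C, assuming a
64-byte cache line (an assumption; line size varies by CPU); the helper
names lines_spanned and iterations_per_line are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u  /* assumed cache line size; varies by CPU */

/* Number of distinct cache lines touched by len bytes starting at addr. */
static size_t lines_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = addr / LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / LINE_BYTES;
    return (size_t)(last - first + 1);
}

/* How many loop iterations share one cache line when each iteration
   advances the pointer by elem_bytes (4 for a 32-bit pixel). */
static size_t iterations_per_line(size_t elem_bytes)
{
    assert(elem_bytes > 0 && elem_bytes <= LINE_BYTES);
    return LINE_BYTES / elem_bytes;
}
```

With these assumptions, iterations_per_line(4) is 16, so a sequential
walk over 32-bit pixels moves to a new source line (and a new
destination line) once every 16 iterations. Four 128-bit spill slots
(64 bytes) occupy one line if they start on a 64-byte boundary, and two
lines if they start at an offset such as 0x10.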
And if I can ensure these spilled variables stay in the L1 cache, what
kind of time penalty does that incur compared to keeping everything in
registers?
Any information or advice welcome.
Thanks.