Cache questions



Hi. I have a question about CPU data caching. (BTW, this is for the
IA-32 architecture.)

I'm writing some assembly code for a graphics application where
execution speed is the top priority. I'm using the SIMD instruction
sets and the MMX and XMM registers where possible.

I'm writing a routine which will need to read from a source bitmap
(ESI) and write to a destination bitmap (EDI). I'm trying to keep
memory accesses to the necessary minimum, since I know they can be a
bottleneck, especially when they cause cache misses.

But actually my knowledge of how the L1/L2 caches work is a little
shaky.

If I run out of registers writing my function and need to spill some of
my local variables to memory, what's the best way to do that? I have
some stack space, and I could write one of my local variables with, for
example "movaps [esp+0x10], xmm0" when I run out of registers. But will
this compete in the cache with the data being read at ESI and written
at EDI?

How can I keep a little L1 cache space for writing and reading my local
variables in this case? At each pixel (each loop iteration) I probably
need to read one or more 32-bit values from the ESI location, write a
32-bit value to the EDI location, and use a few 128-bit memory
locations to hold some local variables that don't fit in the registers.
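
To make the question concrete, here is a rough sketch of the kind of
loop I mean. It is not my real code: the function name, the cdecl
argument layout, the placeholder arithmetic and the size of the spill
area are all made up for illustration, and I'm assuming NASM syntax and
SSE2. The one thing I'm sure of is that movaps needs a 16-byte-aligned
address, which is why the sketch aligns ESP before using the spill slot.

```nasm
; Hypothetical sketch (NASM syntax, 32-bit, cdecl), for illustration only:
;   void blit_sketch(const uint32_t *src, uint32_t *dst, uint32_t count);
; Assumes count > 0 and SSE2. The arithmetic is a placeholder.
section .text
global blit_sketch

blit_sketch:
    push    ebp
    mov     ebp, esp
    push    esi                 ; saved at [ebp-4]
    push    edi                 ; saved at [ebp-8]
    sub     esp, 32             ; reserve room for the spill slots
    and     esp, -16            ; movaps faults on a misaligned address

    mov     esi, [ebp+8]        ; source bitmap
    mov     edi, [ebp+12]       ; destination bitmap
    mov     ecx, [ebp+16]       ; pixel count
    pxor    xmm0, xmm0          ; placeholder "local variable"

.next_pixel:
    movd    xmm1, [esi]         ; read one 32-bit source pixel
    paddd   xmm0, xmm1          ; placeholder work on the local
    movaps  [esp+0x10], xmm0    ; spill the local to the stack
    ; ... code that needs xmm0 for something else would go here ...
    movaps  xmm0, [esp+0x10]    ; reload the local
    movd    [edi], xmm0         ; write one 32-bit destination pixel
    add     esi, 4
    add     edi, 4
    dec     ecx
    jnz     .next_pixel

    lea     esp, [ebp-8]        ; discard the aligned spill area
    pop     edi
    pop     esi
    pop     ebp
    ret
```

So each iteration touches three memory streams: the source at ESI, the
destination at EDI, and the same one or two stack slots over and over.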

And if I can ensure these variables are spilled into the L1 cache, what
kind of time penalty does that incur, compared to keeping everything
in registers?

Any information or advice welcome.

Thanks.
