Re: Cache questions




spamt...@xxxxxxxxxx wrote:
> Hi. I have a question about CPU data caching. (BTW, this is intended
> for IA-32 architecture.)
>
> I'm writing some assembly code for a graphics application where
> execution speed is the top priority. I'm using the SIMD instruction
> sets and the MMX and XMM registers where possible.
>
> I'm writing a routine which will need to read from a source bitmap
> (ESI) and write to a destination bitmap (EDI). Other than where
> necessary, I'm trying to minimize the memory access since I know it can
> be a bottleneck, especially when it causes cache misses.
>
> But actually my knowledge of how the L1/L2 cache works is a little
> flaky.
>
> If I run out of registers writing my function and need to spill some of
> my local variables to memory, what's the best way to do that? I have
> some stack space, and I could write one of my local variables with, for
> example "movaps [esp+0x10], xmm0" when I run out of registers. But will
> this compete in the cache with the data being read at ESI and written
> at EDI?
>
> How can I keep a little L1 cache space for writing and reading my local
> variables in this case? At each pixel (each loop iteration) I probably
> need to read one or more 32-bit values from the ESI location, write a
> 32-bit value to the EDI location, and use a few 128-bit memory
> addresses to hold some local variables that don't fit in the registers.
>
> And if I can ensure these variables are spilled into the L1 cache, what
> kind of time penalty does that invoke, compared to keeping everything
> in registers?
>
> Any information or advice welcome.
>
> Thanks.


The best first choice is to keep the local variables close to the stack,
i.e. at ESP- or EBP-based offsets. Global variables might thrash the
cache, but only for other routines or if they are too big.

Unfortunately, there are no assumptions here that are both logical and
safe. You must experiment and use RDTSC to time things.
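
As a rough illustration of that timing approach, here is a minimal sketch in
C rather than assembly, so that it is easy to compile and adapt. Everything
in it is an assumption made for the example: the names read_ticks,
process_reg, process_spill and best_ticks are hypothetical, the per-pixel
work (adding a bias) is a stand-in for the real routine, and the volatile
local only imitates a register spill by forcing a load from the stack on
every iteration. It relies on the __rdtsc() intrinsic from <x86intrin.h>
(GCC/Clang) and falls back to clock() on non-x86 builds.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>              /* __rdtsc() on GCC/Clang */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>                   /* coarse fallback for non-x86 builds */
static inline uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

/* Stand-in pixel loop: the bias can stay in a register. */
static void process_reg(const uint32_t *src, uint32_t *dst,
                        size_t n, uint32_t bias)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + bias;
}

/* Same loop, but the bias is reloaded from a stack slot on every
   iteration; volatile is used here only to imitate a spilled local. */
static void process_spill(const uint32_t *src, uint32_t *dst,
                          size_t n, uint32_t bias)
{
    volatile uint32_t slot = bias;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + slot;
}

typedef void (*pixel_fn)(const uint32_t *, uint32_t *, size_t, uint32_t);

/* Run fn `runs` times and return the smallest tick count observed.
   Taking the minimum discards runs disturbed by interrupts or by a
   cold cache on the first pass. */
static uint64_t best_ticks(pixel_fn fn, const uint32_t *src, uint32_t *dst,
                           size_t n, uint32_t bias, int runs)
{
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t t0 = read_ticks();
        fn(src, dst, n, bias);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling best_ticks(process_reg, ...) and best_ticks(process_spill, ...) on
the same buffers and comparing the two minimums gives a first estimate of
what the spill costs on a given machine. RDTSC is not a serializing
instruction, so very short measured regions can be skewed by out-of-order
execution; timing a loop over many pixels, as above, keeps that error small
relative to the total.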
