Re: Inner loop and out of cache question

From: Robert Wessel (robertwessel2_at_yahoo.com)
Date: 01/31/04


Date: Fri, 30 Jan 2004 23:12:17 +0000 (UTC)

Robert Redelmeier <redelm@ev1.net.invalid> wrote in message news:<UPmSb.746$%n1.211752073@newssvr11.news.prodigy.com>...
> PREFETCH always helps, but sometimes only 2-5%. Maximum
> help is 50% time (double speed) when long calcs (fdiv)
> can run concurrent with long memory fetches.

Incorrect on both counts.

Prefetching can certainly hurt if you do it excessively, or in such a
way that other needed data is cast out of the cache to make room for
the prefetched data, or if you consume bandwidth that could be put to
better uses.
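
To make the trade-off concrete, here is a minimal sketch of a software-prefetched gather loop. It is an illustration, not anything from the original thread: `sum_gather` and its `dist` parameter are hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin, assumed here as a stand-in for an x86 PREFETCH instruction. The point is that the prefetch distance is a tuning knob. If it is too small the data does not arrive in time, and if it is too large the prefetched lines may push out data that is still needed, or be pushed out themselves before they are used.

```c
#include <stddef.h>

/*
 * Sum data[idx[i]] for i in [0, n), issuing a prefetch hint for the
 * element that will be needed `dist` iterations from now.
 * dist == 0 disables prefetching. `dist` is a hypothetical tuning
 * parameter; a good value has to be measured on the target machine.
 */
double sum_gather(const double *data, const size_t *idx,
                  size_t n, size_t dist)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Only prefetch while i + dist is still inside the index array. */
        if (dist != 0 && dist < n - i) {
            /* Arguments: address, 0 = read, 0 = low temporal locality. */
            __builtin_prefetch(&data[idx[i + dist]], 0, 0);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The result is the same for every value of `dist`, since a prefetch is only a hint. Only the running time changes, and it can change in either direction, which is why the distance should be chosen by measurement rather than assumed to help.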

Prefetching can more than double performance, because it effectively
turns latency demands into bandwidth demands. If you schedule n
prefetches early enough, they may well all complete in less than
n*(single-access-time), since the requests overlap. In the best case,
proper prefetching reduces the cost of each memory request from a
complete processor-to-memory-to-processor latency cycle to whatever a
single memory-controller-to-CPU cycle costs. Practical considerations,
including bus protocol limitations, limits on the number of
outstanding concurrent memory requests in various places (CPU,
Northbridge, memory), and limits on how much prefetchable data you
can actually identify in your program, tend to knock that limit down
a fair bit.
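
The latency-versus-bandwidth argument can be put into a rough back-of-the-envelope model. The sketch below is an assumption-laden simplification, not a description of any real memory system: `serial_time`, `overlapped_time`, and all of their parameters are hypothetical names, and the model assumes every request has the same latency, that at most `max_outstanding` requests can be in flight at once, and that each completed request occupies the bus for `transfer` time units.

```c
#include <stddef.h>

/* No overlap: every one of the n accesses pays the full latency. */
double serial_time(size_t n, double latency)
{
    return (double)n * latency;
}

/*
 * Overlapped accesses: the first request pays the full latency, and
 * each later one completes after whichever is larger, the bus
 * transfer time or latency / max_outstanding (the limit on
 * concurrent requests). Simplified model, for illustration only.
 */
double overlapped_time(size_t n, double latency, double transfer,
                       size_t max_outstanding)
{
    if (n == 0)
        return 0.0;
    if (max_outstanding == 0)
        max_outstanding = 1;    /* treat 0 as "no overlap" */

    double per_request = latency / (double)max_outstanding;
    if (per_request < transfer)
        per_request = transfer; /* bandwidth-bound */

    return latency + (double)(n - 1) * per_request;
}
```

With made-up numbers (8 accesses, a latency of 100 units, a transfer time of 10 units), the serial cost is 800 units. Allowing 4 outstanding requests gives 100 + 7*25 = 275 units, already better than double speed, and with enough outstanding requests the model bottoms out at 100 + 7*10 = 170 units, where bandwidth rather than latency is the limit.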


