Re: Volatile variables

From: Chris Torek (nospam_at_torek.net)
Date: 02/27/04


Date: 27 Feb 2004 16:37:07 GMT


>In <ff8ef364.0402270008.36b416b0@posting.google.com>
>srinivasreddy_m@yahoo.com (srinivas reddy) writes:
[snippage]
>>I remember somebody mentioning a scenario involving L1, L2 caches.
>>Could anybody throw some light on this?

In article <news:c1nmej$pbf$15@sunnews.cern.ch>
Dan Pop <Dan.Pop@cern.ch> writes:
[a lot of correct answers to stuff I snipped, then:]
>Whoever mentioned such a scenario was heavily confused and in dire need
>of a clue. The abstract C machine has no caching, therefore caching is
>irrelevant to the correct behaviour of a C program.

I would not go so far as to say *this*. The second sentence is
true but does not imply the first, because the program might not
be written in Standard C after all.

In particular, one place one commonly abandons Standard C in order
to get actual work done :-) on real machines has to do with device
drivers, where the "volatile" keyword is also heavily used. Device
drivers tend to "do I/O" (reading input and generating output is
often required to get work done), and some machines provide fast
I/O methods ("DMA" and the like) that completely bypass the CPU.

If a CPU has an on-chip cache[1], and if DMA bypasses the CPU
entirely[2], then DMA bypasses the on-chip cache. As it happens,
on-chip CPU caches generally come in one of two flavors, called
"write-through" and "write-back". In the case of a write-through
cache, DMA *output* (from memory to device) does not require any
special action, because data in the CPU cache is always also in
memory (this is the property that makes the cache "write-through").
When the device obtains the output data from memory, it gets the
desired values. DMA *input*, however, has a problem; and with
write-back caches, even DMA output has the same problem: the data
in the CPU cache can differ from that in memory, before and/or
after the device's DMA transaction. To obtain correct co-operation
between the device and the CPU we use steps called "cache flushing".
(In the abstract model I implemented for BSD, we always do this
twice for every DMA transaction: one "pre-op" and one "post-op",
supplying flags as to whether the op is read, write, or both.)
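
To make the pre-op/post-op idea concrete, here is a toy model in
plain C. It is only a sketch under loud assumptions: the "cache" is
an array with valid and dirty bits, one byte per line, and every
name in it (toy_sys, dma_sync_pre, dma_sync_post, DMA_TO_DEVICE,
DMA_FROM_DEVICE) is made up for illustration. It is not the BSD
interface and no real hardware works exactly this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NLINES 4  /* toy size: one byte per "cache line" */

/* Hypothetical direction flags: output, input, or both OR'ed together. */
enum dma_dir { DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };

struct toy_sys {
    unsigned char mem[NLINES];    /* main memory: all the device ever sees */
    unsigned char cache[NLINES];  /* CPU's write-back cache contents */
    bool valid[NLINES];
    bool dirty[NLINES];
};

static void toy_init(struct toy_sys *s)
{
    memset(s, 0, sizeof *s);      /* memory zeroed, cache empty */
}

/* CPU store: write-back, so only the cache is updated. */
static void cpu_store(struct toy_sys *s, size_t i, unsigned char v)
{
    assert(i < NLINES);
    s->cache[i] = v;
    s->valid[i] = true;
    s->dirty[i] = true;
}

/* CPU load: hit if the line is valid, otherwise fill from memory. */
static unsigned char cpu_load(struct toy_sys *s, size_t i)
{
    assert(i < NLINES);
    if (!s->valid[i]) {
        s->cache[i] = s->mem[i];
        s->valid[i] = true;
        s->dirty[i] = false;
    }
    return s->cache[i];
}

/* The device bypasses the cache entirely. */
static unsigned char dev_read(const struct toy_sys *s, size_t i)
{
    assert(i < NLINES);
    return s->mem[i];
}

static void dev_write(struct toy_sys *s, size_t i, unsigned char v)
{
    assert(i < NLINES);
    s->mem[i] = v;
}

/* Pre-op: run before the device touches memory. */
static void dma_sync_pre(struct toy_sys *s, int dir)
{
    for (size_t i = 0; i < NLINES; i++) {
        if ((dir & DMA_TO_DEVICE) && s->valid[i] && s->dirty[i]) {
            s->mem[i] = s->cache[i];  /* write dirty data back */
            s->dirty[i] = false;
        }
        if (dir & DMA_FROM_DEVICE) {
            /* Drop the line so it cannot later be written back
               over whatever the device deposits in memory. */
            s->valid[i] = false;
            s->dirty[i] = false;
        }
    }
}

/* Post-op: run after the device is done. */
static void dma_sync_post(struct toy_sys *s, int dir)
{
    if (dir & DMA_FROM_DEVICE) {
        for (size_t i = 0; i < NLINES; i++) {
            /* Invalidate so the next CPU load refetches from memory. */
            s->valid[i] = false;
            s->dirty[i] = false;
        }
    }
}
```

In this model, after cpu_store(&s, 0, 42) the device still reads 0
from line 0 until dma_sync_pre(&s, DMA_TO_DEVICE) runs. Likewise, a
line the CPU has already loaded keeps returning its old value after
dev_write() until dma_sync_post(&s, DMA_FROM_DEVICE) runs. With a
write-through cache the write-back step in the pre-op would have
nothing to do, which is the "no special action" case for output.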

Again, just as Dan Pop said, all of this is outside the model we
use in ANSI/ISO C (the "abstract machine") -- but it does occur in
"real world" C programming, in a place where the "volatile" keyword
is used quite a lot.
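
For the "volatile" half of the picture, here is a minimal sketch of
the usual driver idiom: polling a device status register through a
pointer to volatile. The register layout, the STATUS_READY bit and
the wait_ready() helper are all hypothetical; in a real driver the
pointer would come from whatever mapping mechanism the system
provides. The qualifier makes the compiler perform every read of
*status instead of hoisting a single read out of the loop. It does
nothing, by itself, about the cache and DMA issues described above.

```c
#include <stdint.h>

#define STATUS_READY 0x1u  /* hypothetical "device ready" bit */

/*
 * Poll a status register until STATUS_READY is set or max_polls
 * reads have been made.  Returns the number of reads that came
 * back "not ready" before the ready one, or -1 on timeout.
 */
static int wait_ready(volatile uint32_t *status, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY)  /* a real load on every pass */
            return i;
    }
    return -1;
}
```

An ordinary variable can stand in for the register when trying this
out on a hosted system: with the bit already set, wait_ready()
returns 0 on the first poll, and with the bit clear it returns -1
once max_polls reads have been made.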

[1] Most do these days; some have multiple levels of on-chip cache.

[2] Some do, some do not; some CPUs even have bugs in the DMA
snooping hardware. Sometimes some DMA goes through some caches
and bypasses others. Some of the more byzantine architectures have
multiple levels of I/O adapters, which have their own memory-interaction
issues. Making devices "upcall" to their adapters to announce
"intent to do I/O" and "finished doing I/O" removes a lot of
"hair" from the drivers; the adapters do any setup or teardown
required and continue to push the call up their own chain until it
reaches a level that is "all-knowing".

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: forget about it   http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

