ARM926 caching question



Greetings,

this question is for ARM experts, in particular it's about ARM926 core
(which is used in TI's DM6467 DaVinci processor).

I want to use the cache to speed up processing of a video buffer holding
a YCbCr 4:2:0 1080P frame (1920x1088x1.5 bytes). Normally the buffer is not
cached, since it is shared between ARM code, the C64 DSP core and an
additional PCI master. The data flow is as follows: the external PCI master
fills in a raw uncompressed frame -> we apply several processing steps (layout
building, background, some graphics and OSD text blending), then the
whole resulting frame is passed to the DSP for compression.

The ARM core runs MontaVista Linux 4.0.1 with kernel 2.6.18 (MV-patched
from the MontaVista 5.0 distribution).

I'd like to enable caching on the ARM for this buffer, process it in
chunks of 4K (the D-cache on the DM6467's ARM core is 8K 4-way associative,
so I want to leave at least two ways for caching other program data
and the stack) and then call a kernel module, which will write back and
invalidate each 4K chunk. So by the time the last chunk has been written
back and invalidated, the whole buffer will be consistent in external RAM
and ready for DSP processing (obviously, before starting such processing,
the whole D-cache will have to be invalidated without write-back).
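
To make the numbers concrete, here is a minimal host-side sketch in C of the chunk arithmetic and the per-line walk. The names (frame_bytes, for_each_line, count_line) are hypothetical. In a real kernel module the callback would issue the CP15 "clean and invalidate D-cache entry (MVA)" operation (MCR p15, 0, Rd, c7, c14, 1) for each line and drain the write buffer afterwards; here it is replaced by a counting stub so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FRAME_WIDTH  = 1920,
    FRAME_HEIGHT = 1088,
    CHUNK_BYTES  = 4096, /* two of the four 2K ways of the 8K D-cache */
    LINE_BYTES   = 32    /* ARM926 D-cache line size */
};

/* YCbCr 4:2:0: one full-size luma plane plus two quarter-size chroma planes. */
static size_t frame_bytes(void)
{
    return (size_t)FRAME_WIDTH * FRAME_HEIGHT * 3 / 2;
}

/* Hypothetical hook standing in for the per-line cache maintenance op. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Call op once per cache line in [start, start + len); return the line count. */
static size_t for_each_line(uintptr_t start, size_t len,
                            line_op_fn op, void *ctx)
{
    size_t n = 0;
    uintptr_t a;

    /* Both ends must be line-aligned, otherwise neighbouring data is hit. */
    assert(start % LINE_BYTES == 0 && len % LINE_BYTES == 0);

    for (a = start; a < start + len; a += LINE_BYTES) {
        op(a, ctx);
        n++;
    }
    return n;
}

/* Stub op: only counts how many lines were visited. */
static void count_line(uintptr_t line_addr, void *ctx)
{
    (void)line_addr;
    ++*(size_t *)ctx;
}
```

With these constants the frame is 3133440 bytes, which divides into exactly 765 chunks of 4K, and each chunk needs 128 per-line operations.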

Sounds good, but I see problems with doing so, according to the ARM926 TRM
(or maybe I just misunderstand it).

The ARM caches data in 32-byte lines tagged with the Modified Virtual
Address (MVA). The MVA is formed by placing the FCSE PID field from CP15
register c13 in the top 7 bits of the program's virtual address, if that
address is below 32M; if the address is 32M or above, no substitution takes
place and VA = MVA (that's what happens in kernel mode). User-mode programs
are mapped to the lower 32M of VA and hence use that FCSE PID. I tried to
use user-mode pointers in kernel mode, and got inconsistent data in
user-mode buffers; apparently, the kernel changes the FCSE PID on system
call entry or just disables it.
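
As I understand the TRM, the translation above amounts to the following C sketch. The function name fcse_mva is hypothetical, and the pid argument is assumed to be the 7-bit value held in bits [31:25] of CP15 c13.

```c
#include <stdint.h>

#define FCSE_SHIFT  25u
#define FCSE_REGION (UINT32_C(1) << FCSE_SHIFT) /* 32M */

/* Form the MVA for a virtual address under a given 7-bit FCSE PID. */
static uint32_t fcse_mva(uint32_t va, uint32_t pid)
{
    if (va < FCSE_REGION)                       /* bits [31:25] are zero */
        return va | ((pid & 0x7Fu) << FCSE_SHIFT);
    return va;                                  /* 32M and above: VA = MVA */
}
```

For example, VA 0x00008000 under PID 1 gives MVA 0x02008000 and under PID 3 gives 0x06008000, while a kernel address such as 0xC0008000 is left unchanged.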

Now the TRM says: "FCSE translation is not applied for addresses used
for entry based cache or TLB maintenance operations. For these
operations VA = MVA." That is, I can use the VA-based cache maintenance
CP15 instructions in kernel mode without caring about the FCSE PID. Now
if that is true, suppose that data from 3 different processes is
currently cached for the same VA, just with different PIDs, and we
invalidate the cache entry for that VA via CP15 register c7, for which
"translation is not applied". Which of the 3 entries above will get
invalidated? All of them?
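
To make the scenario concrete, here is a toy C model of one literal reading of the quoted sentence: the value written to c7 is compared against the line tags as-is, with no PID folded in. This is only my interpretation, not confirmed behaviour, and the names (struct line, invalidate_by_mva, fill_three_processes) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NPROCS 3

/* One cache line in the toy model: just its MVA tag and a valid bit. */
struct line {
    uint32_t mva_tag;
    int      valid;
};

/* Invalidate every valid line whose tag matches addr (low 5 bits ignored).
 * Returns the number of lines invalidated. */
static int invalidate_by_mva(struct line *lines, size_t n, uint32_t addr)
{
    int hits = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (lines[i].valid && lines[i].mva_tag == (addr & ~UINT32_C(31))) {
            lines[i].valid = 0;
            hits++;
        }
    }
    return hits;
}

/* Cache the same VA (below 32M) under FCSE PIDs 1, 2 and 3. */
static void fill_three_processes(struct line *lines, uint32_t va)
{
    uint32_t pid;

    for (pid = 1; pid <= NPROCS; pid++) {
        lines[pid - 1].mva_tag = (va | (pid << 25)) & ~UINT32_C(31);
        lines[pid - 1].valid = 1;
    }
}
```

In this model the three copies carry three different tags, so an invalidate with the bare VA matches none of them, and an invalidate with VA | (PID << 25) matches exactly one. Whether the hardware really behaves this way is exactly what I am asking.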

Thanks,
Daniel