Re: arm9 memory throughput



CBFalconer <cbfalconer@xxxxxxxxx> writes:

Nils wrote:

... snip ...

Doing the same using DMA I get numbers around 1.3Gb/s on the same
system.

I know that I will never get the full theoretical memory throughput, but
200mb/s is a lot less than we expected. Now I want to
understand why this happens. Unfortunately I know s**t about
memory interfaces, memory latencies and all the other stuff.

Could someone please explain to me what the memory and CPU do
between the writes?

Let's assume a simple testing mechanism. The actual assembly code
will be something like:

call recordtime
mov r1, #I; number of tests to apply
mov r2, A1; starting address to use
mov r3, #0; initialize counter
; start of loop
lp: mov r4, r2+r3; where to write
mov (r4), #0; what we are measuring!!!
inc r3
cmp r3, r1
jnz lp; do it again
; end of loop
call recordtime
call computeanddisplay

Now look at the work done within the loop compared to the writes.
Each instruction requires a memory read just to fetch it, and there
are 5 of these per write. Even if the CPU requires no time to execute
things, that is 6 memory accesses per write, so there is already a
6 : 1 reduction in writing speed from memory access speed.
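
The arithmetic behind that ratio can be written down as a small C sketch. It assumes the simple model used here: no cache, no execution time, and every instruction fetch and every write costing one bus access of equal length. The function names and the 600 figure in the usage note are made up for illustration, not taken from any measurement.

```c
#include <assert.h>

/* Fraction of all bus accesses that are data writes, assuming each
   instruction fetch and each write costs one equal-length access
   (no cache, no execution time). Hypothetical helper. */
double write_fraction(int fetches_per_iter, int writes_per_iter)
{
    assert(fetches_per_iter >= 0 && writes_per_iter > 0);
    return (double)writes_per_iter /
           (double)(fetches_per_iter + writes_per_iter);
}

/* Best-case write throughput under the same model, in whatever
   units bus_rate is given in. Hypothetical helper. */
double write_throughput(double bus_rate, int fetches_per_iter,
                        int writes_per_iter)
{
    return bus_rate * write_fraction(fetches_per_iter, writes_per_iter);
}
```

With 5 fetches and 1 write per iteration, write_fraction(5, 1) is 1/6, so a bus that could move 600 units per second would deliver about 100 units per second of writes under this model. With no fetches on the bus at all, write_fraction(0, 1) is 1.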

Smart use of caches etc. can improve this ratio. It will never
become 1. And any such improvement costs money.

That ignores the store-multiple instruction and, probably, the fact
that the ARM9 has an instruction cache and a Harvard architecture
(internally). So it should be entirely possible to saturate the memory
bus with writes.
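
As a rough illustration of the store-multiple point, here is a minimal C sketch of a fill loop that issues eight stores per pass through the loop, so the loop overhead is spread over eight writes instead of one. The function name is hypothetical, and whether an ARM compiler actually merges the adjacent stores into a single STM (or replaces the whole loop with a memset call) depends on the compiler and its options, so check the generated assembly before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero `words` 32-bit words starting at dst, eight per loop iteration.
   On ARM the eight adjacent stores are candidates for a store-multiple
   (STM); that is up to the compiler and not guaranteed. */
void fill_zero_unrolled(uint32_t *dst, size_t words)
{
    size_t i = 0;

    assert(dst != NULL || words == 0);

    /* Main body: one compare and branch per eight writes. */
    for (; i + 8 <= words; i += 8) {
        dst[i + 0] = 0;
        dst[i + 1] = 0;
        dst[i + 2] = 0;
        dst[i + 3] = 0;
        dst[i + 4] = 0;
        dst[i + 5] = 0;
        dst[i + 6] = 0;
        dst[i + 7] = 0;
    }

    /* Tail: the remaining 0 to 7 words, one at a time. */
    for (; i < words; i++) {
        dst[i] = 0;
    }
}
```

Once the loop itself is running from the instruction cache, the only traffic left on the external bus is the writes.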

--

John Devereux