Re: Atmel releasing FLASH AVR32 ?




"Ulf Samuelsson" <ulf@xxxxxxxxxxxxx> wrote in message news:etoeuu$ea4$3@xxxxxxxxxxx
"tesla" <yusufilker@xxxxxxxxx> wrote in message
news:1174331419.729975.262570@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Top MHz still seems to be Flash constrained - we've been stuck in the
50-100MHz zone for what seems like years...

I have difficulty understanding why they cannot read 10 instructions
in parallel from flash at 50 MHz and run the processor at 500 MHz.

There is no point if you branch every few instructions as most programs do...

Branch operations (unpredictable PC changes depending on user input,
etc.) will still run at 50 MHz, but it is still a good gain.

Not at all. If you run a CPU at 500MHz but branches take 10 cycles then
you're lucky if you get the performance of a 150MHz CPU with 5 times
the power consumption...
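
A rough back-of-the-envelope sketch of that estimate, in Python. The model and the numbers are assumptions for illustration, not measurements: a single-issue core, every non-branch instruction takes 1 cycle, every taken branch takes 10 cycles in total (the flash refill), and one instruction in four is a taken branch. The function name effective_mhz is made up for this example.

```python
def effective_mhz(clock_mhz, branch_fraction, branch_cycles):
    """Sustained instruction rate, expressed as the clock of an
    ideal 1-cycle-per-instruction CPU doing the same work.

    Assumed model: non-branch instructions cost 1 cycle, taken
    branches cost `branch_cycles` cycles in total.
    """
    cycles_per_instruction = (1.0 - branch_fraction) + branch_fraction * branch_cycles
    return clock_mhz / cycles_per_instruction


# Assumed workload: one taken branch per four instructions.
print(round(effective_mhz(500, 0.25, 10), 1))
```

With these assumed figures the 500MHz core sustains roughly what an ideal 154MHz core would, which is in line with the "lucky if you get 150MHz" estimate. A lower branch rate helps, but the result stays far below 500MHz for any realistic mix.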

The solution is to use a cache and branch prediction.

Maybe the sense amplifiers for the flash are large, or draw a lot of
current.
I still remember page mode DRAM memories with 4096 bits per page,
and no one has been able to tell me why this is not possible with flash.

Even if it were feasible, a cache with 1 line of 512 bytes is totally useless.
A fully associative cache with 32 lines of 16 bytes would be better, but
likely still too small to be useful (about 4KB is the absolute minimum).
Combining prefetch with a branch target instruction cache would make
even better use of such a small cache.
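
To illustrate why the single 512-byte line does so badly, here is a minimal sketch in Python that counts misses for both organisations. Everything in it is an assumption for illustration: a fully associative cache with LRU replacement, fixed 4-byte instructions, and a made-up access pattern (a 64-byte loop at address 0x0000 that calls a 32-byte function at 0x1000 on every iteration). The names count_misses and loop_with_call_trace are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines, line_bytes):
    """Count misses in a fully associative LRU cache (assumed policy)."""
    cache = OrderedDict()  # line tags, least recently used first
    misses = 0
    for addr in addresses:
        tag = addr // line_bytes
        if tag in cache:
            cache.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict least recently used line
            cache[tag] = True
    return misses


def loop_with_call_trace(iterations):
    """Instruction fetch addresses for an assumed loop that calls a far function."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(0x0000, 0x0020, 4))  # first half of the loop body
        trace.extend(range(0x1000, 0x1020, 4))  # the called function
        trace.extend(range(0x0020, 0x0040, 4))  # rest of the loop body
    return trace


trace = loop_with_call_trace(100)
print(count_misses(trace, num_lines=1, line_bytes=512))   # 1 line of 512 bytes
print(count_misses(trace, num_lines=32, line_bytes=16))   # 32 lines of 16 bytes
```

In this toy trace the single wide line is refilled on every call and every return (201 misses over 100 iterations), while the 32 x 16 byte cache takes only its 6 cold misses. Both caches hold 512 bytes; real code with a larger working set would overflow either one, which is the point about 4KB being the practical minimum.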

After talking to multiple memory companies about this, my
conclusion is that memory people do not understand microprocessors
and their needs.

Most memory (flash, DRAM, even SRAM) is optimised for density, not
speed. RLDRAM attracts a premium, so is rarely used. Hopefully new
technologies like MRAM will become mainstream soon.

Wilco




