Re: New ARM Cortex Microcontroller Product Family from STMicroelectronics



No, it isn't. The AVR32 running at 66 MHz will run mostly
at zero waitstates due to its interleaved flash controller design.
Each flash access done by the memory controller
will have 1 waitstate, but since the memory controller can do
two accesses in parallel, the CPU will only see waitstates
during jumps, and no waitstates during non-jump instructions.
If you do jumps 20% of the time, then the average number of waitstates
per instruction is 0.2.
On top of that, you will be able to perform data accesses to the flash
while eating from the instruction queue without any performance penalty.
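
The 0.2 figure is a plain weighted average. A minimal sketch in Python, assuming one waitstate per jump and none otherwise; the base cost of one cycle per instruction and the function names are illustrative assumptions, not taken from any AVR32 documentation:

```python
def average_waitstates(jump_fraction, waitstates_per_jump=1):
    """Average waitstates per instruction if only jumps see a waitstate."""
    return jump_fraction * waitstates_per_jump


def effective_mips(clock_mhz, jump_fraction, base_cpi=1.0, waitstates_per_jump=1):
    """Instruction rate in MIPS for a given clock.

    base_cpi=1.0 is an assumption (every instruction costs one cycle
    before waitstates are added); real cores will differ.
    """
    cpi = base_cpi + average_waitstates(jump_fraction, waitstates_per_jump)
    return clock_mhz / cpi
```

Under these assumptions, average_waitstates(0.2) gives 0.2 and effective_mips(66, 0.2) comes out at roughly 55, against 66 with no jumps at all.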

That is pointless. It does not matter how large the FIFO is, if you
are pulling data out at a given rate and you can only put data in at
that same rate, as soon as you have to stop instruction reads to do a
data read, you will not be filling the FIFO as fast as it is being
emptied and performance will suffer. Run through a simulation and see
if that is not true. Based on the info you provided, this is the
result.
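
The simulation asked for here can be sketched as a toy model. It is a deliberately simplified, hypothetical model (one flash port, a data read on every Nth cycle, the CPU consuming one instruction per cycle); none of the parameter names or the FIFO depth come from the actual part:

```python
def simulate_fifo(cycles, fetch_per_cycle, data_read_period, fifo_depth=8):
    """Count CPU stall cycles when data reads steal flash cycles.

    Hypothetical model: on every data_read_period-th cycle the flash
    serves a data read and fetches nothing; on all other cycles it
    pushes fetch_per_cycle instructions into the FIFO. The CPU then
    pops one instruction per cycle, or stalls if the FIFO is empty.
    """
    fifo = fifo_depth  # start with a full instruction queue
    stalls = 0
    for cycle in range(1, cycles + 1):
        if cycle % data_read_period != 0:
            # Instruction fetch cycle: refill, capped at the FIFO depth.
            fifo = min(fifo_depth, fifo + fetch_per_cycle)
        if fifo > 0:
            fifo -= 1      # CPU consumes one instruction
        else:
            stalls += 1    # queue is empty: CPU waits this cycle
    return stalls
```

With fetch_per_cycle=1 (fill rate equal to drain rate), the queue drains by one entry per data read and the CPU eventually stalls on every data read, whatever the FIFO depth. With fetch_per_cycle=2 the queue refills after each data read and this model reports no stalls, so the outcome depends entirely on whether the fill rate really exceeds the drain rate.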

The AVR32 uC3000 series running at 66 MHz from a dual-bank
33 MHz memory will read instructions from the flash faster than the
execution unit consumes the instructions in the queue.
The best case is if we assume that the CPU is only executing 16-bit
instructions and fetching 32-bit words. Then the CPU fetches (except for
the 1st cycle, where you get hit by the waitstate) 2 instructions per cycle.
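
The "2 instructions per cycle" figure is bandwidth arithmetic. A rough sketch, assuming each bank delivers one fetch word per bank clock and the CPU consumes one instruction per CPU clock (both simplifications; the function and parameter names are illustrative):

```python
def fetch_to_consume_ratio(cpu_mhz, bank_mhz, banks, fetch_bits, instr_bits):
    """Ratio of instruction bits supplied by flash to bits consumed by the CPU.

    Assumes every bank delivers fetch_bits per bank clock and the CPU
    retires one instr_bits-wide instruction per CPU clock.
    """
    supplied = bank_mhz * banks * fetch_bits   # Mbit/s out of the flash
    consumed = cpu_mhz * instr_bits            # Mbit/s eaten by the CPU
    return supplied / consumed
```

For the numbers in the post, fetch_to_consume_ratio(66, 33, 2, 32, 16) is 2.0, the best case described. If every instruction were 32 bits wide the ratio drops to 1.0, which leaves no headroom for data reads.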

The ARM9 will fit almost any socket where the user requires an external
bus.

So you are agreeing with me that the ARM9 is not a good match for most
ARM7 or CM3 designs? The ARM9 may "fit" the design, but it will not
be as good a fit if the ARM7 or CM3 can do the job. If nothing else,
the cost and power consumption will be higher with the ARM9. In most
cases the package size will be larger for the ARM9. Why use a shotgun
when a slingshot will do the job?


No, a large part of the ARM designs I see (~50%) need an external bus,
and then the ARM9 is likely to be a better choice than an ARM7.
If we look at volume, then the volume of ARM9 is higher than the volume
of ARM7 in my project list, since a large portion of it is Embedded Linux.

What processor only uses 3 clock instructions to access 1 clock
memory? My understanding is that many processors not only use faster
instructions to load, but can use memory in other instructions which
allow single cycle back to back memory accesses.

The simple three-stage pipeline processors (and the CM3) normally use a
few clocks in the execution stage to load data, but the uC3 family does not.

Ok, I have to assume that you don't have any examples. Regardless,
this seems like a red herring in this discussion anyway.


There are plenty of examples.
I would be surprised if you found a lot of processors where loads/stores
execute in one clock cycle.


--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB

