Re: New ARM Cortex Microcontroller Product Family from STMicroelectronics
- From: "Ulf Samuelsson" <ulf@xxxxxxxxxxxxx>
- Date: Sun, 8 Jul 2007 00:44:12 +0200
"rickman" <gnuarm@xxxxxxxxx> wrote in message
news:1183843932.123792.195400@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Jul 6, 10:51 am, "Ulf Samuelsson" <u...@xxxxxxxxxxxxx> wrote:
The FIFO is implemented using Flip-Flops and you had a
simple three stage pipeline (fetch, decode, execute) so
your latency was not dramatic.
That is not the point. By prefetching the instructions, you are
setting up for a bigger dump and subsequent loss of instruction memory
bandwidth when you branch. FIFOs or instruction prefetching are not a
perfect solution. It is much better to just have single cycle
memory.
Actually it is not, because if you try to decode your instruction
in the same stage as the fetch, your clock frequency will
go down significantly.
The prefetching will work with single cycle memory and with
memory having waitstates.
Prefetch, decode and execute each take one clock.
If you execute at 66 MHz with a three stage pipeline
then you will probably execute at around 40 MHz with
a two stage pipeline (just a guess).
If you execute blocks of 5 instructions including one jump,
each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 MHz
in a three stage pipeline, for ~9.4 blocks/us.
In a two stage pipeline, a jump could take 2 clocks,
so each block uses 6 cycles (2 + 1 + 1 + 1 + 1) @ 40 MHz,
which is ~6.7 blocks/us, clearly slower.
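The block arithmetic above can be checked with a few lines of Python. This is only a sketch built on the post's own assumptions (66 MHz vs. a guessed 40 MHz clock, a jump costing 3 or 2 clocks, four single-cycle instructions per block), not measured figures; the function names are illustrative.

```python
# Back-of-envelope pipeline comparison from the post.
# A "block" is 5 instructions: one jump (which costs extra clocks to refill
# the pipeline) plus four instructions that each retire in one clock.

def cycles_per_block(jump_cycles: int, other_instructions: int = 4) -> int:
    """Clocks for one block: the jump plus the single-cycle instructions."""
    return jump_cycles + other_instructions

def blocks_per_us(clock_mhz: float, jump_cycles: int) -> float:
    """Blocks per microsecond (a clock in MHz is clocks per microsecond)."""
    return clock_mhz / cycles_per_block(jump_cycles)

three_stage = blocks_per_us(66.0, jump_cycles=3)  # 66 / 7, about 9.43
two_stage = blocks_per_us(40.0, jump_cycles=2)    # 40 / 6, about 6.67

print(f"3-stage @ 66 MHz: {three_stage:.2f} blocks/us")
print(f"2-stage @ 40 MHz: {two_stage:.2f} blocks/us")
```

The deeper pipeline pays more per jump but wins overall because, under these assumptions, its clock is so much higher.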
Yes, but the jumps are probably only 10-20% of all instructions.
If you have one waitstate, you will see that the bandwidth is still
high, so you lose only between 10-20% of the performance instead of 50%.
The AVR32 loses less than 10% on average.
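The 10-20% vs. 50% argument can be sketched as a simple cycles-per-instruction model. The assumptions here are mine, not the poster's: one instruction per fetch in the no-prefetch case, a prefetch buffer that fully hides the waitstates in straight-line code, and a jump that exposes the waitstates exactly once. Pipeline refill after the jump is ignored.

```python
# Simplified model of waitstate cost with and without prefetching.

def speed_without_prefetch(wait_states: int) -> float:
    """Relative speed when every instruction fetch pays the waitstates."""
    return 1.0 / (1 + wait_states)

def speed_with_prefetch(jump_fraction: float, wait_states: int) -> float:
    """Relative speed when only the first fetch after a jump pays the waitstates."""
    cycles_per_instruction = 1.0 + jump_fraction * wait_states
    return 1.0 / cycles_per_instruction

print(f"no prefetch, 1 waitstate: {1 - speed_without_prefetch(1):.0%} loss")
for jumps in (0.10, 0.20):
    loss = 1 - speed_with_prefetch(jumps, wait_states=1)
    print(f"prefetch, {jumps:.0%} jumps, 1 waitstate: {loss:.1%} loss")
```

With 10% and 20% jumps the model gives about 9.1% and 16.7% loss, in line with the 10-20% quoted above; a real core would also pay a branch refill cost that this sketch leaves out.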
But you are comparing apples and oranges. A processor that has no
wait states doesn't have to deal with this no matter what the
instruction mix is. It is just much simpler to not have to consider
memory latencies.
A processor running from flash without waitstates will be limited
in performance by the memory.
A processor which reads multiple instructions per access, even with waitstates,
will be able to execute faster due to its higher bandwidth to memory.
I have run the SAM7 at 48 MHz with zero waitstates. It does not work over the
full temperature range, though.
The AVR32 will support 1.2 MIPS/MHz with 1 waitstate @ 66 MHz
due to its 33 MHz, 2-way interleaved flash memory.
(The 1st access after a jump takes two clocks; subsequent accesses take 1
clock.)
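The interleaved-flash timing in the parenthesis can be illustrated with a small simulation. This is a hypothetical model, not Atmel's actual flash controller: it assumes word-granular interleaving (word i lives in bank i % banks), a 2-core-clock access per bank (33 MHz flash behind a 66 MHz core), and at most one new access started per core clock.

```python
from typing import List

def fetch_ready_times(n_words: int, banks: int, access_clocks: int = 2) -> List[int]:
    """Core clock at which each of n sequential words (fetched right after a
    jump) becomes available. A bank can only start a new access once its
    previous access has finished; one access may be issued per core clock."""
    bank_free = [0] * banks   # clock at which each bank becomes idle
    ready = []
    next_issue = 0            # earliest clock the controller can issue again
    for i in range(n_words):
        bank = i % banks
        issue = max(next_issue, bank_free[bank])
        done = issue + access_clocks
        bank_free[bank] = done
        ready.append(done)
        next_issue = issue + 1
    return ready

print("2-way interleaved:", fetch_ready_times(5, banks=2))
print("single bank:      ", fetch_ready_times(5, banks=1))
```

With two banks the first word arrives after 2 clocks and each following word 1 clock later, matching the description above; a single bank at the same speed delivers a word only every 2 clocks.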
How does that compare to the Cortex M3 running at 50 MHz with no
waitstates and no branch penalty?
The UC3000 is claimed as 80 MIPS at 66 MHz.
For the Cortex M3 to reach 80 MIPS at 50 MHz,
you have to have 80/50 = 1.6 MIPS per MHz.
I think that ARM does not claim that the Cortex is close to 1.6 MIPS per MHz.
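A quick check of the MIPS-per-MHz arithmetic, using only the marketing figure quoted in the thread (80 MIPS at 66 MHz), not benchmark results. The helper name is illustrative.

```python
# MIPS-per-MHz check using the claimed figures from the thread.

def mips_per_mhz(mips: float, clock_mhz: float) -> float:
    return mips / clock_mhz

uc3_claim = mips_per_mhz(80, 66)      # about 1.21, consistent with "1.2 MIPS/MHz"
cortex_needed = mips_per_mhz(80, 50)  # 1.6 needed at 50 MHz to match 80 MIPS

print(f"UC3000 claim: {uc3_claim:.2f} MIPS/MHz")
print(f"Cortex-M3 @ 50 MHz would need: {cortex_needed:.2f} MIPS/MHz")
```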
Oh, this is marketing stuff. I thought you might have run some real
benchmarks or someone else at Atmel might have.
They have run benchmarks on the AVR32, but I think people are relying
on official figures for the Cortex.
Certainly they have
looked hard at the Cortex. But if it competes too well against the
AVR32, I can see why it would not be pushed at Atmel.
Certainly there
will be a lot of sockets that will be won by an ARM device over a sole
source part like the AVR32.
And hopefully ARM device from Atmel :-)
At this point I don't think anyone can
say whether the AVR32 has legs and will be around in 5 years. It has
been out for what, a year or so?
Fortunately there are plenty of sockets around, and some will go AVR32.
The AVR32 is decidedly better on DSP algorithms due to its
single cycle MAC and also it has faster access to SRAM.
Reading internal SRAM is a one clock cycle operation on the AVR32.
Bit banging will be one of the strengths of the UC3000.
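To illustrate why a single-cycle MAC matters for DSP code: an FIR filter, chosen here as a typical example of a DSP kernel (it is not mentioned in the thread), is nothing but a chain of multiply-accumulates. The Python below only shows the arithmetic; the function name and the newest-first sample ordering are illustrative assumptions.

```python
from typing import Sequence

def fir_output(coeffs: Sequence[int], samples: Sequence[int]) -> int:
    """One FIR filter output, y[n] = sum(c[k] * x[n - k]).

    `samples` is assumed to be ordered newest-first, so samples[k] is x[n - k].
    Every tap is one multiply-accumulate; in a C version on a core with a
    single-cycle MAC, the inner loop costs roughly one MAC per tap
    (loads and loop overhead aside).
    """
    if len(coeffs) != len(samples):
        raise ValueError("need one sample per coefficient")
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # the multiply-accumulate step
    return acc

# 3-tap moving sum as a tiny example: 4 + 5 + 6
print(fir_output([1, 1, 1], [4, 5, 6]))
```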
Isn't reading internal SRAM a single cycle on *all* processors? I
can't think of any that require wait states. In fact, most processors
try to cram as much SRAM onto the chip as possible because it is so
fast. Did you say what you meant to say?
On the UC3000 family, loading from internal SRAM will take one clock
in the execution stage.
Using single cycle SRAM does not mean that the load instruction is 1 clock.
--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB