Re: New ARM Cortex Microcontroller Product Family from STMicroelectronics
- From: rickman <gnuarm@xxxxxxxxx>
- Date: Mon, 16 Jul 2007 07:04:28 -0700
On Jul 14, 4:04 am, "Ulf Samuelsson" <u...@xxxxxxxxxxxxx> wrote:
"rickman" <gnu...@xxxxxxxxx> wrote in message news:1183995592.678499.34860@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Ulf Samuelsson wrote:
"rickman" <gnu...@xxxxxxxxx> wrote in message
That is not the point. By prefetching the instructions, you are
setting up for a bigger dump and subsequent loss of instruction memory
bandwidth when you branch. FIFOs or instruction prefetching are not a
perfect solution. It is much better to just have single cycle
memory.
Actually it is not, because if you try to fetch your instruction
in the same stage as the decoding, your clock frequency will
go down significantly.
The prefetching will work with single cycle memory and with
memory having waitstates.
What are you talking about??? How is slow memory faster than fast
memory???
If you have a memory capable of running at 50 MHz and you
put that in a CPU capable of running at 25 MHz, then you
will run slower.
In a two stage pipeline, you do "fetch-decode" and "execute".
If memory access, decoding and execution each take 20 ns,
then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage,
so the CPU can run at 25 MHz.
In a three stage pipeline, you do "fetch", "decode", "execute".
If all three stages take 20 ns, then you will be able to run at 50 MHz.
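A minimal sketch of the stage-delay arithmetic above, in Python. It assumes the clock period is set by the slowest pipeline stage and uses the 20 ns figures from the example; the function name max_clock_mhz is hypothetical.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period cannot be shorter than the slowest pipeline stage."""
    return 1000.0 / max(stage_delays_ns)


# Assumed delays from the example: 20 ns each for fetch, decode, execute.
FETCH_NS = DECODE_NS = EXECUTE_NS = 20.0

# Two stages: fetch and decode share one stage, so their delays add up.
two_stage = [FETCH_NS + DECODE_NS, EXECUTE_NS]
# Three stages: each step gets its own stage.
three_stage = [FETCH_NS, DECODE_NS, EXECUTE_NS]

print(max_clock_mhz(two_stage))    # 25.0
print(max_clock_mhz(three_stage))  # 50.0
```

This only reproduces the paper calculation; it says nothing about branch penalties or flash wait states.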
This conversation has become pointless. It started discussing the
loss of performance in processors that use slow Flash memory and you
have turned it into a discussion of processor design. You are way off
topic and your comments are irrelevant to the original point. The
bottom line is that if all other things are equal, a processor with
faster Flash memory will run faster. The Stellaris CM3 running at 50
MHz with no wait states from Flash will be faster for most apps than a
processor running at 70 MHz with 1 or two wait states like the STM
parts we were discussing. It may also be faster in many apps than a
processor running at 70 MHz using a wide flash bus interface to
overcome the wait states required because the lookahead fetch is often
wasted when the instruction flow changes.
You can dance around that, but those are the facts.
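The comparison above can be put into a toy model. This is only a sketch under loud assumptions: one instruction per clock when fetch keeps up, a narrow flash that pays the wait states on every fetch, and a wide flash whose lookahead is thrown away on every taken branch. The function names and the branch fraction are hypothetical, not measurements of any Stellaris or STM part.

```python
def narrow_flash_mips(clock_mhz, wait_states):
    """One instruction per flash access, no prefetch: each fetch costs
    1 + wait_states clocks."""
    return clock_mhz / (1 + wait_states)


def wide_flash_mips(clock_mhz, wait_states, taken_branch_fraction):
    """Wide flash with lookahead: straight-line code runs at one
    instruction per clock, but each taken branch discards the lookahead
    and stalls for wait_states clocks while the new line is read."""
    cycles_per_instruction = 1 + taken_branch_fraction * wait_states
    return clock_mhz / cycles_per_instruction


# Hypothetical workload: one taken branch per 5 instructions.
BRANCH_FRACTION = 1 / 5

print(round(narrow_flash_mips(50, 0), 1))                 # 50.0
print(round(narrow_flash_mips(70, 1), 1))                 # 35.0
print(round(narrow_flash_mips(70, 2), 1))                 # 23.3
print(round(wide_flash_mips(70, 1, BRANCH_FRACTION), 1))  # 58.3
print(round(wide_flash_mips(70, 2, BRANCH_FRACTION), 1))  # 50.0
```

Under these assumptions the result swings with the branch fraction and the number of wait states, so real benchmarks on the actual parts are what would settle it.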
Prefetching, decoding and execution will each take one clock.
If you execute at 66 MHz with a three stage pipeline
then you probably will execute at around 40 MHz with
a two stage pipeline (just a guess).
If you execute blocks of 5 instructions including one jump,
each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 MHz
in a three stage pipeline for ~ 9.4 blocks / us.
In a two stage pipeline, you could use 2 clocks for a jump
so you execute 6 cycles (2 + 1 + 1 + 1 + 1) @ 40 MHz
which is about 6.7 blocks / us, clearly slower.
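The block arithmetic above can be checked with a few lines of Python. This sketch takes the quoted cycle counts and clock rates at face value (the 40 MHz figure is itself a guess in the post), and the function name blocks_per_us is hypothetical. The results are in line with the rough figures quoted.

```python
def blocks_per_us(clock_mhz, cycles_per_block):
    """Blocks completed per microsecond; MHz equals clocks per microsecond."""
    return clock_mhz / cycles_per_block


# Five-instruction block with one jump, as in the example above.
three_stage_cycles = 3 + 1 + 1 + 1 + 1  # jump costs 3 clocks
two_stage_cycles = 2 + 1 + 1 + 1 + 1    # jump costs 2 clocks

print(round(blocks_per_us(66, three_stage_cycles), 2))  # 9.43
print(round(blocks_per_us(40, two_stage_cycles), 2))    # 6.67
```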
Since when do I get to design my own processor??? Everything you have
just written is based on your own assumptions. This is a pointless
discussion since everything you say is based on *your* assumptions!
In addition, you only consider the parts of the issue that you choose
to include. You did a timing analysis on paper that does not include
the effect of branches. Clearly not accurate regardless of your
assumptions!
Statistics is likely to show that branches are normally not so frequent
that you gain speed by having a shorter pipeline.
Funny, you are bringing in both statistics *and* probability. That is
the type of language I hear all the time in commercials where they
want you to think they have just told you a fact when in fact they
have said pretty close to nothing.
But you are comparing apples and oranges. A processor that has no
wait states doesn't have to deal with this no matter what the
instruction mix is. It is just much simpler to not have to consider
memory latencies.
A processor running from flash without wait states will be limited
in performance by the memory.
A processor which reads multiple instructions with wait state
will be able to execute faster due to its higher bandwidth to memory.
Again you are assuming facts that are not in evidence. Where do you
get the higher bandwidth from memory if it is running with wait
states? Oh, right, you are *assuming* that there is something
different in the design that will make that one faster. Something
that is not part of a slower Flash that requires wait states.
By making it wider.
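To make the "wider" argument concrete, here is a sketch of peak sequential fetch bandwidth in Python. It assumes each flash access delivers the full bus width and costs 1 + wait_states clocks; the bus widths shown are hypothetical examples, not the specification of any part in this thread.

```python
def peak_fetch_bits_per_clock(bus_width_bits, wait_states):
    """Best-case fetch bandwidth for straight-line code: one full-width
    access completes every 1 + wait_states clocks."""
    return bus_width_bits / (1 + wait_states)


print(peak_fetch_bits_per_clock(32, 0))   # 32.0  narrow flash, zero wait states
print(peak_fetch_bits_per_clock(64, 1))   # 32.0  twice as wide, 1 wait state
print(peak_fetch_bits_per_clock(128, 1))  # 64.0  four times as wide, 1 wait state
```

This is a best-case number for sequential code only; whatever was fetched ahead is lost when the instruction flow changes.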
The UC3000 is claimed as 80 MIPS at 66 MHz.
For the Cortex M3 to reach 80 MIPS at 50 MHz,
you have to have 80/50 = 1.6 MIPS per MHz.
I think that ARM does not claim that the Cortex is close to 1.6 MIPS per MHz.
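The ratio being argued about is simple enough to compute directly. A small Python sketch, using only the claimed figures quoted above (80 MIPS, 66 MHz, 50 MHz); the function name mips_per_mhz is hypothetical and no benchmark data is implied.

```python
def mips_per_mhz(mips, clock_mhz):
    """Instruction throughput normalised to clock rate."""
    return mips / clock_mhz


print(round(mips_per_mhz(80, 66), 2))  # 1.21  the claimed UC3000 figure
print(round(mips_per_mhz(80, 50), 2))  # 1.6   what a 50 MHz part would need
```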
Oh, this is marketing stuff. I thought you might have run some real
benchmarks or someone else at Atmel might have.
They have run benchmarks on the AVR32, but I think people are relying
on official figures for the Cortex.
"People" being "you"?
No, Atmel marketing.
Ahhh, *marketing*! That makes it very clear now. We can all have
complete trust in benchmark figures from *marketing*!
Certainly they have
looked hard at the Cortex. But if it competes too well against the
AVR32, I can see why it would not be pushed at Atmel.
Certainly there
will be a lot of sockets that will be won by an ARM device over a sole
source part like the AVR32.
And hopefully an ARM device from Atmel :-)
There are a number of sockets that Atmel won't win if they don't have
a CM3 device. There are two companies with the new core in production
and a third on their heels. I am sure sales of the ARM7 devices won't
drop off a cliff. But this business is all about design wins and I
stand by my earlier post in another thread that the CM3 will start to
steal significant numbers of design wins by the end of this year and
by the end of next year they will overshadow the ARM7 design wins in
the off the shelf MCU market.
And maybe the ARM9 designs will overshadow the ARM7 and CM3 as well.
I see most high volume designs nowadays require 200 MHz + operation.
The large customers (1M+) requiring low power seem to focus
on 1.8 V SAM7s or AVR32s.
This is of course normally only 5% of the total MCU market,
so things could be different in your region.
Yes, the swan song of the truly desperate. If anyone connected to the
ARM7 feels threatened by the CM3, they simply bring in the ARM9 which
is a totally unsuited processor for most of the apps that the ARM7 and
CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3
fill. However, the CM3 fills most of those sockets much better than
the ARM7 and that is my point.
A company selecting a binary compatible family will still be better off
with ARM than with Cortex, due to the larger performance span.
If they can shoe horn it onto their board! An ARM9 may be the right
choice for a router, but not for a controller. The CM3 is targeted to
the lower end bumping up against the 8 bit devices and eating into
their market segment. The ARM9 will never compete in that area. It
is too large of a chip and will always be uncompetitive at the low
end.
At this point I don't think anyone can
say whether the AVR32 has legs and will be around in 5 years. It has
been out for what, a year or so?
Fortunately there are plenty of sockets around, and some will go AVR32.
Is that the plan for the AVR32, to take *some* sockets? You know as
well as I do that if the AVR32 does not get significant market
penetration within two years from now, it will be put on the back
burner and eventually discontinued. Atmel has no reason to keep making
a part that consumes significant resources and does not make
significant profit. Look at what happened to Atmel programmable
logic. When was the last time they added a new FPGA to the product
line? How many FPSLICs have been designed into new sockets?
I see you ignored this comment. There are any number of "good ideas"
that have totally failed in the market place. It is very possible
that the AVR32 will be one of them.
The AVR32 is decidedly better on DSP algorithms due to its
single cycle MAC and also it has faster access to SRAM.
Reading internal SRAM is a one clock cycle operation on the AVR32.
Bit banging will be one of the strengths of the UC3000.
Isn't reading internal SRAM a single cycle on *all* processors? I
can't think of any that require wait states. In fact, most processors
try to cram as much SRAM onto the chip as possible because it is so
fast. Did you say what you meant to say?
On the UC3000 family, loading from internal SRAM will take one clock
in the execution stage.
Using single cycle SRAM does not mean that the load instruction is 1
clock.
Like I said, aren't all internal SRAMs in all processors single
cycle???
Maybe so, but from a performance point of view, you are more
interested in how many cycles it takes to load from SRAM into a
register, and if this takes 1 clock cycle due to a 1 clock load
instruction, or 3 clock cycles due to a 3 clock load instruction
(from a 1 clock cycle SRAM), then you do see a performance difference.
What processor only uses 3 clock instructions to access 1 clock
memory? My understanding is that many processors not only use faster
instructions to load, but can use memory in other instructions which
allow single cycle back to back memory accesses.
Besides, no one feature ever makes or breaks a processor chip. There
are literally dozens of distinguishing points between different
processors and only marketing and salesmen try to narrow an engineer's
focus to a small number of features. I care about the overall utility
of a processor and one of the big selling points to me is the
ubiquitousness of the ARM chips. Very soon that will include the CM3
devices which will take over the low end squeezing the ARM7 between
the CM3 and the ARM9.