Re: New ARM Cortex Microcontroller Product Family from STMicroelectronics



"rickman" <gnuarm@xxxxxxxxx> wrote in message
news:1183651600.912254.284760@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Jun 24, 11:45 am, wilco.dijks...@xxxxxxxxxxxx wrote:
On 23 Jun, 03:10, rickman <gnu...@xxxxxxxxx> wrote:
I don't follow what you are saying at all. Branch prediction relates
to pipelining. I don't see how it relates to wait states.

Adding a wait state is the same as increasing the pipeline depth, and
branch prediction coupled with prefetching can hide some of that latency.

I don't see how that is true at all. When you add a waitstate, you
freeze all stages of the pipeline while you wait for the Flash to
finish the access.


I don't know exactly how the Cortex works, but I worked on the internals
of another 32-bit RISC core.
This core had a 16-byte FIFO in the first pipeline stage.
The prefetch mechanism loaded 32 bits into this FIFO on each access,
and the memory controller could add waitstates to this access if necessary.
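
To make the FIFO concrete, here is a minimal C sketch of a 16-byte prefetch buffer holding eight halfwords, where the prefetcher only starts a 32-bit access when two slots are free. The little-endian split of each fetched word (low halfword first) and all the names (prefetch_fifo, fifo_push_word and so on) are assumptions made for illustration, not details of the core described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 16-byte FIFO = 8 halfword slots, used as a ring buffer. */
#define FIFO_HALFWORDS 8u

typedef struct {
    uint16_t slot[FIFO_HALFWORDS];
    unsigned head;   /* index of the oldest valid halfword */
    unsigned count;  /* number of valid halfwords */
} prefetch_fifo;

static void fifo_init(prefetch_fifo *f) { f->head = 0; f->count = 0; }

/* A 32-bit fetch may only be started if two halfword slots are free. */
static bool fifo_can_accept_word(const prefetch_fifo *f) {
    return f->count + 2u <= FIFO_HALFWORDS;
}

/* Assumption: little-endian split, low halfword enters the FIFO first. */
static void fifo_push_word(prefetch_fifo *f, uint32_t word) {
    assert(fifo_can_accept_word(f));
    unsigned tail = (f->head + f->count) % FIFO_HALFWORDS;
    f->slot[tail] = (uint16_t)(word & 0xFFFFu);
    f->slot[(tail + 1u) % FIFO_HALFWORDS] = (uint16_t)(word >> 16);
    f->count += 2u;
}

/* Look at the i-th halfword from the head without removing it. */
static uint16_t fifo_peek(const prefetch_fifo *f, unsigned i) {
    assert(i < f->count);
    return f->slot[(f->head + i) % FIFO_HALFWORDS];
}

/* Drop n halfwords once an instruction of that length has been consumed. */
static void fifo_pop(prefetch_fifo *f, unsigned n) {
    assert(n <= f->count);
    f->head = (f->head + n) % FIFO_HALFWORDS;
    f->count -= n;
}
```

Because slots are counted in halfwords while fetches arrive in words, the FIFO can hold a mix of 16-, 32- and 48-bit instructions without caring where instruction boundaries fall.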

The first pipeline stage did a simple decode of the top halfword of the
FIFO to determine the length of the instruction (1-3 halfwords). If the
FIFO had enough valid content, the full instruction was made available
to the second decoding stage; otherwise a "not valid" signal was asserted.
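
The first-stage check can be sketched in C as below. The post does not say how the instruction length is encoded in the top halfword, so the two-bit encoding used here is purely hypothetical; only the rule "decode the first halfword, then signal valid only if the whole instruction is already in the FIFO" comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL length encoding, for illustration only:
   top two bits 11 -> 3 halfwords (48-bit instruction)
   top two bits 10 -> 2 halfwords (32-bit instruction)
   anything else   -> 1 halfword  (16-bit instruction) */
static unsigned insn_length_halfwords(uint16_t first) {
    switch (first >> 14) {
    case 3: return 3u;
    case 2: return 2u;
    default: return 1u;
    }
}

/* Stage 1: look only at the oldest halfword to find the length, then
   assert "valid" only if every halfword of the instruction is present.
   On success *len receives the length; on failure the second stage
   would execute a NOP this cycle. */
static bool stage1_decode(const uint16_t *fifo, size_t valid_halfwords,
                          unsigned *len) {
    assert(len != NULL);
    if (valid_halfwords == 0)
        return false;                 /* nothing fetched yet */
    unsigned n = insn_length_halfwords(fifo[0]);
    if (valid_halfwords < n)
        return false;                 /* tail of a 32/48-bit insn still in flight */
    *len = n;
    return true;
}
```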

The second stage would either execute the instruction, reading 1-3
halfwords from the FIFO, or, if "not valid" was asserted, execute a NOP
instruction.

Since most instructions are 16 bits and you read 32 bits at a time,
zero-waitstate operation lets you fetch almost two instructions per cycle.
The FIFO will soon fill up, so if the odd 32/48-bit instruction pops up,
it won't hurt your performance.

If you have one waitstate, each 32-bit fetch takes 2 clocks, so the FIFO
gets one halfword per clock. That bandwidth is still high enough to
maintain 1 MIPS/MHz as long as you only execute 16-bit instructions.
You will be hurt by a 32-bit instruction, since fetching it takes 2 clocks.
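
A rough cycle-count model helps check the arithmetic in the last two paragraphs. This sketch assumes straight-line code with no branches, an empty FIFO at start, a fetch that only starts when two FIFO slots are free, and data that lands at the end of the access's last cycle; the function name and these timing details are hypothetical, not taken from any real core.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_HALFWORDS 8u   /* 16-byte prefetch FIFO, as in the post */

/* Hypothetical cycle model. len[i] is the size of instruction i in
   halfwords (1 = 16-bit, 2 = 32-bit, 3 = 48-bit). Each 32-bit flash
   access occupies the bus for (1 + wait_states) cycles and delivers two
   halfwords at the end of its last cycle. Returns the cycles needed to
   execute all n instructions, starting from an empty FIFO. */
static unsigned long simulate_cycles(const unsigned *len, size_t n,
                                     unsigned wait_states)
{
    unsigned fifo = 0;        /* valid halfwords in the FIFO */
    unsigned busy = 0;        /* cycles left on the fetch in flight, 0 = idle */
    size_t next = 0;          /* next instruction to execute */
    unsigned long cycles = 0;

    while (next < n) {
        cycles++;

        /* Prefetcher: start a 32-bit access if the bus is idle and the
           FIFO is guaranteed to have room for two more halfwords. */
        if (busy == 0 && fifo + 2u <= FIFO_HALFWORDS)
            busy = 1u + wait_states;

        /* Execute stage: run the next instruction if all of its halfwords
           are present; otherwise this cycle is a NOP (a stall). */
        assert(len[next] >= 1u && len[next] <= 3u);
        if (fifo >= len[next]) {
            fifo -= len[next];
            next++;
        }

        /* Memory: the access in flight advances one cycle; when it
           finishes, its two halfwords land in the FIFO. */
        if (busy > 0 && --busy == 0)
            fifo += 2u;
    }
    return cycles;
}
```

Under these assumptions, 100 16-bit instructions take 101 cycles with zero waitstates and 102 with one, while 100 32-bit instructions take 101 and 201 cycles respectively: one waitstate leaves 16-bit code at about one instruction per clock but halves the rate for 32-bit code.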


I have run the SAM7 at 48 MHz with zero waitstates. It does not work
over the full temperature range, though.
The AVR32 will support 1.2 MIPS/MHz with one waitstate at 66 MHz,
thanks to its 33 MHz, two-way interleaved flash memory.
(The first access after a jump takes two clocks; subsequent accesses
take one clock.)
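
The interleaving argument can be illustrated with a small timing model: consecutive 32-bit words alternate between banks, each bank needs two CPU clocks per access (assumed from the 33 MHz bank rate against a 66 MHz core), and the controller can start one new access per clock in any free bank. This is a hypothetical sketch of the principle, not Atmel's actual flash controller.

```c
#include <assert.h>

#define MAX_BANKS   4u
#define BANK_CYCLES 2u  /* assumed: bank clock is half the CPU clock (33 vs 66 MHz) */

/* Hypothetical model. Returns the CPU cycle at which the last of `words`
   sequential 32-bit words is available, starting right after a jump to
   word address start_word. Word w lives in bank (w % banks). Each access
   occupies its bank for BANK_CYCLES cycles, and at most one new access
   can be issued per CPU cycle. */
static unsigned fetch_run_cycles(unsigned start_word, unsigned words,
                                 unsigned banks)
{
    assert(banks >= 1u && banks <= MAX_BANKS);
    unsigned bank_free[MAX_BANKS] = {0};  /* first idle cycle of each bank */
    unsigned next_issue = 0;              /* earliest cycle for the next issue */
    unsigned done = 0;

    for (unsigned i = 0; i < words; i++) {
        unsigned b = (start_word + i) % banks;
        /* Wait for both a free issue slot and a free bank. */
        unsigned t = next_issue > bank_free[b] ? next_issue : bank_free[b];
        done = t + BANK_CYCLES;           /* this word is ready at cycle `done` */
        bank_free[b] = done;
        next_issue = t + 1u;
    }
    return done;
}
```

With two banks, an 8-word sequential run after a jump completes in 9 clocks (2 for the first word, then 1 per word), versus 16 clocks with a single bank running at the same speed.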


--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB

