Re: Atmel releasing FLASH AVR32 ?
- From: "Wilco Dijkstra" <Wilco_dot_Dijkstra@xxxxxxxxxxxx>
- Date: Wed, 21 Mar 2007 21:26:32 GMT
"Ulf Samuelsson" <ulf@xxxxxxxxxxxxx> wrote in message news:etqnn5$v66$1@xxxxxxxxxxx
"Wilco Dijkstra" <Wilco_dot_Dijkstra@xxxxxxxxxxxx> wrote in message
news:2y_Lh.16902$NK3.2627@xxxxxxxxxxxxxxxxxxxxxxx
"Ulf Samuelsson" <ulf@xxxxxxxxxxxxx> wrote in message news:etp769$te9$1@xxxxxxxxxxx
That's true, but function calls are common too and they would typically
branch between pages. And then you have the nasty case of a function
or a loop split between 2 pages...
Fixed by compiler pragma...
Easy to say, a bit harder in reality. If you don't care about code size you could
align big functions to 512-byte boundaries and pack small functions in the
gaps. But even that is hardly a solution as every minor change in the code
results in a different memory layout making performance unpredictable.
Basically it is an unsolvable problem.
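To illustrate the layout sensitivity described above, here is a minimal sketch. It assumes functions are simply placed back to back in flash and uses the 512-byte page size mentioned in the post; the function sizes are hypothetical and not taken from any real program.

```python
PAGE = 512  # page size in bytes, taken from the 512-byte figure in the post

def layout(sizes, start=0):
    """Place functions back to back; return (start, end) byte ranges."""
    ranges = []
    addr = start
    for size in sizes:
        ranges.append((addr, addr + size))
        addr += size
    return ranges

def straddlers(ranges, page=PAGE):
    """Indices of functions whose body crosses a page boundary."""
    return [i for i, (lo, hi) in enumerate(ranges)
            if lo // page != (hi - 1) // page]

sizes = [200, 300, 12, 500, 12, 400]   # hypothetical function sizes in bytes
before = straddlers(layout(sizes))     # no function crosses a page
sizes[0] += 4                          # a "minor change" to the first function
after = straddlers(layout(sizes))      # functions 2 and 4 now cross a page
print(before, after)
```

In this toy layout a 4-byte change to one function pushes two unrelated functions across a page boundary, which is the kind of shift that makes timing hard to predict.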
On an ARM7, adding a cache also adds one wait state to all non-cache accesses.
No, a cache doesn't impact other accesses to non-cacheable
memory areas. A local flash cache is something you could
just drop into an existing design without even worrying about
needing to turn it on or flush it. It's completely transparent.
Similarly, branch prediction makes a CPU go faster and so it burns less
power to do a given task. Cortex-M3 has a special branch prediction
scheme to improve performance when running from flash with wait
states, so it makes sense even in low-end CPUs.
The cost of branch prediction is chasing an ever-elusive target.
Branch prediction is pretty trivial as branches are very predictable.
A small global branch predictor (for example as used in the ARM1156)
gives amazingly good prediction at a negligible hardware cost.
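As a rough illustration of how predictable a typical loop branch is, the sketch below runs a single textbook 2-bit saturating counter over a loop that is taken nine times and then exits, repeated many times. This is a generic scheme chosen for brevity, not the ARM1156 or Cortex-M3 design, and the branch pattern is an assumption.

```python
def predict_accuracy(outcomes, state=2):
    """Run one 2-bit saturating counter over a list of branch outcomes.

    States 0 and 1 predict not taken, states 2 and 3 predict taken.
    Returns the fraction of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move towards 3 on a taken branch, towards 0 otherwise.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# Hypothetical loop: 9 taken back-edges, then one fall-through, 100 times over.
loop_branch = ([True] * 9 + [False]) * 100
accuracy = predict_accuracy(loop_branch)
print(accuracy)
```

Only the loop exit is mispredicted each time round, so the counter gets nine out of every ten branches right with two bits of state.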
With multithreading you can swap in a computable process and use EVERY cycle.
So what? There are few wasted cycles on modern embedded CPUs.
Only very high-end CPUs are waiting a lot for slow memory.
Multithreading is not relevant in the embedded space; it would add a lot of complexity
and die area for hardly any gain.
Yes it is, just look at a mobile phone: lots of ~20 MIPS CPUs handling
Bluetooth, WLAN, GPS etc., just because no one has designed
proper multithreading for embedded.
No, phones are extremely integrated and usually have only one CPU,
one DSP and perhaps a microcontroller in the flash card.
Hardware multithreading doesn't give much performance on a
high-end CPU, and it gives almost no benefit on a low-end one. Less than
10% of the memory bandwidth is unused in an ARM7, so running a
second thread either means it runs at 10% of the maximum speed
or it slows down the main thread.
It really only makes sense on high-end
CPUs, but even there the gains are not that impressive.
If you believe that, you don't understand multithreading for embedded.
The purpose is not to increase performance, it is to improve real time
response so you do not have to have multiple CPUs.
You don't understand multithreading at all. Interrupt latency is completely
unaffected by multithreading. Whether you run 2 interrupts in parallel at
half the speed or one after the other at full speed is irrelevant.
You confuse multiprocessing with multithreading. A 2-core CPU can
indeed deal with 2 interrupts in parallel at full speed.
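The completion-time argument can be sketched with a toy model. It assumes two interrupt handlers arrive at the same instant, each needs a fixed number of cycles, and a shared pipeline issues one instruction per cycle in strict round-robin. Stall cycles that another thread could fill are ignored, which is a simplification.

```python
def sequential(work):
    """One hardware thread: handlers run back to back at full speed."""
    done, t = [], 0
    for w in work:
        t += w
        done.append(t)
    return done

def shared_pipeline(work):
    """Hardware threads share one pipeline, one cycle each in round-robin."""
    remaining = list(work)
    done = [0] * len(work)
    t = 0
    while any(remaining):
        for i, r in enumerate(remaining):
            if r:
                t += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    done[i] = t
    return done

def two_cores(work):
    """One core per handler: each finishes after its own work."""
    return list(work)

work = [100, 100]  # hypothetical handler lengths in cycles
print(sequential(work), shared_pipeline(work), two_cores(work))
```

Under these assumptions the last handler finishes at cycle 200 with or without hardware threads, and only the two-core case brings it down to 100. The model measures completion time only; it says nothing about how soon the first instruction of each handler starts.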
Adding more
cache lines evens this effect out, making performance more predictable.
No, your unpredictability comes from jumping to a place
and, instead of accessing memory to fetch the page,
getting a cache hit, and then your timing is screwed.
It is impossible to run code at a predictable speed, so you're
screwed no matter whether you use a cache or not.
A cache can even reduce worst case performance since it
can introduce delays in the critical path.
So would a page cache. That is the price you have to pay when
improving performance: the best case is better but the worst
case is typically worse. Overall it is a huge win.
No it is not a win if you have to guarantee that a job completes
in a certain time.
Wrong. Code is highly repetitive, so even if you assume the cache
is invalidated at the start of a task, using a cache results in much
faster execution.
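A back-of-the-envelope model of the cold-cache case, assuming the loop fits entirely in the cache, a hit costs one cycle, and a miss costs the flash access time plus a fixed penalty. All of the numbers are hypothetical.

```python
def cycles_uncached(n_instr, iterations, flash_cycles):
    """Every fetch goes to flash and pays the full access time."""
    return n_instr * iterations * flash_cycles

def cycles_cold_cache(n_instr, iterations, flash_cycles, miss_penalty=1):
    """Cache starts empty: the first pass misses, later passes hit."""
    first_pass = n_instr * (flash_cycles + miss_penalty)
    later_passes = n_instr * (iterations - 1)  # 1 cycle per hit
    return first_pass + later_passes

# Hypothetical task: a 50-instruction loop run 10 times, 3-cycle flash.
uncached = cycles_uncached(50, 10, 3)      # 1500 cycles
cached = cycles_cold_cache(50, 10, 3)      # 650 cycles
# Straight-line code executed once is the unfavourable case for the cache.
single_pass_uncached = cycles_uncached(50, 1, 3)   # 150 cycles
single_pass_cached = cycles_cold_cache(50, 1, 3)   # 200 cycles
print(uncached, cached, single_pass_uncached, single_pass_cached)
```

With these numbers the cold cache wins as soon as the code repeats, while code that runs exactly once pays the miss penalty for nothing, which is the worst case both sides are arguing about.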
The cache itself draws power, and you cannot directly compare
cache accesses with flash memory accesses.
Of course the cache burns power, but you're not using the flash.
Which uses less power is highly dependent on their size and
implementation. From what I've heard, caches are extremely
efficient for sequential accesses, i.e. code accesses.
You have to run the cached CPU at a higher clock frequency to compensate
for loss of worst case performance.
No, it would be virtually impossible to find code that actually can't
meet its deadline with a cache.
Wilco