Re: What application requires 500MHz for embedded processors
- From: paul$@pcserviceselectronics.co.uk (Paul Carpenter)
- Date: Tue, 14 Mar 2006 11:26:45 +0000 (GMT)
On Tuesday, in article
<4jwRf.10311$ZJ2.4094@xxxxxxxxxxxxxxxxxxxx>
Wilco_dot_Dijkstra@xxxxxxxxxxxx "Wilco Dijkstra" wrote:
"Didi" <dp@xxxxxxxxxxx> wrote in message
news:1142132669.823913.252540@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There is no need to do 3 independent accesses per cycle. This is a
very inefficient way of increasing bandwidth and that is why modern
CPUs increase the width of buses instead.
This tells me you have never actually done any DSP programming.
Please correct me if I am wrong (I certainly mean no offence).
You're wrong. For example I've written a highly optimised JPEG
(de)compressor on ARM using software SIMD techniques.
Depends on application constraints.
With a 64-bit bus you can read four 16-bit values per cycle, every cycle.
This is clearly faster than reading 16 bits from 3 independent addresses
per cycle, right?
No. Every 16 bit value has a separate address, which is - in the case
of the 5420 - another 16 bits. I will not go into explanation why this
is so, I guess there are sufficient books on digital signal processing
around.
I know why low and mid-end DSPs do this, however there are major
limitations with this approach. Alternatives exist which do not have these
limitations, and general purpose CPUs use these to improve DSP
performance without needing the traditional features of a DSP.
My point is that these alternatives allow modern general purpose
CPUs to easily beat traditional DSPs.
Not for some applications.
There is no need to do several independent accesses per cycle as
long as you've got enough bandwidth. 4 16-bit accesses every 10ns
is only 800MBytes/s. Just the data bandwidth between the core and L1
is 4GBytes/s on a 500MHz ARM11 for example.
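The bandwidth figures quoted above can be checked with a few lines of
arithmetic. This is only a sketch: the 8 bytes per cycle used for the ARM11
is an assumption inferred from the quoted 4GBytes/s at 500MHz (it would
correspond to a 64-bit path), and the function name is made up for
illustration.

```python
def bandwidth_bytes_per_s(bytes_per_transfer, transfers_per_s):
    """Sustained bandwidth for a fixed-size transfer repeated at a fixed rate."""
    return bytes_per_transfer * transfers_per_s


# Four 16-bit (2-byte) accesses every 10 ns, i.e. 100 million times a second.
dsp_bw = bandwidth_bytes_per_s(4 * 2, 100_000_000)

# Assumed: one 64-bit (8-byte) core-to-L1 transfer per cycle at 500 MHz.
arm_bw = bandwidth_bytes_per_s(8, 500_000_000)

print(dsp_bw)           # 800000000  (800 MBytes/s)
print(arm_bw)           # 4000000000 (4 GBytes/s)
print(arm_bw / dsp_bw)  # 5.0
```

Raw bandwidth is only one side of the argument; whether it can be used
without adding latency depends on the application.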
Here we go again, you don't want to believe DSPs have been
designed as they are because of necessity.
It's not necessity, more a particular design approach (like RISC/CISC).
It works fine at the low end, but it is simply not scalable. If you use it
like a dogma then you'll crash and burn, just like CPUs that were too
CISCy or RISCy...
Always forcing all data through a processor can cause problems for some
applications.
........
On ARM11 this computes 8 taps per iteration of 4 outputs (32 MACs)
in 24 cycles. In terms of bandwidth, it only does 6 loads every 32 MACs
(0.2 loads per MAC or 0.25 loads per cycle). So a 100Mhz ARM11
easily outperforms the 5420 at the same frequency.
FIR filters are clearly MAC rather than bandwidth bound. If we could
do 4 MACs per cycle, the loop would go faster. Now why do you insist
that you need at least 3 loads per MAC?
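The ARM11 loop itself is snipped above, so the following is not that code.
It is a minimal sketch of the data-reuse idea behind the argument: when
several outputs are computed per pass, each coefficient and sample is
fetched once and reused, so the number of fetches per MAC drops well below
the two of a naive loop. The function names, the block size of 4 and the
test data are assumptions chosen to match the "8 taps, 4 outputs" figures,
and fetches are counted per 16-bit element rather than per wide load.

```python
def fir_naive(x, h):
    """Direct FIR (correlation form); every MAC fetches one sample and one coefficient."""
    taps = len(h)
    loads = 0
    y = []
    for n in range(len(x) - taps + 1):
        acc = 0
        for k in range(taps):
            acc += h[k] * x[n + k]
            loads += 2  # one coefficient fetch + one sample fetch
        y.append(acc)
    return y, loads


def fir_blocked(x, h, block=4):
    """Compute `block` outputs per pass, fetching each needed value once per pass."""
    taps = len(h)
    n_out = len(x) - taps + 1
    loads = 0
    y = []
    for n in range(0, n_out, block):
        width = min(block, n_out - n)
        coeffs = list(h)                    # `taps` fetches
        window = x[n:n + taps + width - 1]  # taps + width - 1 fetches
        loads += len(coeffs) + len(window)
        for j in range(width):
            y.append(sum(coeffs[k] * window[j + k] for k in range(taps)))
    return y, loads


h = [1, 2, 3, 4, 5, 6, 7, 8]   # 8 taps (arbitrary values)
x = list(range(23))            # 23 samples give 16 outputs
y_naive, loads_naive = fir_naive(x, h)
y_block, loads_block = fir_blocked(x, h)
macs = len(y_naive) * len(h)   # 128 MACs in total

print(y_naive == y_block)      # True
print(loads_naive / macs)      # 2.0
print(loads_block / macs)      # 0.59375
```

Packing several 16-bit values into each wide load would cut the count of
load instructions further, which is presumably how the quoted figure gets
as low as it does.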
I have done various work with real-time video, where the video must have
minimal delay and NO non-deterministic delays or stops (i.e. continuous
operation), often because of other limitations of the system (broadcast
effects, mixing, scaling, or equipment in loops with eye/hand co-ordination).
There are times when you have to have dedicated hardware, because every
pixel on multiple video streams is undergoing 24 multiplies and 9 adds at
pixel rate, all at the same time. I have done standards conversion and
rescaling from input to output in less than 15 input TV lines of delay;
most of that delay came from changing the start times for active video due
to blanking differences.
Often in these types of applications, the blockiness and the latency of
frame delays can screw things up, as all the delays add up.
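To put rough numbers on that, here is a back-of-envelope sketch. Only the
24 multiplies, 9 adds and 15 lines come from the text above, and the 500MHz
clock from the thread title. The 13.5 MHz pixel rate, the 64 us line period
and the 40 ms frame period are assumed standard-definition 625-line/50 Hz
values, and the stream count of 4 is purely hypothetical.

```python
PIXEL_RATE_HZ = 13_500_000  # assumed: SD luma sample rate
LINE_PERIOD_US = 64         # assumed: 625-line/50 Hz line period
FRAME_PERIOD_US = 40_000    # assumed: 25 frames per second
STREAMS = 4                 # hypothetical number of simultaneous streams
CPU_HZ = 500_000_000        # clock figure from the thread title

OPS_PER_PIXEL = 24 + 9      # multiplies + adds, from the post

ops_per_stream = OPS_PER_PIXEL * PIXEL_RATE_HZ
total_ops = ops_per_stream * STREAMS

# Cycles a 500 MHz processor could spend on each pixel of each stream.
cycles_per_pixel = CPU_HZ / (PIXEL_RATE_HZ * STREAMS)

# Latency of a 15-line pipeline compared with a single frame store.
line_delay_us = 15 * LINE_PERIOD_US

print(ops_per_stream)                   # 445500000
print(total_ops)                        # 1782000000
print(round(cycles_per_pixel, 2))       # 9.26
print(line_delay_us)                    # 960
print(FRAME_PERIOD_US / line_delay_us)  # about 41.7
```

Under these assumptions a processor would have about 9 cycles in which to
do 33 arithmetic operations per pixel, and a single frame delay costs over
40 times the latency of the 15-line path.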
There are times when the delay does not matter (still images, or open-loop
systems such as set-top boxes, DVD players, audio players), but there are
others where the closed-loop nature of the WHOLE system means a DSP or fast
processor will not cut it.
Horses for courses, and various other reasons (often internal politics).
--
Paul Carpenter | paul@xxxxxxxxxxxxxxxxxxxxxxxxxxx
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.gnuh8.org.uk/> GNU H8 & mailing list info
<http://www.badweb.org.uk/> For those web sites you hate