Re: Blue Chip Technology + MagnumX?



On Tuesday, in article <4kdgojFbg7edU1@xxxxxxxxxxxxxx>
dave@xxxxxxxxxxxxxxxxxxxx "David Hearn" wrote:

John Devereux wrote:
David Hearn <dave@xxxxxxxxxxxxxxxxxxxx> writes:
John Devereux wrote:
David Hearn <dave@xxxxxxxxxxxxxxxxxxxx> writes:
Has anyone got any experience of Blue Chip Technology
(http://www.bluechiptechnology.co.uk/) and especially their MagnumX
single board computers?
(http://www.bluechiptechnology.co.uk/single_board_computer.php?sub_group_id
=2)

We've been investigating the ColdFire MCF5475EVB for some quite simple
5 channel GPIO capture - but at quite high data rates. We've got the
266MHz ColdFire processor to do what we want but it just isn't quite
fast enough - it's missing some of the state transitions on the
channels. We're sampling at around 2 to 2.2MHz and we need more like
3.3MHz.

We were wondering whether a faster (2x?) processor would be more
likely to do the job. The 733MHz MagnumX board sounds like it might
do this, and it has 16 channels of GPIO (we only need 5 inputs at
present, unlikely to increase significantly). I'm aware it's moving
from the ColdFire/M68K processor to an x86 processor, but unless
there's a major difference in per cycle processing, that's not a
problem.

Our app is quite simple - loop until a counter expires, each iteration
store 1 byte pin state register and 32 bit counter value. We
currently can sample up to 60MB of data.
Can you not use DMA instead? And/Or perhaps an external FIFO.
Personally, I have no idea. I've never used DMA before (I'm
traditionally a desktop app guy!) - so I'm not sure where I would
start, or what I'd need to do. Essentially, we just want to move a 8
bit GPIO register containing pin status and a 32 bit register
containing a counter value into RAM as fast as possible - at around
3.3MHz (sample every 300ns).

Paul Carpenter beat me to to most of the points I would make. I would
just wonder whether you actually need the counter - if the samples are
being aquired reliably then perhaps the sample number can give you the
time?

Yes, if the samples are being taken at guaranteed periodic intervals
then use of the counter is not needed.

The software counter, only *guarantees* that a counter variable counts
it can not under all circumstances guarantee anything else.

Originally to save RAM (we 'only' have 64MB on board), we only logged
the pin state if it had changed, but this added a little processing into
the loop (thus making it slower), and also required the use of the
counter to get timing values as the time for each iteration would change
depending on whether the pin state had changed. When the pin state was
the same the loop was shorter, when it had changed it was longer.

Does the time factor between samples get logged as well as the changes?
If not the data starts to become meaningless.

What is the maximum frequency of change of ANY digital input compared
to desired sampling frequency?

To try and speed things up (ie. reduce memory accesses and
conditionals), we started just sampling the pin state, but not filtering
out any non-changes - therefore making the code path in each iteration
of the loop identical. This, in theory, should give the same sample
rate each time, once the 'fudge' factor (time taken for each loop
iteration) is calculated.

If you can guarantee that NO other actions could EVER occur on your
board, causing the micro or its support hardware to put in waits
and chnage your timing.

I believe this worked well - however the sample rate was still only
fractionally faster than 2MHz - not close to the (approx) 3.3MHz we'd like.

The average sample rate, over a period of time, we have no idea what accuracy
of capture timing you are trying to achieve.

At the rates of capture you are trying to achieve I would NEVER do it by
software polling.

Anyway, DMA uses special hardware facilities to move data around,
rather than doing it in software. Look it up in the data*** for your
coldfire chip. You should be able use the output of a timer to trigger
a DMA transfer from a PIO port to memory, at a programmed rate.

I'll look into that. The issue I found yesterday whilst looking at some
generic DMA stuff is that the amount of code to initiate a DMA transfer
is significant, much more than what is currently occurring in the loop.
Whilst the memory transfer might be faster, the actual processing to
initiate the transfer will be greater.

If you want to capture many megabytes then most DMA controllers are
capable of dealing with 32K and larger transfers fully under hardware
timed control (if DMA is timer triggered).

I suggest you do some reading up on DMA on ANY processor and the concepts
will become clear.

I guess this is what Paul meant by using a deep FIFO (I'm still
learning!). This can be used to buffer up the samples such that the DMA
transfer can be done on a larger block, rather than the 1 (or 5) bytes
I'm currently processing each time.

The large FIFO means that you know the time interval between EACH sample
point because external hardware times the writes to the FIFO. The reads
from the FIFO are subject to system software, hardware latencies and on
most DMA controllers the time interval to setup the controller to point
to the next memory block to use. The deeper the FIFO allows for longer
latencies (accumulative) and longer delays (next memory block setup time).
Also cover for sloppy code and other system events.

Am I correct in saying that using DMA to transfer 1 or 5 bytes (if
including a counter) at a time is likely to have more overhead than just
doing a normal array indexed write?

Comparing transfering a few bytes from memory to memory, or a SINGLE
transfer from I/O to/from memory, then YES.

Comparing controlled timing to LARGE amounts of data to/from I/O at
timed intervals, timer control (internal or external) and DMA wins
hands down over longer periods.

Judicious use of multiple DMA buffers, would also allow your software to
be 'compressing' the data from ONE buffer to change events to save, AT
THE SAME TIME as _hardware_ is filling up another buffer, The amount of
buffers and their size depends on the

System in use
Length of data run
Time to process one buffer (MAX time) by software.
Capture rate (MIN time to fill a buffer)
Other system activity
Other things I have forgotten.

We originally tried using a timer to generate interrupts and to sample
at a constant rate, however, we found that the sampling speed was far
slower than using a simple while() loop - too slow for our needs. To

Probably too much code in the interupt routine and interupt latencies.
The timer should be DIRECTLY strobing the data capture.

further complicate matters, turning full optimisation on (using gcc for
M68k) caused interrupts to break. Using full optimisation made a
noticeable difference in speed when doing the simple while() loop.

Sounds more likely to be improper use of things like volatile and other
software design problems.

Unfortunately I have little control over modifications of the hardware
(ie. adding FIFOs etc). Whilst I can advise what might be a better off
the shelf choice, custom changes to boards is unlikely to happen at this
stage (proof of concept and initial development).

Well if you don't sort out the data acquisition properly BEFORE deciding
on processor, it is akin to saying

"I have this wonderful MP3 player which needs an Edison Wax Cylinder
interface via USB without designing the hardware to interface it."

Proof of concepts have always in my opinion have to show what is and
what is NOT needed. If you approach the problem with the wrong tool
then you will need to add something to kludge the lot together.

All you are proving at the moment is that we can throw money at buying
bits and any bits, that if we keep buying bits we will force the problem
to be solved.

Work out what needs to be done how and under what constraints first, then
find the necessary hardware and software. You _APPEAR_ to have the hardware
and software which you are trying to force into meeting unknown constraints.

........
I'm certainly no true embedded engineer - I'm traditionally a PC
software engineer who hadn't had to worry about low level code before.
In the (small) company, I probably have the most 'embedded' experience
through doing some set top box software development (using provided SDKs
not requiring low level hardware interfacing). This has been my first
true embedded development project, and had a steep learning curve but
has been fun. My knowledge of architectures and embedded hardware is
very limited. The Coldfire board was selected by someone else (who's
moved on now!) and if I can find a more suitable arhitecture/board then
that would be good. I'm not in a position to design any hardware really
- just select something that's already out there.

Well you need to find some form of CLOCKED digital input to some form
of DMA transfer to some processor. They exist but mainly for PCs and the
like, not many of them. Long time since I looked for such things. They
are also normally relatively expensive items.

I have no experience of selecting, programming or interfacing with
FPGAs, but I have had it suggested before that something like this might
be better suited to this task. The problem is, I'm not a hardware
design guy.

It seems to me that the system designer or whoever is in charge of the
overall system design needs to review the 'proof of concept' strategy.

I've even had a suggestion along the lines of "just use a memory
controller chip and use a clock to transfer the pin status into the RAM
directly" (or something like that). Sounds a great idea - but that's
not something I can do with my current skills.

Someone where you work must be able to do it.

I appreciate all your help - but really, at the moment, we're limited to
using an off-the-shelf processor board - it's just knowing which is
suitable. We're 2/3 of the speed we need at present - so not a huge
amount to make up - it feels as if we should be able to do it using our
current 'design' - just wondering whether other hardware would have the
speed increase we're looking for.

Other hardware added to your board or even much lesser processor cards
will easily do this FUNCTION. Other hardware will do something your
hardware will not do, guarantee the sample to sample timing ensuring
that it stays that way is moving the data as fast as possible into
memory to keep up with sampling regime.

You appear not to have done any capture timing tests to see how regular
your digital capture is from sample to sample and whether it meets any
part of your spec.

All you have at the moment shown to me is that you have a bunch of data
acquired at AVERAGE clock rates.

Having spent many years dealing with all sorts of data acquisition and
processing upto and including high speed Video data and down to slow
heat measurements (and slower ones). This seems to me to be data acquisition
of data with no reference to make the data mean anything useful.

I suggest that you get someone involved who can see the WHOLE system issue
and is ABLE to if necessary design the correct hardware to support the
functions required.

Much as I could be able to do such a thing I have several projects currently
on the go from network infrastructure to ASIC testing. I am sure there
are plenty of people out there or in your organisation that could do this.

--
Paul Carpenter | paul@xxxxxxxxxxxxxxxxxxxxxxxxxxx
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.gnuh8.org.uk/> GNU H8 & mailing list info
<http://www.badweb.org.uk/> For those web sites you hate

.