Re: "Am I still working okay?" asked the micro controller...

From: Paul E. Bennett
Date: 05/19/04

Date: Wed, 19 May 2004 18:22:19 +0100

Unbeliever wrote:

> "SelfTest" <SelfTEst> wrote in message
> news:40ab5b93$0$3034$
>> Say we have a micro controller with limited memory.
>> Say it will perform some realtime control of something.
>> How to make a SW for a micro controller, that in addition to its normal
>> operation (control of something), from time to time it will also check
>> itself if it is doing okay or not ? How a program can test itself? Can
>> some one suggest any intelligent method (other than watch dog) ?
> You are correct in identifying watchdog timers as one form of COP
> (computer operating properly) test. Other things I've often used are:
> 1) Background checksum on code and constant/initializer areas of memory
> 2) Flags and timers which indicate that critical routines and
> interrupts are running at about the right rate, usually checked in the
> watchdog timer interrupt.
> 3) Guardwords between stacks and other memory, and regular checks that
> these have not been compromised (again, often in the watchdog timer
> interrupt).
> 4) Feedback of critical output signals to ensure the hardware is
> working correctly (the hardware is much more likely to suffer random
> failures than the software).
> 5) A decent watchdog timer with an algorithmic stimulus and response
> (e.g. watchdog processor supplies a pseudo-random number and main processor
> replies with the next pseudo-random number in the sequence). Much better
> than the primitive "kick within a certain time" style of watchdog, which
> is prone to missing runaway software that still includes a kick.
> 6) One I haven't used but seen used on a critical plc style system is
> an odd number of redundant processors (3 in this case) which vote on the
> state of an output (output follows the state of two agreeing inputs).
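Item 5 in the list above (stimulus and response watchdog) could be sketched roughly as follows. This is only an illustration: the 16-bit Galois LFSR, its tap mask 0xB400 and the names wd_response and wd_check are my assumptions, not anything specified in the quoted post.

```c
#include <stdbool.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (tap mask 0xB400, maximal length).
 * The choice of generator is an assumption made for illustration only. */
static uint16_t lfsr_next(uint16_t x)
{
    uint16_t lsb = (uint16_t)(x & 1u);
    x = (uint16_t)(x >> 1);
    if (lsb) {
        x ^= 0xB400u;
    }
    return x;
}

/* Main processor side: reply to a challenge with the next number in
 * the shared pseudo-random sequence. */
uint16_t wd_response(uint16_t challenge)
{
    return lfsr_next(challenge);
}

/* Watchdog processor side: accept the reply only if it matches the
 * value the watchdog computes independently. */
bool wd_check(uint16_t challenge, uint16_t response)
{
    return response == lfsr_next(challenge);
}
```

A stuck loop that blindly repeats the last reply fails wd_check as soon as the challenge changes, which is the advantage over a plain timed kick.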

...and adding to that list: an external pulse-maintained relay. This device
must be fed a change of polarity on its input signal at a regular rate in
order to keep its relay energised. If any single component fails, the power
supply goes off, or the input stops changing, the relay simply de-energises
and opens its contacts. The pulse drive for such a circuit should be derived
from the internal sanity checks that your software is performing (all checks
OK, so change the state of the output). This device can elevate a single
processor from SIL0 to SIL1 with very little effort.
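
A minimal sketch of the software side of such a pulse drive, assuming a routine that is called at the regular rate the relay circuit expects. The names relay_drive_write and relay_drive_service are hypothetical, and the "pin" here is just a variable standing in for a real output port.

```c
#include <stdbool.h>

/* Stand-in for the real output pin; on a target this would be a GPIO
 * register write. Hypothetical name, for illustration only. */
static bool relay_drive_level = false;

static void relay_drive_write(bool level)
{
    relay_drive_level = level;
}

/* Call once per pulse period. The output changes state only when every
 * sanity check has passed; on any failure it is left static, so the
 * external circuit sees no edges and the relay drops out.
 * Returns the level currently being driven. */
bool relay_drive_service(bool all_checks_ok)
{
    if (all_checks_ok) {
        relay_drive_write(!relay_drive_level);
    }
    return relay_drive_level;
}
```

Note that the failure path does nothing at all: the safe action is the absence of a pulse, so a crashed or hung processor gives the same result as a detected fault.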

Further, your microcontroller may be communicating with other systems in
order to perform its control. Doing sanity checks on the communication link
and checking its integrity in operation will yield a good idea of
sub-system health. You will need checksums and/or CRCs on all messages
between systems.
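
As a rough example of a message check, here is a bitwise CRC-16 (the CCITT polynomial 0x1021 with initial value 0xFFFF) and a frame validator. The frame layout, payload followed by the CRC high byte first, is an assumption for the sketch; a real link would follow whatever its protocol defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, polynomial 0x1021, initial value 0xFFFF, no
 * reflection, no final XOR. Slow but small, which suits limited memory. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((uint16_t)(crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Assumed frame layout: payload bytes, then CRC high byte, CRC low byte. */
bool frame_is_valid(const uint8_t *frame, size_t len)
{
    if (len < 2u) {
        return false;
    }
    uint16_t expected = crc16_ccitt(frame, len - 2u);
    uint16_t received = (uint16_t)(((uint16_t)frame[len - 2u] << 8) |
                                   frame[len - 1u]);
    return expected == received;
}
```

Counting rejected frames over time, rather than just discarding them, is one way to turn this into the sub-system health indication mentioned above.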

Also consider an integral step-wise walking memory test and other walking
sanity checks. These can detect potential failure points quite early on.
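
One possible shape for a step-wise walking memory test, checking a single byte per call so it can be slotted into the main loop. The ram_walker_t type and ram_walk_step name are hypothetical, and on a real target the step would need interrupts disabled (or otherwise exclusive access) while the cell is being exercised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Walker state: region under test and index of the next byte to check.
 * size must be non-zero. */
typedef struct {
    volatile uint8_t *base;
    size_t size;
    size_t next;
} ram_walker_t;

/* Test one byte non-destructively: save it, walk a single 1 bit and a
 * single 0 bit through all eight positions, restore it, then advance
 * (wrapping at the end of the region). Returns false on any mismatch. */
bool ram_walk_step(ram_walker_t *w)
{
    volatile uint8_t *cell = &w->base[w->next];
    uint8_t saved = *cell;
    bool ok = true;

    for (uint8_t bit = 1u; bit != 0u; bit = (uint8_t)(bit << 1)) {
        uint8_t inverse = (uint8_t)~bit;
        *cell = bit;
        if (*cell != bit) {
            ok = false;
        }
        *cell = inverse;
        if (*cell != inverse) {
            ok = false;
        }
    }

    *cell = saved;
    w->next = (w->next + 1u) % w->size;
    return ok;
}
```

Because only one cell is touched per call, the cost per control cycle is small and fixed, at the price of taking many cycles to cover the whole region.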

There are a number of others.

> Of course, the next question you should ask is "What do I do when I
> detect a failure?" If it is a safety-critical system (e.g. the something
> you're controlling is a train, nuclear reactor or gas furnace rather than
> a Lego windmill) there's a whole other set of questions you should ask
> even before asking the first one.

You should evaluate what the system's safe state is going to be (off,
bypassed or gracefully degrading). Your design efforts should then always
lean the system toward reaching that safe state unless it is continuing to
work properly.

Paul E. Bennett ....................<email://peb@a...>
Forth based HIDECS Consultancy .....<>
Mob: +44 (0)7811-639972 .........NOW AVAILABLE:- HIDECS COURSE......
Tel: +44 (0)1235-811095 .... see for details.
Going Forth Safely ..... EBA.