Re: Program "close to the machine"
- From: Greg Menke <gusenet@xxxxxxxxxxx>
- Date: Mon, 11 Aug 2008 21:19:25 -0400
jaycx2.3.calrobert@xxxxxxxxxxxxxxxxxxxxxx (Robert Maas, http://tinyurl.com/uh3t) writes:
>> From: George Neuner <gneun...@xxxxxxxxxxx>
>> Very few researchers are working on "hard" RT where unplanned
>> delays can cause failures and cannot be tolerated at all. GC time
>> in an HRT system must be tightly bounded, may have to be completely
>> accounted for, and definitely has to be factored into the design.
> Are there any *absolutely-for-all-time* HRT systems? I imagine most
> systems would work fine in practice if we always had at least 20
> minutes warning before we ran out of memory. For example:
> - Nuclear power plant: Call supervisor, have supervisor order
>   station taken off the power grid, then drop in the control rods
>   to slow the reaction to where the DUMB system can manage things
>   just fine, then go into system logs to see which application is
>   consuming memory but not returning it, and if it's critical then
>   restart the system, or if non-critical then just disable that
>   application until more memory is available.
> - Jetliner in flight: Order pilot to immediately divert out from
>   the storm, then put plane on autopilot, then re-boot the
>   computer.
> - Space shuttle coming in for a landing: Not to worry you'll be on
>   the ground at Edwards AFB already before your computer runs out
>   of memory.
> - MAD (Mutual Assured Destruction) during actual attack: All your
>   facilities will be vaporized in less than 20 minutes, before you
>   might run out of memory, so out-of-memory condition isn't going
>   to be a problem!!
HRT is all about deadlines. 20 minutes is about the same as 20
milliseconds: if you can't make the processing deadline then you're
sunk. So for the gc not to mess up the realtime processing, the
algorithm has to guarantee that it won't delay the app layer by more
than "x seconds", where x generally specifies the "hardness". 1
millisecond on some given target OS on some target hardware might be
hard enough realtime for some apps. If the gc algorithm can't make that
but can make 50 ms, then its use would generally be relegated to
"softer" apps. You can test the conformance by plotting the latency
jitter of the app wrt its deadlines over time.
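
As a rough illustration of that conformance check, here is a minimal
Python sketch that takes per-cycle latencies and reports them against a
deadline. The names (JitterReport, check_deadline) and the sample values
are made up for the example; real numbers would come from timestamps
taken on the target OS and hardware.

```python
from dataclasses import dataclass


@dataclass
class JitterReport:
    worst_latency_ms: float   # slowest observed cycle
    jitter_ms: float          # spread between fastest and slowest cycle
    misses: int               # cycles that overran the deadline
    worst_margin_ms: float    # deadline minus worst latency (negative = overrun)


def check_deadline(latencies_ms, deadline_ms):
    """Summarise measured per-cycle latencies against one deadline."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    worst = max(latencies_ms)
    best = min(latencies_ms)
    misses = sum(1 for t in latencies_ms if t > deadline_ms)
    return JitterReport(worst, worst - best, misses, deadline_ms - worst)


# Hypothetical samples: one cycle is stretched by a collector pause.
samples_ms = [0.5, 0.25, 0.5, 42.0, 0.5]

hard = check_deadline(samples_ms, deadline_ms=1.0)    # the "1 ms" case
soft = check_deadline(samples_ms, deadline_ms=50.0)   # the "50 ms" case
```

With these made-up samples the single 42 ms cycle is a miss against the
1 ms deadline and not against the 50 ms one. Note that a run like this
only shows what was observed; it is not a worst-case bound.
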
Specifications about the worst-case latency that a gc algorithm incurs
feed into the risk analysis which goes into the systems engineering.
Once (and if) the risks are evaluated then you can talk about safety
margins. You can't just assume that a Sufficiently Smart, Well-Informed
and Equipped Operator is going to intervene, or that the operator will
even be alive to hit the reset button. Taking the ops computer offline
during some maneuver isn't going to cut it. What are you going to do
with the accumulated ops state, control loops, etc? You can't just magic
it out of nowhere with all the errors factored out when the computer
comes back online.
How do you guarantee the 20 minute limit? From the slope of the memory
allocation curve? What's the sampling rate? What happens if the
allocation rate has a big spike at the worst possible moment due to
unforeseen circumstances, taking you from 25% free to 1% with 10 +/- 30
seconds left before the app dies because of no memory?
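
To put numbers on the slope problem, here is a small Python sketch of
the naive estimator: extrapolate the free-memory curve linearly from the
last two samples. The function name, the 60 second sampling interval and
the sample values are all assumptions for illustration; only the 25% to
1% drop comes from the scenario above.

```python
def seconds_until_exhausted(samples):
    """Estimate time left from (time_s, free_percent) samples, oldest first.

    Uses only the slope between the last two samples, which is exactly
    the kind of estimate a sudden allocation spike defeats.
    """
    (t0, free0), (t1, free1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    rate = (free0 - free1) / (t1 - t0)   # percent of heap consumed per second
    if rate <= 0:
        return float("inf")              # not shrinking: no finite estimate
    return free1 / rate


# Hypothetical monitor sampling once every 60 seconds.
calm = [(0, 25.0), (60, 24.9)]
spiked = calm + [(120, 1.0)]             # spike lands between two samples

before = seconds_until_exhausted(calm)    # hours of apparent headroom
after = seconds_until_exhausted(spiked)   # a few seconds of headroom
```

With these assumed samples the estimate goes from roughly four hours to
under three seconds in a single sampling interval, so the "20 minute
warning" is never issued at all. Sampling faster narrows the window but
does not turn the estimate into a guarantee.
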
There are several commandments levied where I work; top of the list is
"Thou Shalt Not Dynamically Allocate Memory", which means no allocation
and so no leaks. The position is sometimes moderated to allow
start-time allocation of memory which is then incorporated into linked
lists and kept there. Faced with a Lisp solution I would probably make
the core critical safe-keeping functions in plain old C (or C++ w/ no
'new'), and put Lisp off the realtime path. I tend to like that
approach because C/C++ is miserable from a high-level perspective and
I'd like to have Common Lisp for that stuff. Off the time critical
path, CL can do its thing with some latitude and its considerable
leverage does some good.
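
The "allocate at start, keep it on a list" discipline can be sketched as
a fixed pool with a free list. This is only the shape of the idea in
Python, with made-up names (FixedPool, acquire, release); on a real
target it would be a static array and a linked free list in C, and
Python itself gives no realtime guarantees.

```python
class FixedPool:
    """A fixed set of buffers created once at start-up and then recycled."""

    def __init__(self, count, size):
        # All storage is created here; nothing is allocated afterwards.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))      # free list of buffer indices
        self._in_use = [False] * count

    def acquire(self):
        """Return a buffer index, or None if the pool is exhausted."""
        if not self._free:
            return None                      # explicit failure, no hidden growth
        index = self._free.pop()
        self._in_use[index] = True
        return index

    def release(self, index):
        """Put a buffer back on the free list."""
        if not self._in_use[index]:
            raise ValueError("buffer is not currently acquired")
        self._in_use[index] = False
        self._free.append(index)

    def buffer(self, index):
        return self._buffers[index]


pool = FixedPool(count=2, size=16)
a = pool.acquire()
b = pool.acquire()
exhausted = pool.acquire()    # None: the caller has to handle this case
pool.release(a)
reused = pool.acquire()       # the same slot comes back, nothing new is made
```

The point of the design is that running out is a visible return value
that the caller must plan for up front, rather than a leak or a
collector pause that shows up at some arbitrary later time.
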
> I believe that a reference-count system for live collection (with
> each allocation unit tagged as to which application asked for it to
> be allocated), and a mark-and-sweep collector running in background
> to reclaim anything that has reference loops and hence "leaks"
> through the reference-count system, plus something that monitors
> memory to see if "memory leaks" are running too far ahead of M+S
> GC, in which case the 20-minute warning would be issued, plus a
> "dead man's alarm" system to make sure the memory-monitor is still
> actively running, should suffice.
All fine for a "regular" app where failure means the print job might be
delayed a bit, but in an HRT system it amounts to a huge non-linear and
non-reversible increase in risk (because the gc latency can hit almost
any time no matter what the warnings say) at some ill-defined point in
the app's execution path.
Gregm