Re: Program "close to the machine"




jaycx2.3.calrobert@xxxxxxxxxxxxxxxxxxxxxx (Robert Maas, http://tinyurl.com/uh3t) writes:
From: George Neuner <gneun...@xxxxxxxxxxx>
Very few researchers are working on "hard" RT where unplanned
delays can cause failures and cannot be tolerated at all. GC time
in an HRT system must be tightly bounded, may have to be completely
accounted for, and definitely has to be factored into the design.

Are there any *absolutely-for-all-time* HRT systems? I imagine most
systems would work fine in practice if we always had at least 20
minutes warning before we ran out of memory. For example:
- Nuclear power plant: Call supervisor, have supervisor order
station taken off the power grid, then drop in the control rods
to slow the reaction to where the DUMB system can manage things
just fine, then go into system logs to see which application is
consuming memory but not returning it, and if it's critical then
restart the system, or if non-critical then just disable that
application until more memory is available.
- Jetliner in flight: Order pilot to immediately divert out from
the storm, then put plane on autopilot, then re-boot the
computer.
- Space shuttle coming in for a landing: Not to worry you'll be on
the ground at Edwards AFB already before your computer runs out
of memory.
- MAD (Mutual Assured Destruction) during actual attack: All your
facilities will be vaporized in less than 20 minutes, before you
might run out of memory, so out-of-memory condition isn't going
to be a problem!!


HRT is all about deadlines. 20 minutes is about the same as 20
milliseconds: if you can't make the processing deadline, you're
sunk. For the gc not to mess up the realtime processing, the
algorithm has to guarantee that it won't delay the app layer by more
than "x seconds", where x generally specifies the "hardness". 1
millisecond on some given target OS on some target hardware might be
hard enough realtime for some apps. If the gc algorithm can't make that
but can make 50 ms, then its use would generally be relegated to
"softer" apps. You can test the conformance by plotting the latency
jitter of the app with respect to its deadlines over time.
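
A minimal sketch of that conformance check, assuming you have already
logged per-cycle completion latencies in milliseconds. The function name
deadline_report and the sample numbers are made up for illustration;
they are not measurements from any real system.

```python
def deadline_report(latencies_ms, deadline_ms):
    """Summarize logged per-cycle latencies against one fixed deadline."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    worst = max(latencies_ms)
    best = min(latencies_ms)
    # Indices of the cycles that finished after the deadline.
    misses = [i for i, t in enumerate(latencies_ms) if t > deadline_ms]
    return {
        "worst_ms": worst,
        "jitter_ms": worst - best,          # peak-to-peak spread
        "margin_ms": deadline_ms - worst,   # negative: a deadline was missed
        "missed_cycles": misses,
    }


# Hypothetical log: steady sub-millisecond cycles plus one 50 ms pause,
# checked against a 1 ms deadline.
samples = [0.4, 0.5, 0.4, 50.0, 0.4]
report = deadline_report(samples, deadline_ms=1.0)
print(report)
```

For hard realtime the average is irrelevant; the single worst sample
decides whether the collector is usable, which is why the report keeps
the margin against the worst case rather than a mean.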

Specifications about the worst-case latency that a gc algorithm incurs
feed into the risk analysis which goes into the systems engineering.
Once (and if) the risks are evaluated then you can talk about safety
margins. You can't just assume that a Sufficiently Smart, Well-Informed
and Equipped Operator is going to intervene, or that the operator will
even be alive to hit the reset button. Taking the ops computer offline
during some maneuver isn't going to cut it. What are you going to do
with the accumulated ops state, control loops, etc? You can't just
magic it out of nowhere with all the errors factored out when the
computer comes back online.

How do you guarantee the 20 minute limit? From the slope of the memory
allocation curve? Then what's the sampling rate? What happens if the
allocation rate has a big spike at the worst possible moment due to
unforeseen circumstances, taking you from 25% free to 1% with 10 +/- 30
seconds left before the app dies because of no memory?
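
A minimal sketch of why a slope-based warning is fragile, assuming the
monitor samples free memory once a minute and extrapolates linearly.
The function name seconds_until_exhausted, the heap size and the trace
values are all hypothetical.

```python
def seconds_until_exhausted(samples, window=2):
    """Linear estimate of time left from (time_s, free_units) samples.

    Uses the first and last sample of the trailing `window` samples.
    Returns None when free memory is not shrinking over that window.
    """
    if window < 2 or len(samples) < window:
        raise ValueError("need at least two samples in the window")
    (t0, f0), (t1, f1) = samples[-window], samples[-1]
    if t1 <= t0:
        raise ValueError("sample times must be strictly increasing")
    rate = (f0 - f1) / (t1 - t0)   # units consumed per second
    if rate <= 0:
        return None
    return f1 / rate


# Hypothetical 1000-unit heap, sampled every 60 s, drifting down to 25% free.
steady = [(0, 270), (60, 260), (120, 250)]
print(seconds_until_exhausted(steady))   # roughly 25 minutes of headroom

# One sample period later a burst has taken it from 25% free to 1% free.
spike = steady + [(180, 10)]
print(seconds_until_exhausted(spike))    # now only seconds of headroom
```

The estimate is only as good as the last sampling interval: the burst
happens between two samples, so the first reading that shows it is
already far inside the 20 minute window.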

There are several commandments levied where I work; top of the list is
"Thou Shalt Not Dynamically Allocate Memory", which means no allocation
and so no leaks. The position is sometimes moderated to allow
start-time allocation of memory which is then incorporated into linked
lists and kept there. Faced with a Lisp solution I would probably make
the core critical safe-keeping functions in plain old C (or C++ w/ no
'new'), and put Lisp off the realtime path. I tend to like that
approach because C/C++ is miserable from a high-level perspective and
I'd like to have Common Lisp for that stuff. Off the time critical
path, CL can do its thing with some latitude and its considerable
leverage does some good.
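
A minimal sketch of the start-time allocation discipline: every block
is created once, threaded onto a free list, and only recycled after
that. The class name FixedPool and its methods are hypothetical.
Python is used only to show the bookkeeping; the interpreter itself
allocates behind the scenes, so real code on the realtime path would
be static arrays in C.

```python
class FixedPool:
    """Fixed-size block pool: every block is created once, at start-up."""

    def __init__(self, block_count, block_size):
        # The only allocation happens here, before the time-critical phase.
        self._blocks = [bytearray(block_size) for _ in range(block_count)]
        # Free list threaded through an index array; -1 marks the end.
        self._next = list(range(1, block_count)) + [-1]
        self._free_head = 0 if block_count > 0 else -1
        self._in_use = [False] * block_count

    def acquire(self):
        """Return (index, block) in constant time, or None if exhausted."""
        i = self._free_head
        if i == -1:
            return None   # caller must handle exhaustion explicitly
        self._free_head = self._next[i]
        self._in_use[i] = True
        return i, self._blocks[i]

    def release(self, i):
        """Push block i back onto the free list in constant time."""
        if not 0 <= i < len(self._blocks) or not self._in_use[i]:
            raise ValueError("block %r is not in use" % (i,))
        self._in_use[i] = False
        self._next[i] = self._free_head
        self._free_head = i


pool = FixedPool(block_count=2, block_size=64)
first = pool.acquire()
second = pool.acquire()
print(pool.acquire())        # pool is empty: None, never a hidden malloc
pool.release(first[0])
print(pool.acquire()[0])     # the released block is handed out again
```

Both operations touch a fixed number of list slots, so their cost does
not depend on how full the pool is, and exhaustion is a visible return
value that the design has to account for up front.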


I believe that a reference-count system for live collection (with
each allocation unit tagged as to which application asked for it to
be allocated), and a mark-and-sweep collector running in background
to reclaim anything that has reference loops and hence "leaks"
through the reference-count system, plus something that monitors
memory to see if "memory leaks" are running too far ahead of M+S
GC, in which case the 20-minute warning would be issued, plus a
"dead man's alarm" system to make sure the memory-monitor is still
actively running, should suffice.

All fine for a "regular" app where failure means the print job might be
delayed a bit, but it amounts to a huge non-linear and non-reversible
increase in risk (because the gc latency can hit almost any time no
matter what the warnings say) at some ill-defined point in the app's
execution path.

Gregm