Re: Python reliability



George Sakkis wrote:

Steven D'Aprano wrote:


On Sun, 09 Oct 2005 23:00:04 +0300, Ville Voipio wrote:


I would need to make some high-reliability software
running on Linux in an embedded system. Performance
(or lack of it) is not an issue, reliability is.

[snip]


The software should be running continuously for
practically forever (at least a year without a reboot).
Is the Python interpreter (on Linux) stable and
leak-free enough to achieve this?

If performance is really not such an issue, would it really matter if you periodically restarted Python? Starting Python takes a tiny amount of time:


You must have missed or misinterpreted the "The software should be
running continuously for practically forever" part. The problem of
restarting python is not the 200 msec lost but putting at stake
reliability (e.g. for health monitoring devices, avionics, nuclear
reactor controllers, etc.) and robustness (e.g. a computation that
takes weeks of cpu time to complete is interrupted without the
possibility to restart from the point it stopped).


Er, no, I didn't miss that at all. I did miss that it needed continual network connections. I don't know if there is a way around that issue, although mobile phones move in and out of network areas, swapping connections when and as needed.

But as for reliability, well, tell that to Buzz Aldrin and Neil Armstrong. The Apollo 11 moon lander rebooted multiple times on the way down to the surface. It was designed to recover gracefully when rebooting unexpectedly:

http://www.hq.nasa.gov/office/pao/History/alsj/a11/a11.1201-pa.html

I don't have an authoritative source for how many times the computer rebooted during the landing, but it was measured in the dozens. Calculations were performed iteratively, starting from an initial estimate that was improved over time. If a calculation was interrupted, the computer lost no more than one iteration.
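The same idea can be sketched in Python, assuming a calculation that refines an estimate step by step. Newton's method for a square root is used here purely as a stand-in, and `sqrt_step` and `refine` are hypothetical names. Because each step depends only on the previous estimate, resuming from the last saved estimate gives exactly the same answer as an uninterrupted run:

```python
def sqrt_step(x: float, estimate: float) -> float:
    """One Newton-Raphson step for sqrt(x); the new estimate depends only on the old one."""
    return 0.5 * (estimate + x / estimate)


def refine(x: float, estimate: float, iterations: int) -> float:
    """Improve an estimate of sqrt(x) for a fixed number of iterations."""
    for _ in range(iterations):
        estimate = sqrt_step(x, estimate)
    return estimate


# Run straight through for 6 iterations...
uninterrupted = refine(2.0, 1.0, 6)

# ...versus being "rebooted" after 3 iterations and resuming from the last
# saved estimate. At most the in-progress iteration would be lost.
saved = refine(2.0, 1.0, 3)
resumed = refine(2.0, saved, 3)

print(uninterrupted == resumed)               # True
print(abs(resumed - 2.0 ** 0.5) < 1e-12)      # True
```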

I'm not saying that this strategy is practical or useful for the original poster, but it *might* be. In a noisy environment, it pays to design a system that can recover transparently from a lost connection.
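One way to sketch that in Python is a retry loop with exponential backoff. This assumes the connection is opened by a zero-argument callable that raises `OSError` (which includes `ConnectionError`) on failure; `with_retries` and `flaky_connect` are hypothetical names, not part of any library:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(connect: Callable[[], T], attempts: int = 5,
                 base_delay: float = 0.1, max_delay: float = 5.0) -> T:
    """Call connect(); on OSError, wait and retry, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay)
            delay = min(delay * 2, max_delay)


# Hypothetical stand-in for opening a network link: fails twice, then succeeds.
calls = {"n": 0}

def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("link down")
    return "connected"

print(with_retries(flaky_connect, base_delay=0.01))  # connected
```

Capping the delay with `max_delay` keeps a long outage from pushing the wait between attempts out indefinitely.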

If your heart monitor can reboot in 200 ms, you might miss one or two beats, but so long as you pick up the next one, that's just noise. If your calculation takes more than a day of CPU time to complete, you should design it in such a way that you can save state and pick it up again when you are ready. You never know when the cleaner will accidentally unplug the computer...
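A minimal sketch of save-and-resume in Python, assuming the state fits in a small JSON-serializable dict (`save_state`, `load_state`, `run` and the file name are hypothetical). The checkpoint is written to a temporary file in the same directory and then moved into place with `os.replace`, so on POSIX a power cut mid-write leaves the previous checkpoint intact rather than a half-written one:

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state atomically: a crash leaves either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())        # push the file contents to disk
        os.replace(tmp_path, path)      # atomic rename on the same filesystem (POSIX)
    except BaseException:
        os.unlink(tmp_path)             # don't leave stray temp files behind
        raise


def load_state(path: str, default: dict) -> dict:
    """Return the last checkpoint, or default if none has been written yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path: str, total_steps: int) -> dict:
    state = load_state(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        save_state(path, state)          # checkpoint after every unit
    return state


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.json")
    run(ckpt, 5)               # the cleaner unplugs the computer after step 5
    final = run(ckpt, 10)      # on restart, work resumes at step 5, not step 0
    print(final)               # {'step': 10, 'total': 45}
```

Checkpointing after every unit is the simplest policy; for cheap steps one would more likely checkpoint every N units or every few seconds, trading a little lost work for less disk I/O.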


-- Steven.
