Python reliability

Jp Calderone exarkun at divmod.com
Sun Oct 9 23:12:00 EDT 2005


On Mon, 10 Oct 2005 12:18:42 +1000, Steven D'Aprano <steve at removemecyber.com.au> wrote:
>George Sakkis wrote:
>
>> Steven D'Aprano wrote:
>>
>>
>>>On Sun, 09 Oct 2005 23:00:04 +0300, Ville Voipio wrote:
>>>
>>>
>>>>I would need to make some high-reliability software
>>>>running on Linux in an embedded system. Performance
>>>>(or lack of it) is not an issue, reliability is.
>>>
>>>[snip]
>>>
>>>
>>>>The software should be running continuously for
>>>>practically forever (at least a year without a reboot).
>>>>Is the Python interpreter (on Linux) stable and
>>>>leak-free enough to achieve this?
>>>
>>>If performance is really not such an issue, would it really matter if you
>>>periodically restarted Python? Starting Python takes a tiny amount of time:
>>
>>
>> You must have missed or misinterpreted the "The software should be
>> running continuously for practically forever" part. The problem of
>> restarting python is not the 200 msec lost but putting at stake
>> reliability (e.g. for health monitoring devices, avionics, nuclear
>> reactor controllers, etc.) and robustness (e.g. a computation that
>> takes weeks of cpu time to complete is interrupted without the
>> possibility to restart from the point it stopped).
>
>
>Er, no, I didn't miss that at all. I did miss that it
>needed continual network connections. I don't know if
>there is a way around that issue, although mobile
>phones move in and out of network areas, swapping
>connections when and as needed.
>
>But as for reliability, well, tell that to Buzz Aldrin
>and Neil Armstrong. The Apollo 11 moon lander rebooted
>multiple times on the way down to the surface. It was
>designed to recover gracefully when rebooting unexpectedly:
>
>http://www.hq.nasa.gov/office/pao/History/alsj/a11/a11.1201-pa.html
>

This reminds me of crash-only software:

  http://www.stanford.edu/~candea/papers/crashonly/crashonly.html

It seems to have some merit.  I have yet to attempt to develop any large-scale software explicitly using this technique, although I have worked on several systems that very loosely used this approach (e.g., a server which divided tasks into two processes, with one restarting the other whenever it noticed it was gone).  But as you point out, there's certainly precedent.
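
For the curious, the two-process arrangement can be sketched roughly as below. This is only an illustration, not code from any of the systems I mentioned: the name supervise, the max_restarts and delay parameters, and the deliberately crashing worker command are all made up for the example. A real supervisor would loop indefinitely rather than stop after a fixed number of restarts.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.1):
    """Run cmd, starting it again whenever it exits.

    Hypothetical helper for illustration only. The restart cap exists so
    the demo terminates; a long-running supervisor would loop forever.
    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        worker = subprocess.Popen(cmd)
        # Block until the worker is gone, for whatever reason.
        exit_codes.append(worker.wait())
        # Brief pause so a crash loop does not spin the CPU.
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    # Stand-in worker: a child interpreter that "crashes" immediately.
    crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing_worker, max_restarts=2))
```

Note that the supervisor does not care why the worker went away. That only works if the worker can recover its state on startup, which is the requirement the crash-only approach puts on it.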

Jp


