The reliability of python threads

Thu Jan 25 15:00:56 EST 2007

"Paddy" <paddy3118 at netscape.net> writes:
> No, you should think of the service that needs to be up. You seem to be
> talking about how it can't be fixed rather than looking for ways to
> keep things going.

But you're proposing cargo cult programming.  There is no reason
whatsoever to expect that restarting the server now and then will help
the problem in the slightest.  Nick used the fancy term Poisson
process, but it just means that the probability of failure at any
moment is independent of what's happened in the past, like the
spontaneous radioactive decay of an atom.  It's not like a mechanical
system where some part gradually gets worn out and eventually breaks,
so you can prevent the failure by replacing the part every so often.
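To make "independent of what's happened in the past" concrete: in a Poisson process with failure rate λ, the time T until the next failure is exponentially distributed, with P(T > t) = e^{-λt}. A server that has already been up for s time units is then exactly as likely to survive another t units as a freshly restarted one:

```latex
P(T > s + t \mid T > s)
  = \frac{P(T > s + t)}{P(T > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(T > t)
```

The uptime s cancels out, so resetting it to zero with a restart changes nothing.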

> A little learning is fine but "it can't theoretically be fixed" is
> no solution.

The best you can do is identify the unfixable situations precisely and
work around them.  Precision is important.

The next best thing is to have several servers running simultaneously,
with failure detection and automatic failover.
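A minimal client-side sketch of the failover idea, assuming each server can be represented as a callable that either returns a response or raises. The names `call_with_failover`, `AllReplicasFailed`, `replica_a` and `replica_b` are hypothetical. Failure detection here is deliberately crude: a raised exception counts as "that server is down". A real setup would also need timeouts or heartbeats to catch servers that hang instead of erroring, which this sketch does not attempt.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when no replica could handle the request."""

    def __init__(self, errors: List[Exception]):
        super().__init__(f"all {len(errors)} replicas failed")
        self.errors = errors


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Send the request to each replica in turn until one succeeds.

    Any exception from a replica is treated as a detected failure, and we
    fail over to the next replica in the list.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:
            errors.append(exc)  # remember why this replica was skipped
    raise AllReplicasFailed(errors)


# Hypothetical replicas standing in for real servers.
def replica_a() -> str:
    raise ConnectionError("replica A is down")


def replica_b() -> str:
    return "response from replica B"


if __name__ == "__main__":
    print(call_with_failover([replica_a, replica_b]))  # prints: response from replica B
```

Trying replicas in a fixed order keeps the sketch short; spreading load across healthy servers would need more bookkeeping than this.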

If a server is failing at random every few months, trying to prevent
that by restarting it every so often is just shooting in the dark.
Think of your server stopping now and then because of power failures,
which happen every few months on average.
Shutting down your server once a month, unplugging it, and plugging it
back in will do nothing to prevent those outages.  You need to either
identify and fix whatever is causing the power outages, or install a
backup generator.
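A rough simulation sketch of the difference between random failures and wear-out. The distributions and numbers are illustrative assumptions, not measurements of any real server: "memoryless" failures are exponential with a 3-month mean, and "wear-out" failures follow a Weibull distribution (scale 3 months, shape 5) whose failure rate rises with age. `count_failures` is a hypothetical helper, and a failed server is assumed to come back up instantly as a fresh one.

```python
import random


def count_failures(draw_lifetime, horizon, restart_every=None, rng=None):
    """Count failures of one server over `horizon` time units.

    draw_lifetime(rng) returns how long a freshly started server runs before
    it fails. A failed server is assumed to come back up instantly as a fresh
    server. If restart_every is given, the server is also restarted on that
    fixed calendar schedule, which throws away whatever "age" it had.
    """
    if rng is None:
        rng = random.Random()
    t = 0.0
    failures = 0
    next_restart = restart_every if restart_every else float("inf")
    while t < horizon:
        fail_at = t + draw_lifetime(rng)  # failure time of the current server
        if fail_at < next_restart and fail_at < horizon:
            failures += 1  # it failed before the next scheduled restart
            t = fail_at
        elif next_restart < horizon:
            t = next_restart  # scheduled restart: start over with a fresh server
            next_restart += restart_every
        else:
            break  # neither event happens before the horizon
    return failures


if __name__ == "__main__":
    months = 30_000  # long horizon so the averages settle down
    lifetimes = {
        "memoryless (exponential, mean 3 months)":
            lambda r: r.expovariate(1 / 3),
        "wear-out (Weibull, scale 3 months, shape 5)":
            lambda r: r.weibullvariate(3, 5),
    }
    for name, draw in lifetimes.items():
        never = count_failures(draw, months, rng=random.Random(0))
        monthly = count_failures(draw, months, restart_every=1,
                                 rng=random.Random(0))
        print(f"{name}: {never} failures with no restarts, "
              f"{monthly} with monthly restarts")
```

Under these assumptions the exponential case should show roughly the same failure count either way (about one every 3 months), while the Weibull case drops sharply with monthly restarts. Scheduled restarts only pay off when failures actually depend on age.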