The reliability of python threads

Thu Jan 25 14:03:49 EST 2007

On Jan 25, 9:26 am, n... at cus.cam.ac.uk (Nick Maclaren) wrote:
> In article <1169675599.502726.5... at a34g2000cwb.googlegroups.com>, "Paddy" <paddy3... at netscape.net> writes:
> |>
> |> Three to four months before `strange errors`? I'd spend some time
> |> correlating logs; not just for your program, but for everything running
> |> on the server. Then I'd expect to cut my losses and arrange to safely
> |> re-start the program every TWO months.
> |> (I'd arrange the re-start after collecting logs but before their
> |> analysis. Life is too short).
>
> Forget it.  That strategy is fine in general, but is a waste of time
> where threading issues are involved (or signal handling, or some types
> of communication problem, for that matter).

Nah, it's a great strategy. It keeps you up and running when all you
know for sure is that you can usually keep things together for about
three months.
The OP only thinks it's a threading problem. It doesn't matter what the
true fix will be, as long as arranging to re-start the server well
before it's likely to go down doesn't take too long compared to your
exploration of the problem and, of course, you can afford the glitch
in availability.
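
The scheduling side of this is simple enough to sketch. This is only an
illustration, not anything the OP is running: the 60-day interval is my
reading of "every TWO months", and the names restart_due and
next_restart are made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: "every two months" taken as a fixed 60-day interval.
RESTART_INTERVAL = timedelta(days=60)

def next_restart(started_at, interval=RESTART_INTERVAL):
    """Return the time at which the planned re-start falls due."""
    return started_at + interval

def restart_due(started_at, now, interval=RESTART_INTERVAL):
    """True once the process has been up for at least `interval`."""
    return now - started_at >= interval

if __name__ == "__main__":
    started = datetime(2007, 1, 1)
    print("next planned re-start:", next_restart(started))
    print("due now?", restart_due(started, datetime.now()))
```

A cron job or supervisor could call something like restart_due() and
collect the logs before doing the actual re-start.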

> There are three unrelated
> killer facts that interact:
>
>     Such failures are usually probabilistic ("Poisson process"), and
> so have no "history".
>
>     The expected number is usually proportional to the square of the
> activity, sometimes a higher power.
>
>     Virtually nothing involved does any routine logging, or even has
> options to log relevant events.
>
> The first means that the strategy of restarting doesn't help.  All
> three mean that current logs are almost never any use.
> 
> Regards,
> Nick Maclaren.
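
For anyone following along, the "no history" point can be checked with
a few lines. If failures really do arrive as a Poisson process with a
constant rate, the chance of getting through the next two months is the
same for a freshly started process as for one that has already been up
for three. A minimal sketch: the rate of one failure per 3.5 months is
a made-up figure loosely based on "three to four months", and the
function names are mine.

```python
import math

# Assumption: constant failure rate, one failure per 3.5 months on average.
RATE = 1.0 / 3.5

def survival(t, rate=RATE):
    """P(no failure during t months) for a constant-rate Poisson process."""
    return math.exp(-rate * t)

def conditional_survival(t, already_up, rate=RATE):
    """P(surviving t more months, given `already_up` months without failure)."""
    return survival(already_up + t, rate) / survival(already_up, rate)

fresh = survival(2.0)                    # just re-started
aged = conditional_survival(2.0, 3.0)    # already up three months

print("fresh process:", fresh)
print("aged process: ", aged)
```

The two figures agree, which is why a re-start buys nothing under that
model. It only holds for a constant rate, though. If the fault builds
up over time (a leak, say), the rate rises with uptime and a planned
re-start does help.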