The reliability of Python threads

Nick Maclaren nmm1 at cus.cam.ac.uk
Wed Jan 24 15:16:26 EST 2007


In article <mailman.3105.1169664198.32031.python-list at python.org>,
"Carl J. Van Arsdall" <cvanarsdall at mvista.com> writes:
|> Chris Mellon wrote:
|> >
|> > Logic and programming errors in user code are far more likely to be
|> > the cause of random errors in a threaded program than theoretical
|> > (I've never come across a case in practice) issues with the POSIX
|> > standard.
|> >   
|> Yeah, typically I would think that.  The problem I am seeing is
|> incredibly intermittent.  Like a simple Pyro server that gives me a
|> problem maybe every three or four months.  Just something funky will
|> happen to the state of the whole thing, some bad data.  I'm having an
|> issue tracking it down, and some more experienced programmers mentioned
|> that it's most likely a race condition.  The thing is, I'm really not
|> doing anything too crazy, so I'm having difficulty tracking it down.  I
|> had heard in the past that there may be issues with threads, so I
|> thought to investigate this side of things.

I have seen that many dozens of times on half a dozen Unices, but have
only tracked down the cause in a handful of cases.  Of those,
implementation defects that are sanctioned by the standards have
accounted for about half.

Note that the term "race condition" is accurate but misleading!  One
of the worst problems with POSIX is that it does not define how
non-memory global state is synchronised.  For example, it is possible
for a memory update and an associated signal to occur on different
sides of a synchronisation boundary.  Similarly, it is possible for
I/O to sidestep POSIX's synchronisation boundaries.  I have seen both.

Perhaps the nastiest is that POSIX leaves it unclear whether the
action of synchronisation is transitive.  So, if A synchronises with
B, and then B with C, A may not have synchronised with C.  Again, I
have seen that.  It can happen on Intel systems, according to the
experts I know.
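
The transitivity assumption can be made concrete with a minimal Python
sketch.  This is an illustration only, not code from the original
discussion: the names run_chain, shared, a_done and b_done are
hypothetical, and threading.Event stands in for whatever primitive a
real program uses.  Thread C never synchronises with A directly, so it
sees A's write only if synchronisation is transitive through B.

```python
import threading

def run_chain():
    """A -> B -> C: C reads A's data but only ever waits on B."""
    shared = {}                   # hypothetical shared state written by A
    a_done = threading.Event()    # A synchronises with B
    b_done = threading.Event()    # B synchronises with C
    seen_by_c = []

    def thread_a():
        shared["value"] = 42      # write first, then signal B
        a_done.set()

    def thread_b():
        a_done.wait()             # B waits for A
        b_done.set()              # B releases C

    def thread_c():
        b_done.wait()             # C waits for B only, never for A
        # The assumption being relied on: A's write is visible here.
        seen_by_c.append(shared.get("value"))

    # Start in reverse order so the waits are actually exercised.
    threads = [threading.Thread(target=f)
               for f in (thread_c, thread_b, thread_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_by_c[0]
```

Under CPython this chain behaves as expected, so the sketch does not
reproduce the failure.  Its purpose is to show the kind of assumption
that is worth writing down explicitly when hunting an intermittent
fault.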

|> Would you consider the Linux implementation of threads to be concrete?

In this sort of area, Linux tends to be saner than most systems, but
remember that it has had MUCH less stress testing on threaded codes
than many other Unices.  In fact, it was only a few years ago that
Linux threads became stable enough to be worth using.

Note that failures due to implementation defects and flaws in the
standards are likely to show up in very obscure ways; ones due to
programmer error tend to be much simpler.
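
For comparison, the simple kind of programmer error usually looks like
an unprotected read-modify-write on shared state.  The sketch below is
a hypothetical example (Counter and hammer are invented names, not
anything from the poster's code) showing the unsafe pattern next to the
same update protected by a threading.Lock.

```python
import threading

class Counter:
    """Hypothetical shared counter with an unsafe and a safe update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write with no lock: another thread may run between
        # the read and the write, so an update can be lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1

def hammer(method_name, n_threads=8, n_iters=10000):
    """Call the named Counter method from several threads at once."""
    counter = Counter()

    def work():
        method = getattr(counter, method_name)
        for _ in range(n_iters):
            method()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

hammer("safe_increment") always returns n_threads * n_iters.
hammer("unsafe_increment") may return less, and how often it does
depends on the interpreter version and on scheduling, which is why
such bugs can stay hidden for months.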

If you want to contact me by Email, and can describe technically
what you are doing and (most importantly) what you are assuming, I
may be able to give some hints.  But no promises.


Regards,
Nick Maclaren.


