[ python-Bugs-1731717 ] race condition in subprocess module

SourceForge.net noreply at sourceforge.net
Wed Aug 1 19:05:56 CEST 2007


Bugs item #1731717, was opened at 2007-06-06 08:19
Message generated for change (Comment added) made by abo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1731717&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: dsagal (dsagal)
Assigned to: Peter Åstrand (astrand)
Summary: race condition in subprocess module

Initial Comment:
Python's subprocess module has a race condition: Popen() constructor has a call to global "_cleanup()" function on whenever a Popen object gets created, and that call causes a check for all pending Popen objects whether their subprocess has exited - i.e. the poll() method is called for every active Popen object.

Now, if I create Popen object "foo" in one thread, and then a.wait(), and meanwhile I create another Popen object "bar" in another thread, then a.poll() might get called by _clean() right at the time when my first thread is running a.wait(). But those are not synchronized - each calls os.waitpid() if returncode is None, but the section which checks returncode and calls os.waitpid() is not protected as a critical section should be.

The following code illustrates the problem, and is known to break reasonably consistenly with Python2.4. Changes to subprocess in Python2.5 seems to address a somewhat related problem, but not this one.

import sys, os, threading, subprocess, time

class X(threading.Thread):
  def __init__(self, *args, **kwargs):
    super(X, self).__init__(*args, **kwargs)
    self.start()

def tt():
  s = subprocess.Popen(("/bin/ls", "-a", "/tmp"), stdout=subprocess.PIPE,
      universal_newlines=True)
  # time.sleep(1)
  s.communicate()[0]

for i in xrange(1000):
  X(target = tt)

This typically gives a few dozen errors like these:
Exception in thread Thread-795:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "z.py", line 21, in tt
    s.communicate()[0]
  File "/usr/lib/python2.4/subprocess.py", line 1083, in communicate
    self.wait()
  File "/usr/lib/python2.4/subprocess.py", line 1007, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

Note that uncommenting time.sleep(1) fixes the problem. So does wrapping subprocess.poll() and wait() with a lock. So does removing the call to global _cleanup() in Popen constructor.

----------------------------------------------------------------------

Comment By: Donovan Baarda (abo)
Date: 2007-08-02 03:05

Message:
Logged In: YES 
user_id=10273
Originator: NO

It appears that subprocess calls a module global "_cleanup()" whenever
opening a new subprocess. This method is meant to reap any child processes
that have terminated and have not explicitly cleaned up. These are
processes you would expect to be cleaned up by GC, however, subprocess
keeps a list of of all spawned subprocesses in _active until they are
reaped explicitly so it can cleanup any that nolonger referenced anywhere
else.

The problem is lots of methods, including poll() and wait(), check
self.returncode and then modify it. Any non-atomic read/modify action is
inherently non-threadsafe. And _cleanup() calls poll() on all un-reaped
child processes. If two threads happen to try and spawn subprocesses at
once, these _cleanup() calls collide..

The way to fix this depends on how thread-safe you want to make it. If you
want to share popen objects between threads to wait()/poll() with impunity
from any thread, you should add a recursive lock attribute to the Popen
instance and have it lock/release it at the start/end of every method
call.

If you only care about using popen objects in one thread at a time, then
all you need to fix is the nasty "every popen created calls poll() on every
other living popen object regardless of what thread started them,
and poll() is not threadsafe" behaviour.

Removing _cleanup() is one way, but it will then not reap child processes
that you del'ed all references to (except the one in subprocess._active)
before you checked they were done.

Probably another good idea is to not append and remove popen objects to
_active directly, instead and add a popen.__del__() method that defers
GC'ing of non-finished popen objects by adding them to _active. This
way, _active only contains un-reaped child processes that were due to be
GC'ed. _cleanup() will then be responsible for polling and removing these
popen objects from _active when they are done.

However, this alone will not fix things because you are still calling
_cleanup() from different threads, it is still calling poll() on all these
un-reaped processes, and poll() is not threadsafe. So... you either have to
make poll() threadsafe (lock/unlock at the beginning/end of the method), or
make _cleanup() threadsafe. The reason you can get away with making only
_cleanup() threadsafe this way is _active will contain a list of processes
that are not referenced anywhere else, so you know the only thing that will
call poll() on them is the _cleanup() method.


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-06-07 15:26

Message:
Logged In: YES 
user_id=33168
Originator: NO

Peter, could you take a look at this?  Thanks.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1731717&group_id=5470


More information about the Python-bugs-list mailing list