need to safely spawn subshells from multithreaded server - how?

Tue Mar 5 05:40:02 EST 2002

I had much better luck searching the web today, so after more reading I can give a preliminary
answer to my own question. Perhaps this will help other people running into the same problem.
It would be nice if any thread/fork experts could assess whether the situation is correctly
summarized below.

1. The main server thread will not exist in the child process, so it cannot interfere. Reason: Python threads
    on Linux are implemented as pthreads (POSIX threads), and the POSIX spec prescribes that only
    one thread survives in the child process: the one that issued the fork(). (Only with Solaris threads
    does fork() clone all threads, but there is a special fork1() call, too, with clone-the-forking-thread-only
    semantics.) See http://www.lambdacs.com/cpt/MFAQ.html#Q120 for more info.

2. Consequently, the second question is moot.

3. No news regarding fork() alternatives on Linux. I don't think there is any reasonably-supported one.
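
Point 1 can be checked empirically. The following is only a minimal sketch, assuming a current
Python 3 on Linux (neither of which the original discussion had in mind): a stand-in "server"
thread idles while a "handler" thread calls os.fork(), and the child reports how many threads
it contains. The function names and the use of /proc/self/task are illustrative choices of this
sketch. Recent Python 3 versions may emit a DeprecationWarning when fork() is called from a
multithreaded process; that does not change the result.

```python
import os
import threading


def _os_thread_count():
    # Linux lists one directory entry per kernel thread of the process.
    try:
        return len(os.listdir("/proc/self/task"))
    except OSError:
        # Assumed fallback where /proc is missing: Python's own bookkeeping.
        return threading.active_count()


def count_threads_in_child():
    """Fork from a handler thread; return the thread count seen in the child."""
    stop = threading.Event()
    result = {}

    def server_main():
        # Stand-in for the listening "main server thread".
        stop.wait()

    def handler():
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the number of threads here, then exit at once
            # without running any of the parent's cleanup handlers.
            try:
                os.write(write_fd, str(_os_thread_count()).encode())
            finally:
                os._exit(0)
        # Parent: close our copy of the write end so read() sees EOF.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as pipe:
            result["child_threads"] = int(pipe.read().decode())
        os.waitpid(pid, 0)

    server = threading.Thread(target=server_main)
    server.start()
    worker = threading.Thread(target=handler)
    worker.start()
    worker.join()
    stop.set()
    server.join()
    return result["child_threads"]


if __name__ == "__main__":
    print(count_threads_in_child())
```

On a Linux system this should report a single thread in the child, even though the parent has
at least three at the moment of the fork.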

Generally, however, forking in a Python thread on Linux seems to have become reasonably safe
recently, thanks to a fix that went into the 2.x Python releases. One should still avoid chains
of forks: the patch knowingly sacrifices a certain amount of memory in the child
process to buy us deadlock safety. See
http://sourceforge.net/tracker/?group_id=5470&atid=305470&aid=401226&func=detail
which also mentions a remaining, rare problem affecting the parent process issuing
the fork. As POSIX allows no thread-related effects whatsoever on the parent process
from a fork(), the issue was eventually diagnosed as a Linux kernel problem and as such left to
the proper authorities. See
http://groups.google.de/groups?hl=de&lr=lang_de|lang_en&selm=38E6F2BA.E66CAC90%40ensim.com
which recommends implementing a popen() replacement in a C extension as a good workaround (no code
published, though).
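
For readers on a current Python 3: the standard library's subprocess module (which postdates
this discussion) is, as far as I know, close to the "popen() replacement in a C extension" idea,
since on POSIX systems CPython does the fork-and-exec step for it in C code. A minimal sketch of
a dispatcher running shell commands from several handler threads could look like this. The names
run_job and run_jobs, the thread pool and the timeout value are assumptions of the sketch, not
anything taken from the posts referenced above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(command, timeout=60):
    """Run one shell command; return (exit_status, combined_output)."""
    completed = subprocess.run(
        command,
        shell=True,                # start the command in a subshell (/bin/sh)
        stdout=subprocess.PIPE,    # capture standard output ...
        stderr=subprocess.STDOUT,  # ... and merge standard error into it
        timeout=timeout,           # raises TimeoutExpired for runaway jobs
        text=True,                 # decode the output to str
    )
    return completed.returncode, completed.stdout


def run_jobs(commands, max_workers=4):
    """Run several commands concurrently, one handler thread per job."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the results in the same order as the commands.
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    for status, output in run_jobs(["echo one", "echo two; exit 3"]):
        print(status, output.strip())
```

Each job blocks only its own handler thread, so a long-running child process does not keep the
other requests waiting.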

Best regards, Thilo

I wrote:

> Hello c.l.py,
>
> I'm building something best described as a 'job dispatcher'. This program is intended as a
> long-running process which takes requests via an easy-to-talk-to remote object protocol such
> as XML-RPC or PYRO (the requests come from another Python interpreter process), and
> according to these requests starts, stops and otherwise supervises lots of child processes.
>
> The child processes will be Unix (platform for the whole thing is Linux/x86) command line
> applications to be started in subshells. As each child process will typically run for a few seconds
> or even minutes, and there will be lots of concurrent requests which should not block each other,
> I want the dispatcher server to be multithreaded. This in principle is no problem: multithreading
> is supported by the available Python implementations of the protocols above. Using one of these,
> the dispatcher server has a "main thread" listening for requests, and as
> soon as a request comes in it is handed over to a "handler thread" created for the purpose,
> freeing the main thread to listen for further requests.
>
> However, after researching the web for similar problems/approaches I came to suspect I might
> be in for unpleasant surprises with the above design. Python multithreading combined with
> the spawning of child processes is said to be dangerous. All popen() variants, os.system(), and
> pty.spawn() rely on fork(), which produces a full-blown clone of my server process. As long
> as I cannot make 100% sure that only the intended thread (the "handler thread" for the current
> request) continues to run in the child process, it seems that I might end up with two competing server
> threads (one in parent, one in child).
>
> Now my questions. Executive summary: "can it be done, and how?"
>
> 1. Did I get something utterly wrong here? Is the danger of getting two interfering server threads
>      a real one at all? This isn't the type of question a quick prototype reliably answers. Interestingly,
>      from Zope (which is a multithreaded server, too), I've been fork()ing subshells happily and ignorantly
>      for years already with no apparent problems, as other people will have done, too. But I'd rather
>      be on the safe side. I don't want inexplicable failures due to rare race conditions or something
>      similar later on.
>
> 2. Can I enforce (from Python) that only the handler thread continues to run in the child process?
>     My current idea is to somehow block the server main thread right before the fork(), and
>     unblock it immediately afterwards, but only in the parent process. Would that be safe? And
>     could I perhaps reach that goal without having to substantially change the implementation
>     of my chosen remote object system?
>
> 3. Any other ideas? Is there any other, fork()-less way to spawn a subshell (from Python, on Linux)?
>
> The following Tim Peters quote (from <mailman.1001479048.15356.python-list at python.org>)
> doesn't sound too encouraging...
> [...]