need to safely spawn subshells from multithreaded server - how?
Thilo Ernst
Thilo.Ernst at dlr.de
Tue Mar 5 05:40:02 EST 2002
I had much better luck searching the web today, so after more reading I can give a preliminary
answer to my own question. Perhaps this will help other people who run into the same problem.
It would be nice if any thread/fork experts could assess whether the situation is correctly
summarized below.
1. The main server thread will not exist in the child process, so it cannot interfere. Reason: Python threads
on Linux are implemented as pthreads (Posix Threads), and the Posix spec prescribes that only
one thread survives in the child process - the one that issued the fork(). (Only with Solaris threads does
fork() clone all threads, but there is a special fork1() call, too, with the clone-forking-thread-only
semantics.) See http://www.lambdacs.com/cpt/MFAQ.html#Q120 for more info.
2. Consequently, the second question is moot.
3. No news regarding fork() alternatives on Linux. I don't think there is any reasonably-supported one.
Generally, however, forking in a Python thread on Linux seems to have gotten reasonably safe
recently thanks to a fix that went into the 2.x Python releases. One still should avoid chains
of forks - the patch required a certain amount of memory to be knowingly sacrificed in the child
process to buy us deadlock safety. See
http://sourceforge.net/tracker/?group_id=5470&atid=305470&aid=401226&func=detail,
which also mentions a remaining, rare problem affecting the parent process issuing
the fork. As Posix allows no thread-related effects whatsoever to occur on the parent process
from a fork(), the issue was eventually diagnosed as a Linux kernel problem and as such left to
the proper authorities. See
http://groups.google.de/groups?hl=de&lr=lang_de|lang_en&selm=38E6F2BA.E66CAC90%40ensim.com
which recommends implementing a popen() replacement in a C extension as a good workaround (no code
published though).
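For anyone who wants to check point 1 on their own machine, here is a minimal sketch. It is not taken from any of the sources above. It assumes a current Python 3 on a Posix system, and the names count_threads_in_forked_child and demo are made up for illustration. A "listener" thread stands in for the server main thread, a "handler" thread issues the fork(), and the child reports through a pipe how many threads Python sees there. Note that threading.active_count() reflects Python's own bookkeeping, so this is a plausibility check consistent with the Posix rule rather than a proof of it.

```python
import os
import threading


def count_threads_in_forked_child():
    """Fork from the calling thread; return the thread count seen in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report how many Python threads exist here, then leave
        # immediately without running any of the parent's cleanup handlers.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        report = pipe.read()
    os.waitpid(pid, 0)
    return int(report)


def demo():
    """Fork from a handler thread while a listener thread is still running."""
    stop = threading.Event()
    # Hypothetical stand-in for the server's main listening loop.
    listener = threading.Thread(target=stop.wait, daemon=True)
    listener.start()

    result = {}

    def handler():
        # Thread count in the parent at the moment of the fork.
        result["parent"] = threading.active_count()
        result["child"] = count_threads_in_forked_child()

    worker = threading.Thread(target=handler)
    worker.start()
    worker.join()

    stop.set()
    listener.join()
    return result


if __name__ == "__main__":
    print(demo())
```

The parent should report at least three threads (interpreter main thread, listener, handler), while the child should report a single one, namely the copy of the handler thread that called fork(). Recent Python versions may print a DeprecationWarning about calling fork() in a multithreaded process. That warning concerns the lock and deadlock hazards mentioned under point 3, not the question of which threads survive.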
Best regards, Thilo
I wrote:
> Hello c.l.py,
>
> I'm building something best described as a 'job dispatcher' . This program is intended as a
> long-running process which takes requests via an easy-to-talk-to remote object protocol such
> as XML-RPC or PYRO (the requests come from another Python interpreter process), and
> according to these requests starts, stops and otherwise supervises lots of child processes.
>
> The child processes will be Unix (platform for the whole thing is Linux/x86) command line
> applications to be started in subshells. As each child process will typically run a few seconds
> or even minutes, and there will be lots of concurrent requests which should not block each other,
> I want the dispatcher server to be multithreaded. This in principle is no problem - multithreading
> is supported by available Python implementations of the protocols above. Using one of these,
> the situation will be that the dispatcher server has a "main thread" listening for requests, and as
> soon as a request comes in it is handed over to a "handler thread" created for the purpose,
> freeing the main thread to listen for further requests.
>
> However after researching the web for similar problems/approaches I came to suspect I might
> be in for unpleasant surprises with the above design. Python multithreading combined with
> the spawning of child processes is said to be dangerous. All popen() variants, os.system(), and
> pty.spawn() rely on fork() which produces a full-blown clone of my server process. As long
> as I cannot make 100% sure that only the intended thread - the "handler thread" for the current
> request - continues to run in the child process, it seems that I might end up with two competing server
> threads (one in parent, one in child).
>
> Now my questions. Executive summary: "can it be done, and how?"
>
> 1. Did I get something utterly wrong here? Is the danger of getting two interfering server threads
> a real one at all? This isn't the type of question a quick prototype reliably answers. Interestingly,
> from Zope (which is a multithreaded server, too), I've been fork()ing subshells happily and ignorantly
> for years already with no apparent problems, and so, presumably, have other people. But I'd rather
> be on the safe side. I don't want inexplicable failures due to rare race conditions or something
> similar later on.
>
> 2. Can I enforce (from Python) that only the handler thread continues to run in the child process?
> My current idea is to somehow block the server main thread right before the fork(), and
> unblock it immediately afterwards - but only in the parent process. Would that be safe? And
> could I perhaps reach that goal without having to substantially change the implementation
> of my chosen remote object system?
>
> 3. Any other ideas? Is there any other, fork()-less way to spawn a subshell (from Python, on Linux)?
>
> The following Tim Peters quote (from <mailman.1001479048.15356.python-list at python.org>)
> doesn't sound too encouraging...
> [...]