[Python-Dev] Child process freezes during fork pipe exec

Gangadharan S.A. gangadharan at gmail.com
Mon Jan 19 11:33:41 CET 2009


 Hi,

Summary:
    * In my organization, we have a *multi-threaded* (threading library)
Python (2.4.1) daemon on Linux, which starts up various processes
using a fork / wait-on-pipe / exec model.
    * We use this model as a form of handshake between the parent and child
processes: we want the child to go ahead only after the parent has noted
down the fact that the child has been forked and what its pid is.
    * This usually works fine, but for about 1 in every 20,000 processes
started, the child process just freezes somewhere after the fork, before the
exec. It does not die. It is alive and stuck.
    * Why does this happen?
    * Is there a better way for us to write a
fork-wait_for_start_signal-exec construct?

Here is what we do:
    One of the threads of the multi threaded python daemon does the
following
        1) Fork out a child process
        2) Child process waits for a pipe message from parent
        --- Parent sends pipe message to child after noting down child
details: pid, start time, etc. ---
        3) Child process prints various debug messages, including looking at
os.environ values
        4) Child process execs the right script

Here it is again, in pseudo code:
    def start_job():
        read_pipefd, write_pipefd = os.pipe()

        # 1) Fork out a child process
        pid = os.fork()

        if pid == 0:
            # 2) wait for expected message on pipe
            os.close(write_pipefd)
            read_set, write_set, exp_set = select.select([read_pipefd], [], [], 300)
            if os.read(read_pipefd, len("expected message")) != "expected message":
                os._exit(1)
            os.close(read_pipefd)

            # 3) print various debug messages, including os.environ values
            print >> sys.stderr, "we print various debug messages here, including os.environ values"

            # 4) go ahead with exec
            os.execve(path, args, env)
        else:
            # parent process sends pipe message to child at the right time
            # (after noting down the child's pid, start time, etc.)
            os.close(read_pipefd)
            os.write(write_pipefd, "expected message")
            os.close(write_pipefd)


The problem:
    * Things work fine most of the time, but rarely the process gets
"stuck" after fork, before exec (in steps 2 or 3 above). The process makes
no progress and does not die either.
    * When I attach gdb (version 6.5) to the process, bt fails as follows:

        (gdb) bt
        #0  0x00002ba9fd5c6a68 in __lll_mutex_lock_wait () from
        /lib64/libpthread.so.0
        #1  0x00002ba9fd5c2a78 in _L_mutex_lock_106 () from
        /lib64/libpthread.so.0
        dwarf2-frame.c:521: internal-error: Unknown CFI encountered.
        A problem internal to GDB has been detected,
        further debugging may prove unreliable.
        Quit this debugging session? (y or n)

      I looked into this error and found that pre-6.6 gdb throws this error
when looking at the stack trace of a deadlocked process. It is certainly not
a deadlock in my own code, as there is no locking involved in this area of
the code.
    * This problem happens for about 1 process in every 20,000. These
statistics were gathered across about 80 machines in our cluster, so it's not
a case of a single machine having a hardware issue.
    * Note that the child is forked out by a *multi-threaded* Python
application. I noticed some forums discussing how multi-threaded (pthreads
library) processes doing things between a fork and an exec can occasionally
deadlock. I am not sure whether Python (at least 2.4.1) threading uses
pthreads directly, but presumably the interpreter itself does?
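      As a quick illustration of that failure mode (my own minimal sketch, not
our production code): if any other thread happens to hold a lock -- in the
interpreter, in libc's allocator, or an ordinary threading.Lock -- at the
moment of the fork, the child inherits the lock in the locked state, but the
owning thread does not exist in the child, so the next attempt to take it
blocks forever:

        import os, threading, time

        lock = threading.Lock()

        def hold_lock_briefly():
            # stands in for any lock another thread may hold at fork time
            lock.acquire()
            time.sleep(2)
            lock.release()

        threading.Thread(target=hold_lock_briefly).start()
        time.sleep(0.1)   # make it likely the lock is held when we fork

        pid = os.fork()
        if pid == 0:
            # Only the forking thread exists in the child, yet the lock is
            # still marked as held by a thread that was never copied, so
            # this acquire blocks forever.
            lock.acquire()
            os._exit(0)
        else:
            os.waitpid(pid, 0)   # the parent ends up waiting indefinitely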

Questions:
    * Why does this happen?
    * Is there a better way for us to write a
fork-wait_for_start_signal-exec construct in a multi-threaded application?
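
One possible restructuring (a sketch only, assuming the hang comes from
touching interpreter or libc locks between fork and exec): keep the pipe
handshake, but move all debug printing and os.environ inspection to before
the fork, so that between fork() and execve() the child does nothing but
close fds, wait on the pipe, and exec. The note_child_details() helper is
hypothetical parent-side bookkeeping:

    import os, select, sys

    def start_job(path, args, env):
        # Do all debug printing / os.environ inspection here, before the
        # fork, while it is still safe to touch the interpreter freely.
        print >> sys.stderr, "about to fork child for", path

        read_pipefd, write_pipefd = os.pipe()
        pid = os.fork()

        if pid == 0:
            # Child: between fork and exec, stick to plain system calls.
            os.close(write_pipefd)
            ready, _, _ = select.select([read_pipefd], [], [], 300)
            if not ready or \
               os.read(read_pipefd, len("expected message")) != "expected message":
                os._exit(1)
            os.close(read_pipefd)
            os.execve(path, args, env)
            os._exit(127)   # only reached if execve itself fails

        # Parent: note down the child's details, then release it.
        os.close(read_pipefd)
        note_child_details(pid)   # hypothetical: record pid, start time, etc.
        os.write(write_pipefd, "expected message")
        os.close(write_pipefd)
        return pid

A more conservative variant would be to fork a single, still single-threaded
helper process at daemon startup and have it perform all later fork/exec work
on behalf of the threaded daemon.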

Thanks,
Gangadharan