difference in popen() or system() in thread vs. top level
Rich Drewes
pydrew01 at interstice.com
Fri Aug 8 12:25:13 EDT 2003
Hello,
I'm having a strange problem: *certain* subprograms invoked from within
python 2.2 via popen/system/spawnv don't seem to return control to
python. Others return to python just fine. The invoked subprogram
*does* complete, but control does not return to python. Furthermore,
and this is the weird part, in python 2.2 I only see the problem when
the subprogram is invoked from *within a thread*! If the same
subprogram is invoked from the top level of execution, then control
returns to python just fine. The problem does not exist in python
1.5.2; everything works fine there under identical conditions, both when
the subprogram is invoked from the top level and also from a thread.
To summarize:
* problem is not seen in python 1.5.2; subprograms always return control
  to python
* problem is never seen when subprogram invocation is from top level of
  program; only when subprogram invocation is from within a thread in
  python 2.2
* only certain subprograms cause a problem; unfortunately, the only
  program I have found that causes a problem is called mpirun (part of
  the MPI package), which you probably don't have, so you may not be
  able to reproduce the problem. I'd be inclined to say that mpirun is
  doing something wrong, but why does it only cause a hang when invoked
  from within a thread? Why does it work OK in python 1.5.2? *What*
  could mpirun be doing that makes it have this property in combination
  with being invoked from a python *thread*? (I have examined mpirun.
  It is a perl script and hence nearly impossible to understand :)
  However, it is using ssh to invoke programs on multiple nodes in a
  cluster and then they coordinate via MPI. When I use strace to
  examine the processes when things hang after being invoked from a
  python thread, it is waiting for an accept().)
* redirection of stdin, stdout, and stderr to /dev/null in the
  subprogram does not solve the problem, reducing the likelihood that
  the issue is an I/O deadlock
* wrapping the subprogram in another shell does not solve the problem!
  Further, the environment (as shown by set) is identical when the
  invocation is done from the program top level and from within a
  thread. What else could be different in the thread's invocation of
  the program vs. when the program is invoked from python's top level
  of execution!? Could a signal (SIGCHLD?) be sent from the subprogram
  to the *main* python program and not the thread, and therefore get
  lost?
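
One way to probe the "what else could be different" question is to
compare the signal state that a child process inherits in each case.
The following is only a rough sketch, not a confirmed diagnosis: it
assumes Linux (it reads the SigBlk and SigIgn lines from
/proc/self/status), and the function names child_signal_state and
compare_main_and_thread are made up for illustration. It is written to
run on both old and current Python versions.

```python
import os
import threading


def child_signal_state():
    """Return the SigBlk/SigIgn lines as seen by a freshly spawned child.

    Assumption: Linux, where /proc/self/status lists the blocked and
    ignored signal masks.  The child is the grep process started by the
    shell, so it reports the state it inherited from the calling thread.
    """
    pipe = os.popen("grep -E '^Sig(Blk|Ign)' /proc/self/status 2>/dev/null")
    try:
        return pipe.read()
    finally:
        pipe.close()


def compare_main_and_thread():
    """Collect the child's signal state from the caller and from a thread."""
    results = {}
    results["main"] = child_signal_state()

    def worker():
        # Same probe, but the child is spawned from a worker thread.
        results["thread"] = child_signal_state()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return results


if __name__ == "__main__":
    states = compare_main_and_thread()
    print("from top level:")
    print(states["main"])
    print("from thread:")
    print(states["thread"])
```

If the two reports differ, a child started from the thread inherits a
different set of blocked or ignored signals, which would at least be
consistent with a signal never being delivered. If they are identical,
the difference lies somewhere else.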
Attached below is a fairly simple program that shows this problem.
Unfortunately, unless you have mpirun on your system you probably won't
be able to reproduce this.
Thanks in advance for any help you could provide. I know this is a bit
obscure.
Rich
----
#!/usr/bin/python2
import os, sys, time, popen2
import threading
def InvokeThread(cmd):
    print "invocation in thread, cmd:", cmd
    os.system(cmd)
    # the following line is never printed, even though the command we invoke
    # does complete:
    print "back from invocation in thread"
    sys.stdout.flush()
# this command will work just fine:
#cmd="/usr/bin/find /home/drewes/GArun"
# this command will show the problem:
cmd=("/opt/mpich/myrinet/gcc/bin/mpirun -machinefile "
     "/home/drewes/machtest.mpi -np 1 /bin/ls -lat "
     "< /dev/null > /dev/null 2> /dev/null")
# invoking from the main (top) thread always works:
print "invocation from program top level, not in thread, cmd:", cmd
os.system(cmd)
print "back from normal invocation, not in thread"
print "starting invocation from thread"
thist=threading.Thread(target=InvokeThread, args=(cmd,))
thist.start()
nl=threading.enumerate()
while(len(nl) > 1):
    # note: the master thread (this main one) counts in the threadlist too
    time.sleep(1)
    nl=threading.enumerate()
    print "there are", len(nl), "threads running"
    # this will always print "2 threads running"; thread never terminates
    # since system() call in thread never returns
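
Since the invocation from the top level does return, one possible
workaround is to have threads hand their commands to the main thread
and let it make the os.system() call. This is only a sketch of that
idea and has not been tried against mpirun; the names worker and
serve_commands and the (None, None) "finished" marker are hypothetical
choices for illustration. It is written to run on both old and current
Python versions.

```python
import os
import threading

try:
    import queue            # current Python
except ImportError:
    import Queue as queue   # older Python


def worker(requests, cmd):
    """Ask the main thread to run cmd, then wait for its exit status."""
    reply = queue.Queue()
    requests.put((cmd, reply))
    status = reply.get()
    print("worker got exit status %r for %r" % (status, cmd))
    # Tell the main thread this worker has nothing more to run.
    requests.put((None, None))


def serve_commands(requests, worker_count):
    """Run queued commands in the calling thread until all workers finish."""
    statuses = []
    finished = 0
    while finished < worker_count:
        cmd, reply = requests.get()
        if cmd is None:
            finished += 1
            continue
        status = os.system(cmd)   # executed by the caller, not the worker
        statuses.append(status)
        reply.put(status)
    return statuses


if __name__ == "__main__":
    requests = queue.Queue()
    t = threading.Thread(target=worker,
                         args=(requests, "/bin/ls -lat > /dev/null"))
    t.start()
    serve_commands(requests, 1)   # called from the main thread
    t.join()
```

The cost is that commands run one at a time in the main thread, so this
only helps when the threads do not need their subprograms to run in
parallel.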