difference in popen() or system() in thread vs. top level

Rich Drewes pydrew01 at interstice.com
Fri Aug 8 12:25:13 EDT 2003


Hello,

I'm having a strange problem:  *certain* subprograms invoked from within 
python 2.2 via popen/system/spawnv don't seem to return control to 
python.  Others return to python just fine.  The invoked subprogram 
*does* complete, but control does not return to python.  Furthermore, 
and this is the weird part, in python 2.2 I only see the problem when 
the subprogram is invoked from *within a thread*!  If the same 
subprogram is invoked from the top level of execution, then control 
returns to python just fine.  The problem does not exist in python 
1.5.2; everything works fine there under identical conditions, both when 
the subprogram is invoked from the top level and also from a thread.

To summarize:

* The problem is not seen in python 1.5.2; subprograms always return
  control to python.
* The problem is never seen when the subprogram is invoked from the top
  level of the program, only when it is invoked from within a thread in
  python 2.2.
* Only certain subprograms cause a problem. Unfortunately, the only
  program I have found that causes it is mpirun (part of the MPI
  package), which you probably don't have, so you may not be able to
  reproduce the problem. I'd be inclined to say that mpirun is doing
  something wrong, but why does it only cause a hang when invoked from
  within a thread? Why does it work OK in python 1.5.2? *What* could
  mpirun be doing that makes it have this property in combination with
  being invoked from a python *thread*? (I have examined mpirun. It is
  a perl script and hence nearly impossible to understand :) However,
  it uses ssh to invoke programs on multiple nodes in a cluster, which
  then coordinate via MPI. When I use strace to examine the processes
  after things hang following an invocation from a python thread, it is
  waiting in an accept().)
* Redirecting stdin, stdout, and stderr of the subprogram to /dev/null
  does not solve the problem, which makes an I/O deadlock less likely.
* Wrapping the subprogram in another shell does not solve the problem!
  Further, the environment (as shown by set) is identical when the
  invocation is done from the program top level and from within a
  thread. What else could be different in the thread's invocation of
  the program vs. when the program is invoked from python's top level
  of execution!? Could a signal (SIGCHLD?) be sent from the subprogram
  to the *main* python program and not the thread, and therefore get
  lost?
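
One per-process attribute that `set` does not show is the signal mask,
which a child inherits across fork() and exec(). Below is a minimal
sketch of how one might compare the mask seen by the main thread with
the one seen by a worker thread. It assumes a current Python 3 on a
POSIX system (signal.pthread_sigmask does not exist in 2.2), the helper
names are made up for illustration, and it only checks one possible
explanation rather than establishing the cause.

```python
import signal
import threading


def blocked_signals():
    # Hypothetical helper. SIG_BLOCK with an empty list changes nothing
    # and returns the calling thread's current signal mask as a set.
    return signal.pthread_sigmask(signal.SIG_BLOCK, [])


def blocked_signals_in_thread():
    # Hypothetical helper. Run the same query inside a worker thread
    # and hand the result back to the caller.
    result = []
    worker = threading.Thread(target=lambda: result.append(blocked_signals()))
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    print("main thread blocks:  ", sorted(blocked_signals()))
    print("worker thread blocks:", sorted(blocked_signals_in_thread()))
```

If the two sets differ, any program started from the thread begins life
with a different set of blocked signals than one started from the top
level, which would be worth ruling out before digging into mpirun
itself.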

Attached below is a fairly simple program that shows the problem.
Unfortunately, unless you have mpirun on your system you probably won't
be able to reproduce it.

Thanks in advance for any help you could provide.  I know this is a bit 
obscure.

Rich

----
#!/usr/bin/python2
import os, sys, time, popen2
import threading

def InvokeThread(cmd):
   print "invocation in thread, cmd:", cmd
   os.system(cmd)
   # the following line is never printed, even though the command we invoke
   # does complete:
   print "back from invocation in thread"
   sys.stdout.flush()

# this command will work just fine:
#cmd="/usr/bin/find /home/drewes/GArun"
# this command will show the problem:
cmd=("/opt/mpich/myrinet/gcc/bin/mpirun"
     " -machinefile /home/drewes/machtest.mpi -np 1"
     " /bin/ls -lat < /dev/null > /dev/null 2> /dev/null")

# invoking from the main (top) thread always works:
print "invocation from program top level, not in thread, cmd:", cmd
os.system(cmd)
print "back from normal invocation, not in thread"

print "starting invocation from thread"
thist=threading.Thread(target=InvokeThread, args=(cmd,))
thist.start()

nl=threading.enumerate()
while(len(nl) > 1):
   # note:  the master thread (this main one) counts in the threadlist too
   time.sleep(1)
   nl=threading.enumerate()
   print "there are", len(nl), "threads running"
   # this will always print "2 threads running"; thread never terminates
   # since system() call in thread never returns
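
Since the hang is only seen when the command is started from a thread,
one workaround to try is to have worker threads hand their commands to
the main thread, which does the actual invocation. The sketch below
assumes a current Python 3 and that it is acceptable to run the commands
one at a time; run_from_main and serve_requests are hypothetical names,
not part of any library. It sidesteps the problem rather than explaining
it.

```python
import queue
import subprocess
import threading

# Commands that worker threads want the main thread to run.
requests = queue.Queue()


def run_from_main(cmd):
    # Called in a worker thread: queue the command and block until the
    # main thread reports its exit status.
    reply = queue.Queue(maxsize=1)
    requests.put((cmd, reply))
    return reply.get()


def serve_requests(workers):
    # Called in the main thread: run queued commands until every worker
    # has finished and nothing is left in the queue.
    while any(w.is_alive() for w in workers) or not requests.empty():
        try:
            cmd, reply = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        reply.put(subprocess.call(cmd, shell=True))


if __name__ == "__main__":
    def work():
        status = run_from_main("/bin/ls -lat > /dev/null")
        print("back from invocation requested by thread, status", status)

    worker = threading.Thread(target=work)
    worker.start()
    serve_requests([worker])
```

The worker still blocks until its command completes, so the calling code
keeps the same shape as with os.system(); only the process creation
moves to the top level, where the original program does not hang.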






