[C++-sig] subprocess fork() sometimes hangs when called from within boost::python::exec_file

Peter Schüller schueller.p at gmail.com
Thu Apr 2 21:12:49 CEST 2015


Dear all,

In the hexhex project on github [1] we execute a python program using
boost::python::exec_file, the respective code is in
src/PythonPlugin.cpp in the function PythonPlugin::runPythonMain(...).

In this python program I use subprocess.call() to run a shell
application (actually graphviz).

Sometimes it works, sometimes it hangs using 100% CPU. It seems to
depend on how much the C++ program had to do before it executed the
python part. In both cases the C++ program uses threads (is that
relevant wrt. fork() in python? I found some hints online that it
could be relevant).

I traced it down to os.fork() in subprocess.py which returns in one
case and does not return at all (neither parent nor child) in the
other case.

If I do strace -f on the process I get the following (the
SIGPROF/rt_sigreturn/clone repeats until the disk is full):

[pid 18646] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f96a5fa2a50) = ? ERESTARTNOINTR (To be restarted)
[pid 18646] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 18646] rt_sigreturn()              = 56
[pid 18646] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f96a5fa2a50) = ? ERESTARTNOINTR (To be restarted)
[pid 18646] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 18646] rt_sigreturn()              = 56
[pid 18646] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f96a5fa2a50) = ? ERESTARTNOINTR (To be restarted)
[pid 18646] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 18646] rt_sigreturn()              = 56
[pid 18646] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f96a5fa2a50) = ? ERESTARTNOINTR (To be restarted)
[pid 18646] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

If i do strace -f on the working process it looks like this:

[pid 21442] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f5f4c1e5a50) = ? ERESTARTNOINTR (To be restarted)
[pid 21442] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 21442] rt_sigreturn()              = 56
[pid 21442] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f5f4c1e5a50) = ? ERESTARTNOINTR (To be restarted)
[pid 21442] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 21442] rt_sigreturn()              = 56
[pid 21442] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f5f4c1e5a50) = ? ERESTARTNOINTR (To be restarted)
[pid 21442] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 21442] rt_sigreturn()              = 56
[pid 21442] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f5f4c1e5a50) = 21448
Process 21448 attached
[pid 21448] set_robust_list(0x7f5f4c1e5a60, 24 <unfinished ...>
[pid 21442] write(2, "got pid", 7got pid <unfinished ...>
[pid 21448] <... set_robust_list resumed> ) = 0
[pid 21442] <... write resumed> )       = 7
[pid 21442] write(2, " ", 1  <unfinished ...>
[pid 21448] close(7 <unfinished ...>

If I do not do strace but attach with gdb instead, the bactrace is as follows:

(gdb) bt
#0  0x00007fa03c9c5025 in __libc_fork () at
../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:130
#1  0x00007fa03ccdb155 in __fork () at
../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#2  0x00007fa03c36d6eb in posix_fork (self=<optimized out>,
noargs=<optimized out>) at ../Modules/posixmodule.c:3832
#3  0x00007fa03c2f2d33 in call_function (oparg=<optimized out>,
pp_stack=0x7fffbbc90ae0) at ../Python/ceval.c:4004
#4  PyEval_EvalFrameEx (f=f at entry=0x225e950,
throwflag=throwflag at entry=0) at ../Python/ceval.c:2666
#5  0x00007fa03c2f517d in PyEval_EvalCodeEx (co=<optimized out>,
globals=<optimized out>, locals=locals at entry=0x0, args=<optimized
out>,
    argcount=argcount at entry=18, kws=0x2693418, kwcount=0, defs=0x0,
defcount=0, closure=closure at entry=0x0) at ../Python/ceval.c:3252
#6  0x00007fa03c2f2dd8 in fast_function (nk=<optimized out>, na=18,
n=18, pp_stack=0x7fffbbc90d00, func=0x7fa0351e6410)
    at ../Python/ceval.c:4116
#7  call_function (oparg=<optimized out>, pp_stack=0x7fffbbc90d00) at
../Python/ceval.c:4041
#8  PyEval_EvalFrameEx (f=f at entry=0x2693140,
throwflag=throwflag at entry=0) at ../Python/ceval.c:2666
#9  0x00007fa03c2f517d in PyEval_EvalCodeEx (co=<optimized out>,
globals=<optimized out>, locals=locals at entry=0x0,
    args=args at entry=0x7fa0317ee530, argcount=2,
kws=kws at entry=0x7fa0317ee578, kwcount=kwcount at entry=1,
defs=defs at entry=0x7fa03545bce8,
    defcount=defcount at entry=13, closure=0x0) at ../Python/ceval.c:3252
#10 0x00007fa03c2f53e5 in function_call.15737 (func=0x7fa0351cff50,
arg=0x7fa0317ee518, kw=0x7fa0318247f8) at ../Objects/funcobject.c:526
#11 0x00007fa03c2bce23 in PyObject_Call
(func=func at entry=0x7fa0351cff50, arg=arg at entry=0x7fa0317ee518,
kw=kw at entry=0x7fa0318247f8)
    at ../Objects/abstract.c:2529
#12 0x00007fa03c21030d in instancemethod_call.8988
(func=0x7fa0351cff50, arg=0x7fa0317ee518, kw=0x7fa0318247f8)
    at ../Objects/classobject.c:2602
#13 0x00007fa03c2bce23 in PyObject_Call
(func=func at entry=0x7fa035438f00, arg=arg at entry=0x7fa0317e8e50,
kw=kw at entry=0x7fa0318247f8)
    at ../Objects/abstract.c:2529
#14 0x00007fa03c2c148f in slot_tp_init.26761 (self=0x7fa0317f0110,
args=0x7fa0317e8e50, kwds=0x7fa0318247f8) at
../Objects/typeobject.c:5692
#15 0x00007fa03c2bf4df in type_call.26491 (type=<optimized out>,
args=0x7fa0317e8e50, kwds=0x7fa0318247f8) at
../Objects/typeobject.c:745
#16 0x00007fa03c2bce23 in PyObject_Call (func=func at entry=0x1d7aab0,
arg=arg at entry=0x7fa0317e8e50, kw=kw at entry=0x7fa0318247f8)
    at ../Objects/abstract.c:2529
#17 0x00007fa03c2ef3b1 in ext_do_call (nk=<optimized out>,
na=<optimized out>, flags=<optimized out>, pp_stack=0x7fffbbc91300,
    func=0x1d7aab0) at ../Python/ceval.c:4333
#18 PyEval_EvalFrameEx (f=f at entry=0x7fa0317f2050,
throwflag=throwflag at entry=0) at ../Python/ceval.c:2705
#19 0x00007fa03c2f517d in PyEval_EvalCodeEx (co=<optimized out>,
globals=<optimized out>, locals=locals at entry=0x0, args=<optimized
out>,
    argcount=argcount at entry=1, kws=0x1da91e0, kwcount=1, defs=0x0,
defcount=0, closure=closure at entry=0x0) at ../Python/ceval.c:3252
#20 0x00007fa03c2f2dd8 in fast_function (nk=<optimized out>, na=1,
n=3, pp_stack=0x7fffbbc91520, func=0x7fa0351cfcf8)
    at ../Python/ceval.c:4116
#21 call_function (oparg=<optimized out>, pp_stack=0x7fffbbc91520) at
../Python/ceval.c:4041
#22 PyEval_EvalFrameEx (f=f at entry=0x1da8fc0,
throwflag=throwflag at entry=0) at ../Python/ceval.c:2666
#23 0x00007fa03c2f517d in PyEval_EvalCodeEx (co=<optimized out>,
globals=<optimized out>, locals=locals at entry=0x0,
    args=args at entry=0x7fa03ebdd068, argcount=0, kws=kws at entry=0x0,
kwcount=kwcount at entry=0, defs=defs at entry=0x0,
defcount=defcount at entry=0,
    closure=0x0) at ../Python/ceval.c:3252
#24 0x00007fa03c2f5310 in function_call.15737 (func=0x7fa0351cf5f0,
arg=0x7fa03ebdd050, kw=0x0) at ../Objects/funcobject.c:526
#25 0x00007fa03c2bce23 in PyObject_Call
(func=func at entry=0x7fa0351cf5f0, arg=arg at entry=0x7fa03ebdd050,
kw=<optimized out>)
    at ../Objects/abstract.c:2529
#26 0x00007fa03c27b837 in PyEval_CallObjectWithKeywords
(func=func at entry=0x7fa0351cf5f0, arg=arg at entry=0x7fa03ebdd050,
kw=kw at entry=0x0)
    at ../Python/ceval.c:3889
#27 0x00007fa03c2feb57 in PyEval_CallFunction (obj=0x7fa0351cf5f0,
format=<optimized out>) at ../Python/modsupport.c:557
#28 0x00007fa03e58b94b in dlvhex::PythonPlugin::runPythonMain(std::string) ()
   from /home/ps/_science/projects/hex/installdir_benchmark/lib/libdlvhex2-base.so.11
#29 0x0000000000421eda in main ()

I am using python 2.7 on ubuntu 14.04.

How can I find out why it is hanging? Could it be a bug of the
os.fork() implementation?

Best Regards,
Peter

[1] https://github.com/hexhex/core/blob/master/src/PythonPlugin.cpp

-- 
http://www.peterschueller.com/


More information about the Cplusplus-sig mailing list