Linux fork and threads
Jonathan Giddy
jon at bezek.dstc.monash.edu.au
Wed Aug 9 10:32:55 EDT 2000
Seeing as I do a lot of multithreaded forking, I am interested in the
problems seen with Python 2.0.
Testing Python 2.0 on a Linux 2.2.14-3smp node shows the described
behaviour for me too:
1. Most commonly, Python hangs; ps -ef shows:
jon 6547 6515 0 23:25 pts/3 00:00:00 ./py2k/bin/python ./py2k/lib/pyt
jon 6548 6547 0 23:25 pts/3 00:00:00 ./py2k/bin/python ./py2k/lib/pyt
jon 6549 6548 0 23:25 pts/3 00:00:00 ./py2k/bin/python ./py2k/lib/pyt
jon 6550 6548 0 23:25 pts/3 00:00:00 ./py2k/bin/python ./py2k/lib/pyt
jon 6551 6548 99 23:25 pts/3 00:09:22 ./py2k/bin/python ./py2k/lib/pyt
jon 6552 6548 99 23:25 pts/3 00:09:20 ./py2k/bin/python ./py2k/lib/pyt
Note: the last two threads are running flat out (99% CPU). Sometimes only
one of the 6 threads runs flat out.
2. Sometimes (~20% of runs) I get a segmentation fault; gdb shows:
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `./py2k/bin/python ./py2k/lib/python2.0/test/test_fork1.py'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libutil.so.1...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
#0 0x2ab91e2e in __select () from /lib/libc.so.6
(gdb) bt
#0 0x2ab91e2e in __select () from /lib/libc.so.6
#1 0x7f1ffabc in ?? ()
#2 0x80b5d83 in time_sleep (self=0x0, args=0x8240a54) at ./timemodule.c:209
#3 0x8058d1e in call_builtin (func=0x823cfb0, arg=0x8246104, kw=0x0)
    at ceval.c:2369
#4 0x8058c2b in PyEval_CallObjectWithKeywords (func=0x823cfb0, arg=0x8246104,
    kw=0x0) at ceval.c:2337
#5 0x8057c4c in eval_code2 (co=0x8248e50, globals=0x8207ae4, locals=0x0,
    args=0x823cf38, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:1675
#6 0x805906e in call_function (func=0x8243f54, arg=0x823cf2c, kw=0x0)
    at ceval.c:2491
#7 0x8058c1d in PyEval_CallObjectWithKeywords (func=0x8243f54, arg=0x823cf2c,
    kw=0x0) at ceval.c:2335
#8 0x809d3d9 in t_bootstrap (boot_raw=0x82105c8) at ./threadmodule.c:199
#9 0x2aac7032 in pthread_start_thread (arg=0x7f1ffe60) at manager.c:213
(gdb)
3. Very rarely, the test appears to succeed.
Removing the sleep from f() caused the segfault to occur in a different
location:
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `./py2k/bin/python ./py2k/lib/python2.0/test/test_fork1.py'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libutil.so.1...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
#0 0x2ab158be in __sigsuspend (set=0x7f7ffac0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
(gdb) bt
#0 0x2ab158be in __sigsuspend (set=0x7f7ffac0)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1 0x2aac624c in pthread_cond_wait (cond=0x821061c, mutex=0x8210628)
    at restart.h:49
#2 0x8069eca in PyThread_acquire_lock (lock=0x8210618, waitflag=1)
    at thread_pthread.h:311
#3 0x8056061 in eval_code2 (co=0x8248d48, globals=0x8207ae4, locals=0x0,
    args=0x8210c08, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0,
    owner=0x0) at ceval.c:598
#4 0x805906e in call_function (func=0x8243f54, arg=0x8210bfc, kw=0x0)
    at ceval.c:2491
#5 0x8058c1d in PyEval_CallObjectWithKeywords (func=0x8243f54, arg=0x8210bfc,
    kw=0x0) at ceval.c:2335
#6 0x809d3d9 in t_bootstrap (boot_raw=0x8248c28) at ./threadmodule.c:199
#7 0x2aac7032 in pthread_start_thread (arg=0x7f7ffe60) at manager.c:213
(gdb)
This location is very close to, but actually after, the point where Tim's
theory predicts a problem. Maybe the cause lies in the implementation details
of the thread library.
Either way, Tim appears to have identified yet another lapse in the threading
code. Has anybody thought of a good solution?
The best I've come up with so far is to have only one pthread_mutex for all
the locks (each lock would then consist only of a flag and a pthread_cond).
This would serialise all the lock operations, but would enable the fork
code to grab the single pthread_mutex (through a suitably generic interface)
during the actual fork() call, so that no other thread can be part-way
through a lock operation at the moment the process is copied.
Jon.