Python 1.6a2 crashes in test_fork1 on my Mandrake Linux 7 machine

Sat Apr 15 21:53:31 EDT 2000

Oliver Andrich wrote:

> On Fri, Apr 14, 2000 at 03:47:18PM +0200, Thomas Wouters wrote:
> > I dont know enough about threads to say anything useful about this, but it
> > is kind of strange that you dont see the same problem under Python 1.5.2.

Well, I've been fooling around trying to figure this one out.  It doesn't
appear to be related to GCC.  I tried the exact same GCC versions on both a
Redhat 6.2 and my Mandrake 7 (w/ cooker) system, and it worked fine on Redhat
but failed on Mandrake.  I also downloaded a recent (4/10/2000) GCC from cvs,
and had the same results on both systems.  Also, test_fork1.py fails for me on
Mandrake even with Python 1.5.2.  I also tried it w/ Linux kernel 2.3.pre99-5
and 2.2.14; same results.  And finally, I installed the Mandrake glibc on
Redhat and *still* had the same results.  However, my Mandrake machine is
2 CPU SMP, while the Redhat machine has one CPU, which leads me to the next
phase of testing...

I traced the error, using strategic printf's, to the posix_waitpid() function
in posixmodule.c, and I think it is a bug in the Py_END_ALLOW_THREADS or
Py_BEGIN_ALLOW_THREADS macros.  Basically, when the problem occurs, which is
not always (and the frequency of occurrence seems to change depending on how
many print statements I use in test_fork1.py), the code seems to go into an
infinite loop in one of these two macros (using "top" shows python at 99% cpu).
Here is an example of the "top" output when the test_fork1.py program locks up.

54 processes: 49 sleeping, 4 running, 1 zombie, 0 stopped
CPU states: 54.5% user,  2.8% system,  0.0% nice, 42.5% idle
Mem:   127780K av,  124460K used,    3320K free,   80140K shrd,    2472K buff
Swap:  265032K av,     836K used,  264196K free                   44804K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
14133 chad      18   0  1172 1172   804 R       0 98.2  0.9  11:15 python
  521 root       0   0 28712  28M  2260 S       0  7.7 22.4   3:36 X

This indicates to me (naive as I am about threads) that there is a spinlock
going nuts, waiting forever for threads to block or unblock.  Try putting an
fprintf() just before and after these macros in
posixmodule.c:posix_waitpid().  The actual kernel waitpid() call itself, and
thus the forking, seems to be fine.
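
Since the lockup is intermittent, one way to put a number on its frequency is
to run a small fork-while-threaded script many times under a watchdog.  The
following is only a rough sketch under stated assumptions: it is written for a
modern Python 3 (subprocess.run() does not exist in 1.5.2 or 1.6), the embedded
script is a simplified stand-in for test_fork1.py rather than the real test,
and the names CHILD_SCRIPT and count_hangs are made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for test_fork1.py (not the real test):
# fork while several threads are running, then wait for the child.
CHILD_SCRIPT = r"""
import os, threading, time

stop = threading.Event()

def worker():
    # Keep a few threads alive and switching while the fork happens.
    while not stop.is_set():
        time.sleep(0.01)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

pid = os.fork()
if pid == 0:
    os._exit(0)        # child: leave at once, skipping cleanup
os.waitpid(pid, 0)     # parent: the call around which the spin was seen
stop.set()
for t in threads:
    t.join()
"""


def count_hangs(trials=20, timeout=10.0):
    """Run the script `trials` times; return how many runs timed out."""
    hangs = 0
    for _ in range(trials):
        try:
            subprocess.run([sys.executable, "-c", CHILD_SCRIPT],
                           timeout=timeout, check=True)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the stuck interpreter before re-raising.
            hangs += 1
    return hangs


if __name__ == "__main__":
    print("runs that timed out:", count_hangs())
```

Each trial runs in its own interpreter, so a process spinning at 99% cpu is
killed by the outer watchdog instead of taking the harness down with it.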

> May be we have a again a code generation problem on
> Mandrake 7.

I hope not... Because of school, work, and taxes, I'm probably too busy to
track this one down further.  I forgot to test my Mandrake compiled binary on
a Redhat 6.2 system, so I would suggest that as another test.  But it appears,
from where I'm sitting, to be a problem with threading, and not necessarily
w/ code generation (for once).

If I discover more, I'll post it here.

Chad Netzer
cnetzer at stanford.edu