[Python-bugs-list] [ python-Bugs-471942 ] python2.1.1 SEGV in GC on Solaris 2.7

noreply@sourceforge.net noreply@sourceforge.net
Thu, 18 Oct 2001 03:11:45 -0700


Bugs item #471942, was opened at 2001-10-16 19:56
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471942&group_id=5470

Category: Python Interpreter Core
Group: Python 2.1.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Neil Schemenauer (nascheme)
Summary: python2.1.1 SEGV in GC on Solaris 2.7

Initial Comment:
I've got a Zope installation where python2.1.1 is
segfaulting on Solaris 2.7 - it's running a largish
ZEO server. The tail of the gdb output is:

 #128 0x26164 in PyEval_CallObjectWithKeywords ()
 #129 0x264c0 in PyEval_CallObjectWithKeywords ()
 #130 0x26140 in PyEval_CallObjectWithKeywords ()
 #131 0x25fc0 in PyEval_CallObjectWithKeywords ()
 #132 0x517bc in PyInstance_New ()
 #133 0x261a4 in PyEval_CallObjectWithKeywords ()
 #134 0x25fc0 in PyEval_CallObjectWithKeywords ()
 #135 0x42c90 in initgc ()

It's built with 
<anthony@devhost1>$ gcc -v
Reading specs from
/opt/local/lib/gcc-lib/sparc-sun-solaris2.7/2.95.2/specs
gcc version 2.95.2 19991024 (release)
which is a bit old. 

I'm going to rebuild with gcc3.0 and also try turning 
off the GC. Unfortunately I can't get this to happen
on a smaller test system - it's only under load that 
it plows into the ground. 

I'll also leave symbols in this time... :/


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2001-10-18 03:11

Message:
Logged In: YES 
user_id=21627

There are two possibilities:

a) the object isn't really a GC object, i.e. has no GC
header. In gdb, you can try casting gc to PyObject*, then
see if the resulting pointer has a better ob_type (this is
unlikely, though, since the code that put the object on the
list was already using fromgc/togc)

b) somebody has cleared the ob_type field.
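
To make case (a) concrete, here is a minimal sketch of the layout in
question: a GC header placed directly in front of the object header,
with helpers that step from one to the other. All demo_* names are
hypothetical stand-ins and the struct fields are simplified; this is
not the interpreter's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the interpreter's structures. */
typedef struct demo_type {
    const char *tp_name;
} demo_type;

typedef struct demo_object {
    int ob_refcnt;
    demo_type *ob_type;
} demo_object;

typedef struct demo_gc_head {
    struct demo_gc_head *gc_next;
    struct demo_gc_head *gc_prev;
    int gc_refs;
} demo_gc_head;

/* A GC-enabled object: the GC header sits immediately before the object. */
typedef struct demo_gc_object {
    demo_gc_head head;
    demo_object ob;
} demo_gc_object;

/* Step from the GC header to the object that follows it. */
demo_object *demo_from_gc(demo_gc_head *g)
{
    assert(g != NULL);
    return (demo_object *)(g + 1);
}

/* Step back from the object to the GC header in front of it. */
demo_gc_head *demo_as_gc(demo_object *op)
{
    assert(op != NULL);
    return ((demo_gc_head *)op) - 1;
}

/* Returns 1 if the object behind this GC header has a null type slot. */
int demo_type_is_null(demo_gc_head *g)
{
    return demo_from_gc(g)->ob_type == NULL;
}
```

If an object without a GC header ever ends up on the list, the
header-to-object step lands one header size too far into memory, and
the ob_type read there is whatever happens to be stored at that spot.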

Can you guarantee that all extension modules have been
compiled with the 2.1.1 header files?

Is the problem repeatable in the sense that gc will have the
same pointer value on each crash? If so, it is relatively
easy to track down: just set a gdb watchpoint on the
address of the ob_type field of that object (note that
setting watchpoints is not possible until there is really
mapped memory at that address).

If you can't analyse it through watchpoints, I
recommend annotating the interpreter in the following way:
in PyObject_Init, put a printf that prints the address and
the tp_name of the type. In subtract_refs, if the ob_type
slot is null, print the address of the object and abort.
Then analyse the log to see whether an object really was
allocated at that address, and what its type was (make sure
you consider the possibility that the addresses are off by the
delta that FROM_GC adds).
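
The annotation described above could look roughly like the following
sketch. The demo_* types and function names are hypothetical and stand
in for the real interpreter structures; the idea is only to show the
two hooks (log at initialisation, check before traversal), not the
actual patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the interpreter's structures. */
typedef struct demo_type {
    const char *tp_name;
} demo_type;

typedef struct demo_object {
    int ob_refcnt;
    demo_type *ob_type;
} demo_object;

/* Hook for the object-initialisation path: log address and type name. */
void demo_log_init(FILE *log, const demo_object *op, const demo_type *tp)
{
    assert(log != NULL && tp != NULL);
    fprintf(log, "init %p %s\n", (const void *)op, tp->tp_name);
}

/* Returns 1 if the type slot is set; otherwise logs the address, returns 0. */
int demo_check_type(FILE *log, const demo_object *op)
{
    assert(log != NULL && op != NULL);
    if (op->ob_type != NULL)
        return 1;
    fprintf(log, "null ob_type at %p\n", (const void *)op);
    fflush(log);  /* make sure the line reaches the log before any abort */
    return 0;
}

/* Hook for the traversal path: stop at the first object with no type. */
void demo_check_or_abort(FILE *log, const demo_object *op)
{
    if (!demo_check_type(log, op))
        abort();
}
```

Searching the log for the address printed just before the abort (and
for that address minus the size of the GC header) then shows whether
an object was ever initialised there and under which type name.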

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2001-10-17 21:58

Message:
Logged In: YES 
user_id=29957

Ok, I have an intact core file and a matching binary,
built with no optimisations. The crash is at line 166
of gcmodule.c:
 traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
*PyObject_FROM_GC(gc) in this case is

$24 = {ob_refcnt = 1, ob_type = 0x0}

To check my logic, I checked gc_next and gc_prev using 
the same GDB magic, and they correctly show up as a tuple
and an instance method. 

Some fiddling around seems to rule out stack space as the
problem, as well. We're going to try to see if Purify
helps here, but the problem looks to be a junk object -
I have no idea how to track this down further. Help?
Would taking the horrible horrible hack of removing the
object from the gc linked list if ob_type is null help?
Well, it'd stop the crashes, anyway.
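
For illustration only, the hack mentioned above might be sketched as
follows on a circular doubly linked list with a sentinel head. The
demo_* names and the node layout are hypothetical, not the collector's
real data structures, and unlinking the node only hides the symptom:
whatever cleared the type slot is still unexplained.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the interpreter's structures. */
typedef struct demo_type {
    const char *tp_name;
} demo_type;

typedef struct demo_node {
    struct demo_node *next;
    struct demo_node *prev;
    demo_type *ob_type;  /* stand-in for the object's type slot */
} demo_node;

/* Make head an empty circular list (head points at itself). */
void demo_list_init(demo_node *head)
{
    head->next = head;
    head->prev = head;
    head->ob_type = NULL;  /* the sentinel carries no object */
}

/* Append n at the tail of the list. */
void demo_list_append(demo_node *head, demo_node *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink every node whose type slot is null; return how many were dropped. */
int demo_drop_untyped(demo_node *head)
{
    int dropped = 0;
    demo_node *n = head->next;
    while (n != head) {
        demo_node *next = n->next;  /* save before unlinking */
        if (n->ob_type == NULL) {
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->next = NULL;
            n->prev = NULL;
            dropped++;
        }
        n = next;
    }
    return dropped;
}
```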


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-10-17 13:44

Message:
Logged In: YES 
user_id=21627

It would be interesting to know what the value of "gc" is at
the time of the crash. It looks like you got an object that
claims to support GC but has a null tp_traverse.


----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2001-10-17 06:08

Message:
Logged In: YES 
user_id=29957

I'm a doofus who read the gdb trace from the wrong
end - too much python lately :)
Nonetheless, the other end of the trace failed in 
gc as well - and building without GC enabled worked.

Here's the trace with debugging enabled:

#0  0xff00 in ?? ()
#1  0x402f0 in collect (young=0x9b538, old=0x9b544) at
./Modules/gcmodule.c:379
#2  0x405a8 in collect_generations () at
./Modules/gcmodule.c:484
#3  0x40624 in _PyGC_Insert (op=0xbc1f24) at
./Modules/gcmodule.c:507
#4  0x5a224 in PyList_New (size=0) at Objects/listobject.c:61
#5  0x21bc8 in eval_code2 (co=0x1cb370, globals=0x21bc0,
locals=0x67,
    args=0x0, argcount=1, kws=0xf89b24, kwcount=0, defs=0x0,
defcount=0,
    closure=0xbc1f24) at Python/ceval.c:1741

Next trick is to rebuild without any optimisation (sigh)
as I suspect that it's inlined subtract_refs().


