[pypy-dev] Looking for clues to consistent Seg Fault in PyPy 2.6.1

Armin Rigo arigo at tunes.org
Sun Oct 11 11:52:14 CEST 2015


Hi,

On Sun, Oct 11, 2015 at 12:28 AM, Jeff Doran <jdoran at lexmachina.com> wrote:
> I've run out of options trying to find a Seg Fault which happens when
> running lxml under PyPy 2.6.1.  This problem only occurs under PyPy as the
> rest of the code works fine under CPython 2.7.   I've been in contact with
> the lxml dev team and they confirmed my problem, but could not determine
> where the cause of the Seg Fault lies.

After some debugging, it seems that the PyPy-specific weakref code in
"proxy.pxi" is to blame.  I think the same code would have the same
problem if it were compiled for CPython.  (I understand why it is
there, and indeed it is necessary to do *something* different on
PyPy.)

Suppose you start with two C structures of type "xmlNode" that form a
small tree:

   XA:   xmlNode   with child XB
   XB:   xmlNode

You also have two corresponding Python objects (actually instances of
the cdef class _Element, but I don't think it matters here that they
are Cython classes):

   EA:   _Element   with _c_code = XA
   EB:   _Element   with _c_code = XB

The back-pointers from the C nodes to the Python objects are set up
differently on PyPy and on CPython.  On CPython first:

   XA._private = (void *)EA
   XB._private = (void *)EB

This is a plain pointer that does not hold a reference.  The
deallocation logic of _Element resets the '_c_code._private' pointer
back to NULL.

On PyPy, instead, there is an indirection: _private holds a reference
to a weakref object pointing to the _Element.  The effect is mostly the
same, but as a result the deallocation logic of _Element is subtly
different.  Let's dig in:
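A rough plain-Python analogue of that indirection, using only the
standard `weakref` and `gc` modules, is sketched below.  It is not
lxml's Cython code: `Element` and `private` are hypothetical names
standing in for `_Element` and the `_private` slot.  Reading the slot
means calling the weakref, which returns the element while it is alive
and None afterwards.

```python
import gc
import weakref


class Element:
    """Hypothetical stand-in for lxml's _Element (not the real class)."""


elem = Element()
# PyPy-style back-pointer: the slot holds a weakref, not the element.
private = weakref.ref(elem)

# While the element is alive, dereferencing the slot gives it back.
assert private() is elem

# Drop the last strong reference.  CPython frees the object right away;
# PyPy needs a collection first, hence the explicit gc.collect().
del elem
gc.collect()

# The weakref is now dead: this is what "set to NULL" means on PyPy.
print(private())  # None
```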

The deallocation logic of an element E is: reset E._c_code._private to
NULL, and then, if every X in the tree has _private "set to NULL", free
the whole tree.  The problem is that "set to NULL" is more subtle in the
weakref version: it really means "contains a weakref to a dead object".
But weakrefs can die *before* the deallocator of their target is called.
This is possible on both PyPy and CPython.  So here is what happens:

* we lose the last references to both EA and EB at the same time (on
CPython, this can happen if they are in a cycle).

* both weakrefs die

* we call the deallocator of EA: it thinks the whole tree is dead
because all weakrefs are dead, and frees it

* we call the deallocator of EB: its _c_code still points to XB, but
XB has already been freed, so accessing it crashes.
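The sequence can be replayed on a toy model.  Everything here is
hypothetical and only mimics the logic described above: `Node` stands in
for `xmlNode`, `Element` for the PyPy flavour of `_Element`, and
`HandRef` is a fake weakref that we kill by hand to force the "weakrefs
die first" ordering, which a real collector may or may not pick.  The
`freed` flag and the `RuntimeError` model what, in C, is a silent
use-after-free ending in a segfault.

```python
class Node:
    """Hypothetical stand-in for a C xmlNode."""

    def __init__(self, *children):
        self.children = list(children)
        self._private = None  # back-pointer slot
        self.freed = False    # True once the "C" tree has been released


class HandRef:
    """Hypothetical weakref that we kill by hand, to mimic the GC
    clearing weakrefs before running any deallocator."""

    def __init__(self, target):
        self._target = target

    def kill(self):
        self._target = None

    def __call__(self):
        return self._target


def walk(node):
    """Yield every node of the tree rooted at node."""
    yield node
    for child in node.children:
        yield from walk(child)


class Element:
    """Hypothetical stand-in for _Element, weakref (PyPy) flavour."""

    def __init__(self, node):
        self.node = node
        node._private = HandRef(self)

    def dealloc(self, root):
        if self.node.freed:
            # In C this touches freed memory: the segfault.
            raise RuntimeError("use after free")
        self.node._private = None
        # "Set to NULL" = no weakref, or a weakref to a dead object.
        if all(n._private is None or n._private() is None for n in walk(root)):
            for n in walk(root):
                n.freed = True


def build():
    xb = Node()
    xa = Node(xb)  # XA has child XB
    return xa, xb, Element(xa), Element(xb)


# Normal order: each weakref dies only when its own deallocator runs.
xa, xb, ea, eb = build()
ea.dealloc(xa)
survived = not xb.freed               # EB's live weakref keeps the tree
eb.dealloc(xa)
freed_at_end = xa.freed and xb.freed  # now the whole tree is released

# Bad order: both weakrefs die before any deallocator runs.
xa, xb, ea, eb = build()
xa._private.kill()
xb._private.kill()
ea.dealloc(xa)  # sees only dead weakrefs, so it frees XA and XB
crashed = False
try:
    eb.dealloc(xa)  # EB still points at XB, which is already freed
except RuntimeError:
    crashed = True

print(survived, freed_at_end, crashed)  # True True True
```

In the normal order, EB's live weakref keeps the tree alive until EB
itself goes away; only when both weakrefs die first does EA's
deallocator free XB out from under EB.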

That's the problem.  I don't have a fix right now :-)


A bientôt,

Armin.

