Troubleshooting garbage collection issues

davemerkel at gmail.com
Sat Nov 17 12:34:41 EST 2007


Hi folks - wondering if anyone has any pointers on troubleshooting
garbage collection.  My colleagues and I are running into an
interesting problem:

Intermittently, we get into a situation where the garbage collection
code is running in an infinite loop.  The data structures within the
garbage collector have been corrupted, but it is unclear how or why.
The problem is unpredictable, and so far we have not found a way to
reproduce it reliably.

The infinite loop itself occurs in update_refs() in gcmodule.c.  After
hitting this in the debugger a couple of times, it appears that one
of the nodes in the second or third generation list contains a
pointer to the first generation head node.  The first generation was
cleared shortly before the call into this function, so its prev and
next both point to itself.  Once the loop reaches that head node, it
follows next back to the same node forever and never gets back to the
head of the list it started from.
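
To make the failure mode concrete, here is a small Python model of
the situation described above.  It is only a sketch: Node, append,
walk and max_steps are made-up names for illustration, not anything
from gcmodule.c, and the real lists are C structs rather than Python
objects.  The step limit exists only so the demonstration terminates;
the real loop has no such guard.

```python
class Node:
    """Stand-in for a GC list entry: just a name and two links."""
    def __init__(self, name):
        self.name = name
        self.next = self   # an empty list head points to itself
        self.prev = self


def append(head, node):
    """Link node in just before head, i.e. at the tail of the list."""
    tail = head.prev
    tail.next = node
    node.prev = tail
    node.next = head
    head.prev = node


def walk(head, max_steps=1000):
    """Follow next links until we arrive back at head.

    max_steps is an addition for this sketch; without it a corrupted
    list would make this loop spin forever.
    """
    visited = []
    node = head.next
    while node is not head:
        if len(visited) >= max_steps:
            raise RuntimeError("never returned to %s; stuck at %s"
                               % (head.name, node.name))
        visited.append(node.name)
        node = node.next
    return visited


gen0 = Node("gen0-head")   # freshly cleared: next and prev are itself
gen1 = Node("gen1-head")
a, b = Node("a"), Node("b")
append(gen1, a)
append(gen1, b)

healthy = walk(gen1)       # the intact list: a, then b, then back to head

b.next = gen0              # the corruption: a stray link to gen0's head

try:
    walk(gen1)
    stuck = None
except RuntimeError as exc:
    stuck = str(exc)
```

After the stray link is introduced, the walk leaves the older list,
lands on the empty first generation head, and keeps following a next
pointer that leads back to that same head.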

Chances are another module we're depending on has done something
hinky with GC.  The challenge is tracking that down.  If anyone has
seen something like this before and has either pointers to specific GC
usage issues that can create this behavior or some additional thoughts
on tricks to track it down to the offending module, they would be most
appreciated.

You can assume we've done some of the "usual" things: hacking up
gcmodule to spit out information when the condition occurs, and
various headstands and gymnastics in an attempt to identify reliable
steps to reproduce.  The challenge is the layers of indirection that
we think are likely present between the manifestation of the problem
and the module that produced it.
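
One way to shrink that distance, sketched here purely as an
illustration, is to force a full collection immediately before and
after each call into a suspect module, writing a label first.  If the
collector then hangs, the last label in the log points at the call
that was in progress.  The helper name checked_call and its label
argument are hypothetical; gc.collect() is the standard gc module
call, and sorted() merely stands in for a function from the module
under suspicion.

```python
import gc
import sys


def checked_call(label, func, *args, **kwargs):
    """Run func with a forced full collection before and after it.

    The label is written and flushed before each collection, so if
    the collector hangs, the last line logged names the call.
    """
    sys.stderr.write("gc-check before %s\n" % label)
    sys.stderr.flush()
    gc.collect()
    result = func(*args, **kwargs)
    sys.stderr.write("gc-check after %s\n" % label)
    sys.stderr.flush()
    gc.collect()
    return result


# Hypothetical usage: wrap a call into the module under suspicion.
value = checked_call("sorted", sorted, [3, 1, 2])
```

This slows the program down considerably, so it is probably only
practical around a handful of suspect entry points at a time.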

Many thanks,

Dave



More information about the Python-list mailing list