Troubleshooting garbage collection issues

Rhamphoryncus rhamph at gmail.com
Sun Nov 18 18:07:09 EST 2007


On Nov 17, 10:34 am, "davemer... at gmail.com" <davemer... at gmail.com>
wrote:
> Hi folks - wondering if anyone has any pointers on troubleshooting
> garbage collection.  My colleagues and I are running into an
> interesting problem:
>
> Intermittently, we get into a situation where the garbage collection
> code is running in an infinite loop.  The data structures within the
> garbage collector have been corrupted, but it is unclear how or why.
> The problem is extremely difficult to reproduce consistently as it is
> unpredictable.
>
> The infinite loop itself occurs in gcmodule.c, update_refs.  After
> hitting this in the debugger a couple of times, it appears that
> one of the nodes in the second or third generation list contains a
> pointer to the first generation head node.  The first generation was
> cleared shortly before the call into this function, so it contains a
> prev and next which point to itself.  Once this loop hits that node,
> it spins infinitely.
>
> Chances are another module we're depending on has done something
> hinky with GC.  The challenge is tracking that down.  If anyone has
> seen something like this before and has either pointers to specific GC
> usage issues that can create this behavior or some additional thoughts
> on tricks to track it down to the offending module, they would be most
> appreciated.
>
> You can assume we've done some of the "usual" things - hacking up
> gcmodule to spit information when the condition occurs, various
> headstands and gymnastics in an attempt to identify reliable steps to
> reproduce - the challenge is the layers of indirection that we think
> are likely present between the manifestation of the problem and the
> module that produced it.
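
The failure described above can be modeled in a few lines of Python. This is a simplified sketch, not CPython's gcmodule.c. It assumes, as the description implies, that each generation is a circular doubly-linked list with a sentinel head, and that a traversal such as the loop in update_refs stops only when it gets back to the head it started from. `Node`, `make_list` and `walk` are hypothetical names. Unlike the real loop, `walk` remembers the nodes it has visited so it can report the stray cycle instead of spinning.

```python
class Node:
    """Simplified stand-in for a GC list node; a fresh node is an empty list head."""

    def __init__(self, name):
        self.name = name
        # An empty circular list: the head's prev and next point to itself.
        self.prev = self
        self.next = self


def make_list(head, names):
    """Link new nodes after the sentinel `head`, closing the circle back to it."""
    nodes = [Node(name) for name in names]
    prev = head
    for node in nodes:
        prev.next = node
        node.prev = prev
        prev = node
    prev.next = head
    head.prev = prev
    return nodes


def walk(head):
    """Follow `next` pointers until we return to `head`.

    The real loop has no such check; here visited nodes are tracked so that a
    cycle that never passes back through `head` raises instead of hanging.
    """
    seen = set()
    names = []
    node = head.next
    while node is not head:
        if node in seen:
            raise RuntimeError(
                f"cycle bypasses {head.name!r}: revisited {node.name!r}"
            )
        seen.add(node)
        names.append(node.name)
        node = node.next
    return names


if __name__ == "__main__":
    gen0 = Node("gen0-head")  # just cleared: prev and next point to itself
    gen2 = Node("gen2-head")
    nodes = make_list(gen2, ["a", "b", "c"])
    print(walk(gen2))  # ['a', 'b', 'c']

    # Corrupt one node so it points at the empty gen0 head, as observed.
    nodes[1].next = gen0
    try:
        walk(gen2)
    except RuntimeError as err:
        print(err)  # cycle bypasses 'gen2-head': revisited 'gen0-head'
```

Once the traversal reaches the empty gen0 head, every further step lands on that same node, so a loop whose only exit condition is "back at the gen2 head" can never terminate.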

Does "usual things" also include compiling with --with-pydebug?
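
One concrete thing a --with-pydebug build gives you is `sys.gettotalrefcount()`, the interpreter-wide reference total, which release builds do not provide. The sketch below checks for a debug build and measures how that total drifts across repeated calls to a suspect function. `is_pydebug_build` and `total_refcount_drift` are hypothetical helpers, and the numbers are noisy, so only a delta that keeps growing with `repeat` means much.

```python
import sys


def is_pydebug_build():
    # sys.gettotalrefcount only exists in CPython builds configured with --with-pydebug.
    return hasattr(sys, "gettotalrefcount")


def total_refcount_drift(func, repeat=100):
    """Return the change in the interpreter-wide refcount over `repeat` calls to func.

    Returns None on a release build, where sys.gettotalrefcount is unavailable.
    """
    if not is_pydebug_build():
        return None
    func()  # warm-up call so one-time caches are not counted as drift
    before = sys.gettotalrefcount()
    for _ in range(repeat):
        func()
    return sys.gettotalrefcount() - before
```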

You could also try the various memory debuggers.  A refcounting error
is the first thing that comes to mind, although I can't see offhand
how this specific problem would come about.
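
A cheap way to look for a refcounting error is to pin one object and compare `sys.getrefcount()` before and after many calls into a suspect extension function. This is a minimal sketch that assumes the function should not keep a lasting reference to its argument; `refcount_delta` and the demo functions are hypothetical. A positive delta suggests a missing DECREF (a leak). A negative delta suggests an extra DECREF, which frees an object while it is still in use, and memory reuse after that can damage almost anything.

```python
import sys


def refcount_delta(func, obj, repeat=1000):
    """Call func(obj) `repeat` times and return the change in obj's refcount.

    0        -> func looks balanced for this argument
    positive -> references to obj are being leaked (missing DECREF)
    negative -> obj is being over-released (extra DECREF)
    """
    func(obj)  # warm-up, in case func caches something on first use
    before = sys.getrefcount(obj)
    for _ in range(repeat):
        func(obj)
    return sys.getrefcount(obj) - before


if __name__ == "__main__":
    leaked = []

    def balanced(x):
        return len(x)

    def leaky(x):
        leaked.append(x)  # Python-level stand-in for a missing DECREF

    data = [1, 2, 3]
    print(refcount_delta(balanced, data, repeat=100))  # 0
    print(refcount_delta(leaky, data, repeat=100))     # 100
```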

Are you using threading at all?

Do you see any pattern to the types that have the bogus pointers?
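
To look for a pattern in the types, one option is to snapshot the types of everything the collector tracks with `gc.get_objects()` before and after a suspect operation and diff the counts. This is a sketch under one caveat: `gc.get_objects()` walks the same generation lists, so once they are corrupted it will likely hang as well, and the snapshots need to be taken before the failure point. `type_counts` and `diff_type_counts` are hypothetical helper names.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def diff_type_counts(before, after, top=10):
    """Return (type name, change) pairs, largest absolute change first."""
    delta = Counter(after)
    delta.subtract(before)  # subtract keeps zero and negative counts
    changed = [(name, n) for name, n in delta.items() if n != 0]
    changed.sort(key=lambda item: abs(item[1]), reverse=True)
    return changed[:top]  # top=None returns every changed type
```

Call `type_counts()` before and after the suspect operation, then inspect `diff_type_counts(before, after)`. A type whose count jumps unexpectedly, or that matches the objects found near the bad pointer in the debugger, is a reasonable place to start.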

--
Adam Olsen, aka Rhamphoryncus
