Tracing down segfault

Tim Peters tim.peters at gmail.com
Sat Jun 25 00:07:02 EDT 2005


[Tony Meyer]
> I have (unfortunately) a Python program that I can consistently (in a
> reproducible way) segfault.  However, I've got somewhat used to Python's
> very nice habit of protecting me from segfaults and raising exceptions
> instead, and am having trouble tracking down the problem.
>
> The problem that occurs looks something like this:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00a502aa in ?? ()
> (gdb) bt
> #0  0x00a502aa in ?? ()
> Cannot access memory at address 0x0
>
> Which looks something like accessing a NULL pointer to me.

Worse, if you can't get a stack trace out of gdb, it suggests that bad
C code has corrupted the C stack beyond intelligibility.  The original
SIGSEGV was _probably_ due to a NULL pointer dereference too (although
it could be due to any string of nonsense bits getting used as an
address).

The _best_ thing to do next is to rebuild Python, and as many other
packages as possible, in debug mode.  For ZODB/ZEO, you do that like
so:

     python setup.py build_ext -i --debug

It's especially useful to rebuild Python that way.  Many asserts are
enabled in a debug build, and all of Python's memory allocations go
through a special debug allocator with gimmicks that try to catch
out-of-bounds stores, double frees, and use of free()'d memory.
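
Before chasing the crash again, it's worth confirming that the
interpreter you're actually running is the debug build.  A minimal
sketch, assuming CPython, where debug builds expose
sys.gettotalrefcount and release builds don't.  The helper name
is_debug_build is made up for illustration, and the code uses
current Python syntax:

```python
import sys


def is_debug_build():
    """Return True if this CPython interpreter was compiled in debug mode."""
    # Assumption: CPython only defines sys.gettotalrefcount in debug builds,
    # so its presence is used here as the marker for a debug interpreter.
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    print("debug build" if is_debug_build() else "release build")
```

If this reports a release build, the extra asserts and the debug
allocator are not in play for that run.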

> The problem is finding the code that is causing this, so I can work around
> it (or fix it).  Unfortunately, the script uses ZEO, ZODB,
> threading.Threads, and wx (my code is pure Python, though),

You didn't mention which version of any of these you're using, or the
OS in use.  Playing historical odds, and assuming relatively recent
versions of all, wx is the best guess for the culprit.

> and I'm having trouble creating a simple version that isolates the problem
> (I'm pretty sure it started happening when I switched from thread to
> threading, but I'm not sure why that would be causing a problem;

It's unlikely to be the true cause.  Apart from some new-in-2.4
thread-local storage gimmicks, all of the threading module is written
in Python too.  NULL pointers are a (depressingly common) C problem.

> I am join()ing all threads before this happens).

So only a single thread is running at the time the segfault occurs?
Is Python also in the process of tearing itself down (i.e., is the
program trying to exit)?

One historical source of nasties is trying to get more than one thread
to play nicely with GUIs.
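
One way to check the "only a single thread" question is to list the
threads Python still considers alive just before the point where the
crash happens.  A rough sketch using threading.enumerate(); the names
other_live_threads and worker are hypothetical, and the spelling is
current Python rather than 2.4:

```python
import threading


def other_live_threads():
    """Return the names of all live threads except the calling one."""
    current = threading.current_thread()
    # threading.enumerate() lists only threads that are currently alive.
    return [t.name for t in threading.enumerate() if t is not current]


def worker(stop_event):
    """Stand-in for a real worker: block until told to stop."""
    stop_event.wait()


if __name__ == "__main__":
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,), name="worker-1")
    t.start()
    print(other_live_threads())  # worker-1 is still running at this point
    stop.set()
    t.join()
    print(other_live_threads())  # worker-1 has been joined and is gone
```

Anything still listed after the join() calls is a thread that could be
touching the GUI, or anything else, when the segfault hits.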

> Does anyone have any advice for tracking this down?

Nope, can't think of a thing -- upgrade to Windows <wink>.


