[Python-Dev] ANSI strict aliasing and Python

Tim Peters tim.one@comcast.net
Fri, 18 Jul 2003 22:18:45 -0400


[Andrew MacIntyre]
> I've never used no-strict-aliasing with gcc on the EMX port.  With gcc
> 2.8.1 and 2.95.2 -O3, I've not seen failures that appear to be bad
> code; with gcc 3.2.1 I see 3 tests (test_codeccallbacks, test_format &
> test_unicode) that seem to have repeatable failures that are
> sensitive to optimisation level (-O3 = fail, -O2 = pass) which may be
> bad code.  I'll try -fno-strict-aliasing when I get back to digging into
> this.

It could also be plain bad code <wink>.

> BTW, the following sequence of tests causes a core dump from an
> assertion failure in test_enumerate on EMX which I haven't been able
> to replicate on FreeBSD :-(
>
> test_importhooks
> test_re
> test_glob
> test_parser
> test_enumerate

Excellent!  I just reproduced this in a debug build on Win98SE.  These are
sheer hell to track down, btw:

> I haven't played with all possible permutations, but skipping any one
> of the precursor tests doesn't exhibit the assertion failure, which
> is:
>
> Assertion failed: gc->gc.gc_refs != 0, file ../../Modules/gcmodule.c,
> line 231

I'm very familiar with that assertion, partly because I put it in <wink>,
but mostly because we get these seemingly every week in Zope's C code.
It's never gc's fault:  gc has determined that there are more pointers to
an object than are accounted for by the object's refcount.  There are two
possible causes:

1. Somebody forgot to incref.

or

2. I've seen this only once:  a tp_traverse slot is calling its visit()
   callback multiple times with the same contained object.  In the case
   I saw last week, code in Zope was trying to pass back each pointer
   in a linked list, one at a time, but due to a bug the linked list
   erroneously pointed back to its own interior, causing all the
   pointers in the list to get passed to visit() over and over again.
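
To make the two causes concrete, here is a toy model in Python of the
bookkeeping behind that assertion.  It is a simplified sketch, not the code
in gcmodule.c:  the names Obj, subtract_refs, visit_decref and check are
made up for illustration, and it ignores generations entirely.  The idea is
that gc copies each tracked object's refcount into gc_refs, then lets every
container report its contents through visit(), subtracting one per reported
pointer.  If a pointer gets reported when gc_refs is already 0, more
pointers exist than the refcount admits to.

```python
class Obj:
    """Hypothetical stand-in for a gc-tracked container object."""

    def __init__(self, name, refcount, referents=()):
        self.name = name
        self.refcount = refcount          # what the object's refcount claims
        self.referents = list(referents)  # what its traverse reports
        self.gc_refs = 0

    def traverse(self, visit):
        # Plays the role of a tp_traverse slot: report each contained object.
        for child in self.referents:
            visit(child)


def visit_decref(obj):
    # Models the assertion: a pointer was reported, but no refcount is left.
    if obj.gc_refs == 0:
        raise AssertionError("refcount too small for %s" % obj.name)
    obj.gc_refs -= 1


def subtract_refs(containers):
    # Step 1: start from the claimed refcounts.
    for obj in containers:
        obj.gc_refs = obj.refcount
    # Step 2: subtract one for every pointer the containers report.
    for obj in containers:
        obj.traverse(visit_decref)


def check(containers):
    try:
        subtract_refs(containers)
    except AssertionError as exc:
        return str(exc)
    return "ok"


def healthy():
    value = Obj("value", refcount=1)
    holder = Obj("holder", refcount=1, referents=[value])
    return check([value, holder])


def missing_incref():
    # Cause 1: two containers point at the object, but only one incref'ed.
    shared = Obj("shared", refcount=1)
    a = Obj("dict_a", refcount=1, referents=[shared])
    b = Obj("dict_b", refcount=1, referents=[shared])
    return check([shared, a, b])


def double_visit():
    # Cause 2: one container reports the same object to visit() twice.
    node = Obj("node", refcount=1)
    owner = Obj("list_owner", refcount=1, referents=[node, node])
    return check([node, owner])
```

Note that in both failing cases the complaint is raised while traversing
whichever container happens to report the extra pointer, which need not be
the code that made the mistake.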

These can be sheer hell to track down, since gc is detecting a mistake that
may have occurred at any time since the Python run started (BTW, Zope
suffers more than its share of this because some of its internals lie about
the true refcounts, faking weak dicts in a way that predates weak
references).

In the case above, the assertion triggers while Python is shutting down,
during the first call to PyGC_Collect() in Py_Finalize().  gc is traversing
a dict, and a dict value is the thing with the too-small refcount; it's an
object of _PyClass_Type ... so it's a classic class ... and its name is
"ParserError".  That's created in parsermodule.c.

What appears to be the same assertion on the same object can be triggered
earlier by passing -t1 to regrtest.py (this forces gc to run much more
often).  Then it asserts in the middle of running test_parser.py.  But not
in isolation!  Still seems to need other tests to run before it.
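
The same trick can be tried outside regrtest.  A minimal sketch, assuming
that -t amounts to a gc.set_threshold() call (I believe it does, but check
regrtest.py's docstring); the helper name aggressive_gc is made up here:

```python
import gc
from contextlib import contextmanager


@contextmanager
def aggressive_gc(threshold=1):
    """Temporarily lower gc's first threshold so collections run far more
    often, then restore the previous settings."""
    old = gc.get_threshold()
    gc.set_threshold(threshold)  # only threshold0 changes
    try:
        yield
    finally:
        gc.set_threshold(*old)
```

Wrapping the suspect code in "with aggressive_gc():" makes gc look at the
heap much sooner after the damage is done, which narrows the window between
the mistake and the assertion.  It doesn't tell you who made the mistake.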

> I also encountered a hang in test_poll on FreeBSD 5.1 (gcc 3.2.2),
> which I suspect is more likely to be a problem with FreeBSD 5.1's
> library code (thread codebase instability) than Python.  I ran about
> a dozen full -r runs on FreeBSD 4.8 (gcc 2.95.4) with no sign of
> anything being amiss (which doesn't mean it isn't, just not readily
> tripped over).

We had a report on a Zope list today that 2.3b1 on FreeBSD (don't know more
about the version) couldn't import fcntl.  Doesn't sound like you've bumped
into that one.

> More digging clearly indicated on both EMX issues...

Indeed -- alas.