[Python-bugs-list] [ python-Bugs-495401 ] Build troubles: --with-pymalloc

noreply@sourceforge.net noreply@sourceforge.net
Sat, 29 Dec 2001 14:01:47 -0800


Bugs item #495401, was opened at 2001-12-20 05:24
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=495401&group_id=5470

Category: Build
Group: Platform-specific
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Build troubles: --with-pymalloc

Initial Comment:
The build process segfaults with the current CVS 
version when using --with-pymalloc

System is SuSE Linux 7.0

> uname -a
Linux amazonas 2.2.16-SMP #1 SMP Wed Aug 2 20:01:21 GMT 2000 i686 unknown
> gcc -v
Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)

Attached is the complete build log.


----------------------------------------------------------------------

>Comment By: Michael Hudson (mwh)
Date: 2001-12-29 14:01

Message:
Logged In: YES 
user_id=6656

Hmm.  I now think that the stuff about extension modules is 
almost certainly a red herring.  What I said about "make && 
make altinstall" vs "make altinstall" still seems to be true, 
though.

If you compile with --with-pydebug, you crash right at the 
end of the second (-O) run of compileall.py -- I suspect this 
is something else, but it might not be.


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2001-12-29 13:29

Message:
Logged In: YES 
user_id=6656

I don't know if these are helpful observations or not, but 
anyway:

(1) it doesn't core without the --enable-unicode=ucs4 option
(2) if you just run "make altinstall"  the library files are 
installed *and compiled* before the dynamically linked 
modules are built.  Then we don't crash.
(3) if you run "make altinstall" again, we crash.
If you initially ran "make && make install", we crash.
(4) when we crash, it's not long after the unicode tests are 
compiled.

Are these real clues or just red herrings?  I'm afraid I 
can't tell :(


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-29 10:43

Message:
Logged In: YES 
user_id=31435

Ouch.  Boosted priority back to 5, since Martin can 
reproduce it.  Alas, where pymalloc got called *from* is 
almost certainly irrelevant -- we're seeing the end result 
of earlier corruption.

Note that pymalloc is unusually sensitive to off-by-1 
stores, since the chunks it hands out are contiguous 
(there's no hidden bookkeeping padding between them).  
Plausible:  an earlier bogus store went beyond the end of 
its allocated chunk, overwriting the "next free block" 
pointer at the start of a previously free()'ed chunk of the 
same size (rounded up to a multiple of 8; 40 bytes in this 
case).
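
The rounding can be checked against the nbytes=33 request in the 
gdb backtrace further down this thread.  What follows is only a 
sketch: it assumes the size class index is (nbytes - 1) >> 3 and 
that a usedpools-style array holds two entries per class.  The 
helper names are made up for illustration and are not taken from 
obmalloc.c.

```c
#include <assert.h>
#include <stddef.h>

#define ALIGNMENT        8   /* chunk sizes are multiples of 8 (from the text) */
#define ALIGNMENT_SHIFT  3   /* assumed: log2(ALIGNMENT) */

/* Size class index for a request of nbytes (nbytes >= 1). Assumed formula. */
unsigned int size_class(size_t nbytes)
{
    return (unsigned int)((nbytes - 1) >> ALIGNMENT_SHIFT);
}

/* Chunk size actually handed out for a given size class. */
size_t chunk_size(unsigned int cls)
{
    return ((size_t)cls + 1) << ALIGNMENT_SHIFT;
}

/* Slot in an array of pointer pairs: two entries per size class. */
unsigned int usedpools_slot(unsigned int cls)
{
    return cls + cls;
}
```

Under these assumptions a 33-byte request lands in class 4, gets 
a 40-byte chunk, and uses slot 4 + 4, which matches the numbers 
quoted in this message.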

At the time this blows up, bp is supposed to point to a 
previously free()'ed chunk of size 40 bytes (if there were 
none free()'ed and available, the earlier 
"pool != pool->nextpool" guard should have failed).  The first 
4 bytes (let's simplify by assuming this is a 32-bit box) of 
the free chunks link the free chunks together, most recently 
free()'ed at the start of the (singly linked) list.  So the 
code at this point is intent on returning bp, and 
"pool->freeblock = *(block **)bp" is setting the 40-byte-chunk 
list header's idea of the *next* available 40-byte chunk.
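
A toy version of that free list may make the mechanics easier to 
follow.  This is a sketch, not the obmalloc.c code: struct 
toy_pool and the toy_* functions are hypothetical, and the link 
is copied with memcpy rather than the direct *(block **)bp cast 
so the example stays well defined on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 40   /* size class from the discussion above */
#define NCHUNKS    4

typedef unsigned char block;

/* Hypothetical stand-in for a pool: a free-list head plus its chunks. */
struct toy_pool {
    block *freeblock;                    /* most recently freed chunk, or NULL */
    block  arena[CHUNK_SIZE * NCHUNKS];  /* contiguous chunks, no padding */
};

void toy_init(struct toy_pool *pool)
{
    pool->freeblock = NULL;
}

/* Read the "next free" link stored in the first bytes of a free chunk. */
block *toy_next(const block *bp)
{
    block *next;
    memcpy(&next, bp, sizeof next);
    return next;
}

/* free(): store the old head inside the chunk, then make it the new head. */
void toy_free(struct toy_pool *pool, block *bp)
{
    memcpy(bp, &pool->freeblock, sizeof pool->freeblock);
    pool->freeblock = bp;
}

/* malloc(): return the head and advance the head to the link stored in it. */
block *toy_malloc(struct toy_pool *pool)
{
    block *bp = pool->freeblock;
    if (bp == NULL)
        return NULL;                 /* nothing free in this pool */
    pool->freeblock = toy_next(bp);  /* the step that faults when bp is bogus */
    return bp;
}
```

Freeing chunk a and then chunk b leaves b at the head, so the 
next toy_malloc returns b and the head becomes a again.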

But bp is bogus.  The value of bp is gotten out of the free 
list headers, the static array usedpools.  This mechanism 
is horridly obscure, an array of pointer pairs that, in 
effect, capture just the first two members of the 
pool_header struct, once for each chunk size.  It's 
possible that someone is overwriting 
usedpools[4 + 4]->freeblock directly with 2, but that seems 
unlikely.

More likely is that a free() operation linked a 40-byte 
chunk into the list headed at usedpools[4+4]->freeblock 
correctly, and a later bad store overwrote the first 4 
bytes of the free()'ed block with 2.  Then the 
"pool->freeblock = *(block **)bp" near the start of an 
unexceptional pymalloc would copy the 2 into the list 
header's freeblock without complaint.  The error wouldn't 
show up until a subsequent malloc tried to use it.

So that's one idea to get closer to the cause:  add code to 
dereference pool->freeblock, before the "return (void *)bp".  
If that blows up earlier, then the first four bytes 
of bp were corrupted, and that gives you a useful data 
breakpoint address for the next run.  If it doesn't blow up 
earlier, the corruption will be harder to find, but let's 
count on being lucky at first <wink>.
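
Here is one way such an early check might look, sketched on a toy 
pool rather than on obmalloc.c itself.  It swaps in a different 
technique from the one suggested above: instead of dereferencing 
the new list head and waiting for a fault, it tests whether the 
link is NULL or a chunk boundary inside the pool, so the bad link 
is reported rather than crashing.  struct toy_pool, 
link_is_sane and toy_malloc_checked are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 40
#define NCHUNKS    4

typedef unsigned char block;

/* Hypothetical stand-in for a pool: a free-list head plus its chunks. */
struct toy_pool {
    block *freeblock;
    block  arena[CHUNK_SIZE * NCHUNKS];
};

/* free(): store the old head inside the chunk, then make it the new head. */
void toy_free(struct toy_pool *pool, block *bp)
{
    memcpy(bp, &pool->freeblock, sizeof pool->freeblock);
    pool->freeblock = bp;
}

/* A link is plausible if it is NULL or the start of a chunk in the arena. */
int link_is_sane(const struct toy_pool *pool, const block *p)
{
    uintptr_t a, lo, hi;
    if (p == NULL)
        return 1;
    a  = (uintptr_t)p;
    lo = (uintptr_t)pool->arena;
    hi = lo + sizeof pool->arena;
    return a >= lo && a < hi && (a - lo) % CHUNK_SIZE == 0;
}

/* malloc() with the extra check: validate the link before installing it.
 * On a bad link, set *corrupt, leave the list untouched and return NULL;
 * pool->freeblock then still points at the chunk whose first bytes were
 * clobbered, i.e. the address to put a data breakpoint on. */
block *toy_malloc_checked(struct toy_pool *pool, int *corrupt)
{
    block *bp = pool->freeblock;
    block *next;
    *corrupt = 0;
    if (bp == NULL)
        return NULL;
    memcpy(&next, bp, sizeof next);
    if (!link_is_sane(pool, next)) {
        *corrupt = 1;
        return NULL;
    }
    pool->freeblock = next;
    return bp;
}
```

With a bogus link of 2 written over the start of a free chunk, the 
checked version flags the problem on the malloc that would 
otherwise copy the 2 into the list head, one call before the 
unchecked code would fault.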

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 08:00

Message:
Logged In: YES 
user_id=21627

OK, I can reproduce it now; I had not run 'make install' before.
Here is a gdb backtrace:

#0  _PyCore_ObjectMalloc (nbytes=33) at Objects/obmalloc.c:417
#1  0x805885c in PyString_FromString (str=0x816c6e8 "checkJoin") at Objects/stringobject.c:136
#2  0x805d772 in PyString_InternFromString (cp=0x816c6e8 "checkJoin") at Objects/stringobject.c:3640
#3  0x807c9c6 in com_addop_varname (c=0xbfffe87c, kind=0, name=0x816c6e8 "checkJoin") at Python/compile.c:939
#4  0x807de24 in com_atom (c=0xbfffe87c, n=0x816c258) at Python/compile.c:1478
#5  0x807f01c in com_power (c=0xbfffe87c, n=0x816c8b8) at Python/compile.c:1846
#6  0x807f545 in com_factor (c=0xbfffe87c, n=0x816c898) at Python/compile.c:1975
#7  0x807f56c in com_term (c=0xbfffe87c, n=0x816c878) at Python/compile.c:1985
#8  0x807f6bc in com_arith_expr (c=0xbfffe87c, n=0x816c858) at Python/compile.c:2020
#9  0x807f7dc in com_shift_expr (c=0xbfffe87c, n=0x816c838) at Python/compile.c:2046
#10 0x807f8fc in com_and_expr (c=0xbfffe87c, n=0x816c818) at Python/compile.c:2072
#11 0x807fa0c in com_xor_expr (c=0xbfffe87c, n=0x816c7f8) at Python/compile.c:2094
...

The access that crashes is *(block **)bp, since bp is 0x2.

Not sure how that happens; I'll investigate (but would
appreciate a clue). It seems that the pool chain got corrupted.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-29 06:52

Message:
Logged In: YES 
user_id=6380

Aha!  The --enable-unicode=ucs4 is more suspicious than the
--with-pymalloc. I had missed that info when this was first
reported.

Not that I'm any closer to solving it... :-(

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-12-29 02:53

Message:
Logged In: YES 
user_id=89016

OK, I did a "make distclean" which removed .o files and
the build directory and redid a
"./configure --enable-unicode=ucs4 --with-pymalloc && make && make altinstall".

The build process still crashes in the same spot:
Compiling /usr/local/lib/python2.2/test/test_urlparse.py ...
make: *** [libinstall] Segmentation fault

I also retried with a freshly untarred Python-2.2.tgz. This 
shows the same behaviour.



----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 01:23

Message:
Logged In: YES 
user_id=21627

At least I cannot reproduce it on SuSE 7.3. Can you retry
this, building from a clean source tree (no .o files, no
build directory)?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-28 14:30

Message:
Logged In: YES 
user_id=6380

My prediction: this is irreproducible. Lowering the priority
accordingly.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-12-23 04:24

Message:
Logged In: YES 
user_id=89016

Unfortunately no core file was generated. Can I somehow 
force core file generation?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-22 06:42

Message:
Logged In: YES 
user_id=21627

Did that produce a core file? If so, can you attach a gdb
backtrace as well?

----------------------------------------------------------------------
