[Patches] [ python-Patches-578297 ] fix for problems with test_longexp

noreply@sourceforge.net noreply@sourceforge.net
Sun, 21 Jul 2002 06:16:17 -0700


Patches item #578297, was opened at 2002-07-07 16:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470

Category: Parser/Compiler
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew I MacIntyre (aimacintyre)
Assigned to: Andrew I MacIntyre (aimacintyre)
Summary: fix for problems with test_longexp 

Initial Comment:
The OS/2 EMX port has long had problems with
test_longexp, which triggers gross memory consumption
on this platform as a result of platform malloc behaviour.

More recently, this appears to have been identified in
MacPython under certain circumstances, although the
problem is apparently more a speed issue than a memory
consumption issue.

The core of the problem is the blizzard of small
mallocs as the parser builds the parse tree and creates
tokens.

The attached patch takes advantage of PyMalloc (built
in by default for 2.3) to insulate the parser from
adverse behaviour in the platform malloc.

The patch has been tested on OS/2 and FreeBSD:
- on OS/2, the patch allows even a system with modest
resources to complete test_longexp successfully and
without swapping to death; on better resourced
machines, the whole regression test is negligibly
slower (0-1%) to complete.  [gcc-2.8.1 -O2]
- on FreeBSD (4.4 tested), test_longexp gains nearly
10%, and completes the whole regression test with a
gain of about 2% (test_longexp is good for about 25% of
the improvement).  [gcc-2.95.3 -O3]
Both platforms are neutral, performance wise, running
MAL's PyBench 1.0.

The patch in its current form is for experimental
evaluation, and not intended for integration into the core.

If there is interest in seeing this integrated, I'd
like feedback on a more elegant way to implement the
functional change.

I've assigned this to Jack for review in the context of
its performance on the Mac.

----------------------------------------------------------------------

>Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-21 23:16

Message:
Logged In: YES 
user_id=250749

Ok, I've prepared patches to convert the following files to
use PyMalloc for memory allocation:
Parser/[acceler.c|node.c|parsetok,c] (pymalloc-parser.diff)
Python/compile.c (pymalloc-compile.diff)

I didn't bother with the other files in Parser/ as my malloc
logging shows that they only ever appear to make requests >
256 bytes.

I have attached/will attach a summary from my malloc logging
experiments for information.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-16 04:14

Message:
Logged In: YES 
user_id=31435

Thanks for the detailed followup, Andrew!  I incorporated 
some of this info into XXXROUNDUP's comments.

Without either patch, the system malloc has to do two 
miserable things:  (1) find bigger and bigger memory areas 
very frequently; and, (2) interleaved with that, allocate 
gazillions of tiny blocks too.  #2 makes it difficult for the 
platform malloc to find free space contiguous to the blocks 
allocated for #1, unless it arranges to move them to "the 
end" of memory, or into their own memory segments.  As a 
result it's likely to do a copy on nearly every large-block 
realloc, and the code used to do a realloc on every 3rd new 
child.

The XXXROUNDUP patch addressed #1 by asking to grow 
blocks much less frequently; PyMalloc addresses #2 by 
getting the tiny blocks out of the platform malloc's hair.  If 
the platform malloc is saved from either one, it's job 
becomes much easier.

It would still be nice to switch the parser to using 
pymalloc.  There are still disasters lurking, because some 
platform malloc packages appear to take quadratic time 
when *free*ing gazillions of tiny blocks (they thrash trying 
to coalesce them into larger contiguous free blocks).  
pymalloc doesn't try to coalesce free blocks, so is reliably 
immune to this disease.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-15 21:47

Message:
Logged In: YES 
user_id=250749

To my surprise, Tim's checkin also works for the EMX port.

I can only conclude that EMX's realloc() has a corner case
tickled by test_longexp, that isn't hit with either the
aggressive overallocation change or the PyMalloc change applied.

It is also interesting to note the performance impact of
Tim's checkin, particularly on FreeBSD.

Typical runtimes for "python -E -tt Lib/test/regrtest.py -l
test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc
2.95.3 -O3):
                         total    user    sys
baseline:                39.1s    32.7s   6.3s
my patch:                37.1s    30.3    6.7s
Tim's checkin:            8.4s     7.8s   0.6s
my patch+Tim's checkin    5.5s     4.9s   0.5s

These runs with Library modules already compiled.

While Tim's comments about timing the regression test are
noted, there are nonetheless consistent reductions in
execution time of the regression test as well.
Typical results on the same test box:
                         total    user    sys
baseline:                1386s    1097s   89s
my patch:                1350s    1065s   93s
Tim's checkin:           1265s    1003s   67s
my patch+Tim's checkin   1230s     971s   65s

With the EMX port, the difference in timing between Tim's
checkin and my patch is small, both for test_longexp and the
regression test.  There are noticeable gains for both
test_longexp and the whole regression test with both changes
in place, although not as significant as the FreeBSD results.

MAL's PyBench 1.0 exhibits negligible performance
differences between the code states on both platforms, which
is as I'd expect as it doesn't appear to test compile() or
eval().

>From the above, I conclude that Tim's patch gets the most
bang for the buck, and that my patch (or its intent) be
rejected unless someone thinks pursuing the PyMalloc changes
to the parser worthwhile.

As an aside, I did a little research on the "XXX are those
actually common?" question Tim posed in the comment
associated with his change:
In running Lib/compileall.py against the Lib directory, 89%
of PyMem_RESIZE() calls in AddChild() are the n=1 case, and
9% are rounded up to n=4.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 20:09

Message:
Logged In: YES 
user_id=45365

With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

Andrew, I'm assigning back to you, there's little more I can do with this patch. And you'll have to check if you still need it, or whether Tims change to node.c is goo enough for OS/2 as well.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-07-08 16:38

Message:
Logged In: YES 
user_id=31435

Jack, please do a cvs update and try this again.  I checked 
in changes to PyNode_AddChild() that I expect will cure 
your particular woes here.

Andrew, PyMalloc was designed for oodles of small 
allocations.  Feel encouraged to write a patch to change the 
compiler to use PyObject_{Malloc, Realloc, Free} instead.  
Then it will automatically exploit PyMalloc when the latter is 
enabled.

Note that the regression test suite incorporates random 
numbers in several tests, and in ways that can affect 
runtime.  Small differences in aggregate test suite runtime 
are meaningless because of this.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2002-07-08 07:24

Message:
Logged In: YES 
user_id=45365

Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_TRESHOLD pymalloc will simply call the underlying system malloc/realloc.

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2002-07-07 16:41

Message:
Logged In: YES 
user_id=250749

Oops.  On FreeBSD,  test_longexp contributes 15% of the
performance gain (not 25%) observed for the regression test
with the patch applied.

Also, I would expect to make this a platform specific change
if its integrated, rather than a general change (unless that
it is seen as more appropriate).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=578297&group_id=5470