[Python-Dev] RE: Program very slow to finish

Tim Peters tim.one@home.com
Mon, 5 Nov 2001 20:56:15 -0500


Speed freaks should look up this thread on comp.lang.python (c.l.py); see also
a related SF bug report I recently closed as "Won't Fix".

Roeland Rengelink set up a simple test that builds increasingly large dicts,
timing the per-item creation and destruction times.  This was in response to
another poster who bumped into the "geez, my program seems to take *forever*
to exit" annoyance, where final decref'ing of many objects *can* take hours
to complete (normally people only notice this at program exit, but it can
happen whenever a large number of objects get freed).

Roeland found the creation time per dict element on his Linux system was
pretty steady, but destruction time per element grew disturbingly with dict
size.  I found the same on Win98SE, but the degeneration in destruction time
per element was milder than on his Linux test.
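
Roeland's actual code was at the end of his original post; a minimal sketch of
that kind of benchmark (not his code -- the string-key scheme and sizes here
are just assumptions) looks something like this:

    import time

    def bench(size):
        # Build a dict mapping string keys to string values.
        start = time.time()
        d = {}
        for i in range(size):
            d[str(i)] = str(i)
        create = (time.time() - start) * 1e6 / size

        # Drop the only reference; refcounting then frees every entry at once.
        start = time.time()
        del d
        destroy = (time.time() - start) * 1e6 / size

        print("size: %7d, creation: %5.2f, destruction: %5.2f"
              % (size, create, destroy))

    for size in (10000, 20000, 50000, 100000, 200000, 500000, 1000000):
        bench(size)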

I fiddled dict deallocation on my box to do everything *except* call free()
when a refcount hit zero (the dict contained only string objects, so free()
was the only thing left out -- strings have a trivial destructor).  So the
memory leaked, but per-element destruction time no longer increased with
dict size, i.e. "the problem" on my box was entirely due to MS free()
behavior.
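
If you want to see the same effect without touching dictobject.c, a rough
Python-level analogue (a sketch only, not the C hack described above) is to
keep a second reference to every string, so tearing the dict down still does
all the per-entry decref'ing but never drops a refcount to zero, and hence
never calls free() on the strings:

    import time

    size = 1000000

    # Build the dict, but keep a second reference to every string in a list.
    d = {}
    keepalive = []
    for i in range(size):
        s = str(i)
        d[s] = s
        keepalive.append(s)

    # Tearing down the dict still does the per-entry decref work, but because
    # `keepalive` still references each string, no refcount reaches zero and
    # the allocator's free() is never called for them.
    start = time.time()
    del d
    print("destruction (free() skipped): %.2f usec/entry"
          % ((time.time() - start) * 1e6 / size))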

I suggested to Roeland that he try rebuilding his Python with PyMalloc
enabled, just to see what would happen.  This is what happened (time is
average microseconds per dict entry, as computed from time.time() deltas
captured across whole-dict operations):

> Well, aint that nice
>
> 2.2b1 --with-pymalloc
>
> size:   10000, creation: 29.94, destruction:  0.61
> size:   20000, creation: 30.10, destruction:  0.64
> size:   50000, creation: 30.73, destruction:  0.71
> size:  100000, creation: 30.72, destruction:  0.68
> size:  200000, creation: 30.95, destruction:  0.69
> size:  500000, creation: 30.62, destruction:  0.67
> size: 1000000, creation: 30.71, destruction:  0.68
>
> malloc is faster too ;)

This is what he saw earlier, using his platform malloc/free:

> All times in micro-seconds per item. For the code see end of this post.
> (Linux 2.2.14, 128M RAM, Cel 333 MHz)
>
> size:   10000, creation: 31.00, destruction:  1.49
> size:   20000, creation: 31.10, destruction:  1.57
> size:   50000, creation: 32.77, destruction:  1.76
> size:  100000, creation: 32.00, destruction:  1.92
> size:  200000, creation: 32.59, destruction:  2.38
> size:  500000, creation: 32.12, destruction:  4.35
> size: 1000000, creation: 32.25, destruction: 10.47

Can any Python-Dev'er make time to dig into the advisability of making
PyMalloc the default?  I only took time for this because I'm out sick today,
and was looking for something mindless to occupy my fevered thoughts; alas,
it paid off <wink>.  I recall there are still thread issues wrt PyMalloc,
and there *were* some reports that PyMalloc was slower on some platforms.
Against that, I'm the guy who usually gets stuck trying to explain the
inexplicable, and malloc/free performance is so critical to Python
performance that it's always been "a problem" that we have no idea how
system malloc/free behave across platforms (although I suppose it's "a
feature" that I know how to crash Win9X by provoking problems with its
malloc <wink>).  I can't make time for it in the 2.2 timeframe, though.

factors-of-2-to-15-are-worth-a-little-effort-if-they're-real-ly
    y'rs  - tim