[Python-Dev] RE: [Zope.Com Geeks] Re: Program very slow to finish

Tim Peters tim.one@home.com
Tue, 6 Nov 2001 14:38:11 -0500


[Just van Rossum]

> ...
> 266 Mhz G3 PPC, 160 Megs of RAM, OSX 10.0.4, CVS Python:
>
> without pymalloc:
> ...
> size:  600000, creation:  21.09, destruction:   3.54
> size:  800000, creation:  24.75, destruction:  18.47
> size: 1000000, creation: 119.68, destruction: 435.15
>
> with pymalloc:
> ...
> size:  600000, creation:  20.35, destruction:   0.71
> size:  800000, creation:  21.44, destruction:   0.80
> size: 1000000, creation:  20.80, destruction:   0.76

Looks like you ran out of RAM at the end there, when using the system
malloc.  PyMalloc has low memory overhead per object allocated so long as
small blocks are requested.  Since Roeland's test uses 7-character string
keys, most requests should be for 28-byte string-object chunks:

    4 type pointer
    4 refcount
    4 character count
    4 cached hash code
    4 interned string pointer
    7 characters
    1 trailing 0 byte
   --
   28

It will round that up to 32, but that's essentially all the waste.  The
system malloc likely adds at least enough overhead beyond that to store the
size of the allocated block too (PyMalloc doesn't need to store it:  at
free() time it infers the size from the memory address).
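
For concreteness, here's roughly the struct behind that tally, as I recall
it from the current Include/stringobject.h (field names and sizes are from
memory and assume a 32-bit box, so treat this as a sketch rather than a
quote):

typedef struct {
    PyObject_VAR_HEAD           /* refcount + type pointer + ob_size (the
                                   character count):  4 + 4 + 4 == 12 bytes */
    long ob_shash;              /* cached hash code, -1 until computed:
                                   4 bytes */
    PyObject *ob_sinterned;     /* interned-string pointer, usually NULL:
                                   4 bytes */
    char ob_sval[1];            /* the characters; over-allocated to hold
                                   7 chars + trailing '\0':  8 bytes here */
} PyStringObject;

/* 12 + 4 + 4 + 8 == 28, and PyMalloc's ALIGNMENT of 8 rounds a 28-byte
   request up to 32. */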

Curious:  the PyMalloc comments (obmalloc.c) say requests through 256 bytes
are handled internally.  But as I read the *code*, SMALL_REQUEST_THRESHOLD
is actually 64 on 32-bit boxes, and 96 or 128 on 64-bit boxes:

#define ALIGNMENT		8
#define _PYOBJECT_THRESHOLD	((SIZEOF_LONG + SIZEOF_VOID_P) * ALIGNMENT)
#define SMALL_REQUEST_THRESHOLD _PYOBJECT_THRESHOLD
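
Just to show the arithmetic, here's a throwaway program of mine (not code
from obmalloc.c -- the real SIZEOF_LONG and SIZEOF_VOID_P come from
pyconfig.h):

#include <stdio.h>

/* Stand-in values for a typical 32-bit box. */
#define SIZEOF_LONG   4
#define SIZEOF_VOID_P 4

#define ALIGNMENT               8
#define _PYOBJECT_THRESHOLD     ((SIZEOF_LONG + SIZEOF_VOID_P) * ALIGNMENT)
#define SMALL_REQUEST_THRESHOLD _PYOBJECT_THRESHOLD

int main(void)
{
    /* Prints 64 with the sizes above; with 4/8 or 8/8 (64-bit boxes) it's
       (4+8)*8 == 96 or (8+8)*8 == 128 -- nowhere near the 256 the comments
       advertise. */
    printf("SMALL_REQUEST_THRESHOLD = %d\n", SMALL_REQUEST_THRESHOLD);
    return 0;
}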

This should probably be boosted!  Several things have changed since Vladimir
wrote this:

1. gc-able objects grew 12 bytes of gc overhead.

2. The smallest dict possible now has 8 slots embedded in the dict object
   (so it consumes at least 8*12 == 96 bytes for that alone, meaning dict
   requests are probably never handled directly by PyMalloc anymore; see
   the sketch after this list).

3. Type objects have grown.

4. The new __slots__ mechanism will likely become heavily used in
   memory-conscious code, and creates oodles of new possibilities
   for heavy allocation of a variety of "small block" sizes we didn't
   see often before.
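
To put numbers behind #2, here's roughly the layout in the current
dictobject.c as I remember it (names and fields from memory, assuming a
32-bit box, and needing Python.h for PyObject_HEAD -- an approximation,
not a quote):

#define PyDict_MINSIZE 8

typedef struct {
    long me_hash;               /* cached hash of me_key:  4 bytes */
    PyObject *me_key;           /* 4 bytes */
    PyObject *me_value;         /* 4 bytes */
} dictentry;                    /* 12 bytes per slot */

typedef struct {
    PyObject_HEAD
    int ma_fill;                /* # of active + dummy slots */
    int ma_used;                /* # of active slots */
    int ma_mask;                /* table size - 1 */
    dictentry *ma_table;        /* points at ma_smalltable until the dict
                                   outgrows it */
    dictentry ma_smalltable[PyDict_MINSIZE];    /* 8 * 12 == 96 bytes embedded
                                                   in every dict */
} dictobject;

So even an empty dict already blows past PyMalloc's 64-byte threshold before
a single entry spills out of the small table.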