[Python-Dev] Basic pymalloc stats

Tim Peters tim.one@comcast.net
Fri, 05 Apr 2002 01:17:13 -0500


FYI, I implemented the optimizations Vladimir and I discussed here.

Next, _PyMalloc_DebugDumpStats() is an entry point you can call in a debug
build (or in a release build with PYMALLOC_DEBUG enabled) to get a snapshot
of pymalloc's internal structures.  Perhaps it should also be available in a
release build without PYMALLOC_DEBUG: as things stand, you only get the
snapshot with PYMALLOC_DEBUG enabled, and *because* it is enabled, every
allocation is bumped by 16 bytes to make room for PYMALLOC_DEBUG's memory
decorations, so the numbers don't quite match what a normal release build
would see.
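
The following minimal Python sketch makes the cost of those 16 bytes concrete by modeling how a request maps to a size class. It assumes 8-byte size classes and the 256-byte small-block threshold. Both are consistent with the dump below, where class 5 holds 48-byte blocks and class 31 holds 256-byte blocks. It also assumes the debug wrapper simply asks for nbytes + 16. The names size_class, block_size and describe are hypothetical and are not pymalloc's.

```python
# Hypothetical sketch of pymalloc-style size-class selection, assuming
# 8-byte classes, a 256-byte small-block threshold, and a debug wrapper
# that pads every request by 16 bytes.
ALIGNMENT_SHIFT = 3            # assumed: classes are 8 bytes apart
SMALL_REQUEST_THRESHOLD = 256  # "Small block threshold = 256"
DEBUG_PAD = 16                 # bytes PYMALLOC_DEBUG adds per allocation


def size_class(nbytes):
    """Return the size-class index for a request, or None if the request
    would bypass pymalloc and go to the system malloc."""
    if nbytes < 1 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes - 1) >> ALIGNMENT_SHIFT


def block_size(cls):
    """Bytes per block in size class `cls` (class 5 -> 48, class 31 -> 256)."""
    return (cls + 1) << ALIGNMENT_SHIFT


def describe(nbytes):
    """One-line summary of where a request of `nbytes` bytes would land."""
    cls = size_class(nbytes)
    if cls is None:
        return f"{nbytes} bytes -> system malloc"
    return f"{nbytes} bytes -> class {cls} ({block_size(cls)}-byte blocks)"


# Compare where a request lands with and without the debug padding.
for request in (40, 100, 250):
    print(describe(request), "| debug:", describe(request + DEBUG_PAD))
```

Under these assumptions, the padding moves a 40-byte request from 40-byte blocks to 56-byte blocks. Any request over 240 bytes drops out of pymalloc entirely. This is one way a PYMALLOC_DEBUG snapshot can differ from a release build's real usage.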

Here's sample output (recently greatly improved), from near the tail end of
a debug-build run of the test suite:

Small block threshold = 256, in 32 size classes.
pymalloc malloc+realloc called 4414692 times.

class   num bytes   num pools   blocks in use  avail blocks
-----   ---------   ---------   -------------  ------------
    5          48         773           64932             0
    6          56         266           19028           124
    7          64         288           18122            22
    8          72         124            6914            30
    9          80         178            8873            27
   10          88          41            1867            19
   11          96          28            1170             6
   12         104          21             798            21
   13         112          16             543            33
   14         120          11             359             4
   15         128           8             228            20
   16         136           5             141             4
   17         144           5             114            26
   18         152          13             295            43
   19         160           6             144             6
   20         168         138            3292            20
   21         176           5              96            19
   22         184           4              76            12
   23         192           3              43            20
   24         200           3              42            18
   25         208           3              40            17
   26         216           3              43            11
   27         224           2              29             7
   28         232           3              32            19
   29         240           2              21            11
   30         248           2              31             1
   31         256           2              21             9

31 arenas * 262144 bytes/arena          =         8126464

0 unused pools * 4096 bytes             =               0
# bytes in allocated blocks             =         7796144
# bytes in available blocks             =           69056
# bytes lost to pool headers            =           62496
# bytes lost to quantization            =           71792
# bytes lost to arena alignment         =          126976
Total                                   =         8126464
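
The summary lines can be cross-checked against the per-class table. The Python sketch below is a reconstruction, not pymalloc's own accounting code. POOL_SIZE and ARENA_SIZE come from the dump, but the 32-byte pool header is inferred from the numbers (62496 bytes / 1953 pools). The alignment term also assumes, as the figures suggest, that each of the 31 arenas lost exactly one pool's worth of bytes (126976 / 31 = 4096). TABLE and summarize are hypothetical names.

```python
# Hypothetical reconstruction of the dump's summary lines from its table.
# POOL_SIZE and ARENA_SIZE appear in the dump; POOL_HEADER is inferred
# (62496 bytes / 1953 pools), and the alignment term assumes each arena
# lost exactly one pool's worth of bytes (126976 / 31 arenas = 4096).
POOL_SIZE = 4096
ARENA_SIZE = 262144
POOL_HEADER = 32  # inferred, not read from obmalloc.c
NUM_ARENAS = 31

# (class, block size, pools, blocks in use, available blocks) from the dump
TABLE = [
    (5, 48, 773, 64932, 0),
    (6, 56, 266, 19028, 124),
    (7, 64, 288, 18122, 22),
    (8, 72, 124, 6914, 30),
    (9, 80, 178, 8873, 27),
    (10, 88, 41, 1867, 19),
    (11, 96, 28, 1170, 6),
    (12, 104, 21, 798, 21),
    (13, 112, 16, 543, 33),
    (14, 120, 11, 359, 4),
    (15, 128, 8, 228, 20),
    (16, 136, 5, 141, 4),
    (17, 144, 5, 114, 26),
    (18, 152, 13, 295, 43),
    (19, 160, 6, 144, 6),
    (20, 168, 138, 3292, 20),
    (21, 176, 5, 96, 19),
    (22, 184, 4, 76, 12),
    (23, 192, 3, 43, 20),
    (24, 200, 3, 42, 18),
    (25, 208, 3, 40, 17),
    (26, 216, 3, 43, 11),
    (27, 224, 2, 29, 7),
    (28, 232, 3, 32, 19),
    (29, 240, 2, 21, 11),
    (30, 248, 2, 31, 1),
    (31, 256, 2, 21, 9),
]


def summarize(table, num_arenas):
    """Recompute the byte totals printed under the per-class table."""
    usable = POOL_SIZE - POOL_HEADER  # bytes per pool left for blocks
    for cls, size, pools, used, avail in table:
        # Every block slot in a class's pools is either in use or available.
        if used + avail != pools * (usable // size):
            raise ValueError(f"class {cls} does not fill its pools exactly")
    pools_used = sum(pools for _, _, pools, _, _ in table)
    # Assumed: one pool per arena is lost to alignment, the rest are usable.
    pools_per_arena = ARENA_SIZE // POOL_SIZE - 1
    return {
        "arena bytes": num_arenas * ARENA_SIZE,
        "unused pools": num_arenas * pools_per_arena - pools_used,
        "allocated blocks": sum(size * used for _, size, _, used, _ in table),
        "available blocks": sum(size * avail for _, size, _, _, avail in table),
        "pool headers": POOL_HEADER * pools_used,
        # Leftover bytes in each pool that are too few for one more block.
        "quantization": sum(pools * (usable % size) for _, size, pools, _, _ in table),
        "arena alignment": num_arenas * POOL_SIZE,
    }


stats = summarize(TABLE, NUM_ARENAS)
for name, value in stats.items():
    print(f"{name:<17} = {value:>9}")
```

Under these assumptions, every value the sketch computes matches the dump. Each class also satisfies blocks in use + available blocks = pools * blocks per pool. The three pure-waste terms (pool headers, quantization, arena alignment) sum to 261264 bytes, about 3.2% of the 8126464 arena bytes.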

Running the Unicode tests vastly increases the number of the smallest blocks
in use.  The hump in the 168-byte class is due to small dicts.

Feel lightly encouraged to try calling this in your real programs now, and
strongly encouraged after the memory-API rework is complete.

Try very hard not to read too much into the test suite <wink>.  All I take
from the above is that memory utilization is excellent and fragmentation is
trivial.  For example, in the 56-byte class, 124 available blocks * 56
bytes/block = 6944 bytes, more than a 4096-byte pool, so in an ideal world we
*could* get away with 265 pools of this size instead of 266.  The waste from
tossing away "the ends" of arenas to leave pool-aligned pools ("arena
alignment") is significant compared to the other kinds of pure waste in
pymalloc ("quantization" is what's lost because the bytes available in a pool
often aren't an exact multiple of the pool's block size), but overall wastage
is low.  Note that there's no accounting here for what's lost due to
returning 8-byte aligned addresses (i.e., rounding each request up to a
multiple of 8 bytes).
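
The fragmentation and alignment remarks reduce to a little arithmetic. This Python sketch assumes 4096-byte pools with a 32-byte header. The header size is inferred from the dump's 62496 bytes / 1953 pools, not taken from obmalloc.c. The sketch also reads "8-byte aligned addresses" as rounding each request up to a multiple of 8. blocks_per_pool, ideal_pools and rounding_loss are hypothetical helpers.

```python
# Hypothetical helpers for the fragmentation arithmetic above.
# Assumed: 4096-byte pools, a 32-byte pool header (inferred from the dump),
# and requests rounded up to a multiple of 8 bytes.
POOL_SIZE = 4096
POOL_HEADER = 32  # inferred: 62496 header bytes / 1953 pools
ALIGNMENT = 8


def blocks_per_pool(block_size):
    """Blocks of `block_size` bytes that fit in one pool after its header."""
    return (POOL_SIZE - POOL_HEADER) // block_size


def ideal_pools(block_size, blocks_in_use):
    """Fewest pools that could hold blocks_in_use blocks if every pool were
    packed full (integer ceiling division)."""
    per_pool = blocks_per_pool(block_size)
    return -(-blocks_in_use // per_pool)


def rounding_loss(nbytes):
    """Bytes wasted when a request is rounded up to the next multiple of 8,
    the cost the dump does not report."""
    return -nbytes % ALIGNMENT


print(blocks_per_pool(56))     # 72
print(ideal_pools(56, 19028))  # 265, versus the 266 pools in use
print(rounding_loss(41))       # 7: a 41-byte request occupies 48 bytes
```

For the 56-byte class, the 266 pools hold 266 * 72 = 19152 blocks. Of those, 19028 are in use, which fits in 265 full pools.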