[Python-Dev] Basic pymalloc stats
Tim Peters
tim.one@comcast.net
Fri, 05 Apr 2002 01:17:13 -0500
FYI, I implemented the optimizations Vladimir and I discussed here.
Also, _PyMalloc_DebugDumpStats() is an entry point you can call in a debug
build (or in a release build with PYMALLOC_DEBUG enabled) to get a
snapshot of pymalloc's internal structures. Perhaps it should be available in
a release build without PYMALLOC_DEBUG too: as is, the stats it reports are
skewed, because with PYMALLOC_DEBUG enabled every allocation is bumped by 16
bytes to make room for PYMALLOC_DEBUG's memory decorations.
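To see why the 16-byte bump skews the per-class numbers, here's a minimal Python sketch of the size-class arithmetic. It assumes 8-byte alignment, so that class index = (nbytes - 1) >> 3. That is consistent with the dump below, where class 5 holds 48-byte blocks, class 31 holds 256-byte blocks, and 256 / 8 gives the 32 size classes. The function and constant names are made up for illustration; they are not pymalloc's.

```python
ALIGNMENT_SHIFT = 3             # ASSUMPTION: 8-byte alignment, so classes are 8 bytes apart
SMALL_REQUEST_THRESHOLD = 256   # "Small block threshold = 256" in the dump
DEBUG_PAD = 16                  # bytes PYMALLOC_DEBUG adds to every request


def size_class(nbytes, debug=False):
    """Return (class index, block size) for a request of nbytes, or None if
    the request falls outside the small-block range (pymalloc passes large
    requests on to the system malloc)."""
    if debug:
        nbytes += DEBUG_PAD     # room for the debug decorations
    if not 1 <= nbytes <= SMALL_REQUEST_THRESHOLD:
        return None
    idx = (nbytes - 1) >> ALIGNMENT_SHIFT
    return idx, (idx + 1) << ALIGNMENT_SHIFT


for request in (30, 48, 200, 250):
    print(request, size_class(request), size_class(request, debug=True))
```

With the debug padding, a 30-byte request lands in the 48-byte class instead of the 32-byte one, and a 250-byte request no longer fits under the 256-byte threshold at all. So the per-class counts from a debug build are shifted relative to what a release build would show.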
Here's sample output (recently greatly improved), from near the tail end of
a debug-build run of the test suite:
Small block threshold = 256, in 32 size classes.
pymalloc malloc+realloc called 4414692 times.

class   num bytes   num pools   blocks in use  avail blocks
-----   ---------   ---------   -------------  ------------
    5          48         773           64932             0
    6          56         266           19028           124
    7          64         288           18122            22
    8          72         124            6914            30
    9          80         178            8873            27
   10          88          41            1867            19
   11          96          28            1170             6
   12         104          21             798            21
   13         112          16             543            33
   14         120          11             359             4
   15         128           8             228            20
   16         136           5             141             4
   17         144           5             114            26
   18         152          13             295            43
   19         160           6             144             6
   20         168         138            3292            20
   21         176           5              96            19
   22         184           4              76            12
   23         192           3              43            20
   24         200           3              42            18
   25         208           3              40            17
   26         216           3              43            11
   27         224           2              29             7
   28         232           3              32            19
   29         240           2              21            11
   30         248           2              31             1
   31         256           2              21             9

31 arenas * 262144 bytes/arena  = 8126464

0 unused pools * 4096 bytes     =       0
# bytes in allocated blocks     = 7796144
# bytes in available blocks     =   69056
# bytes lost to pool headers    =   62496
# bytes lost to quantization    =   71792
# bytes lost to arena alignment =  126976
Total                           = 8126464
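As a cross-check on how the summary lines relate to the table, here's a small Python sketch that rebuilds them from the per-class rows. The 4096-byte pool and 262144-byte arena sizes come from the dump itself. The 32-byte pool header is an assumption, inferred by dividing the 62496 header bytes by the 1953 pools in the table. `summarize` and the constant names are made up for illustration.

```python
# Rows from the dump: (size class, block size, pools, blocks in use, avail blocks).
ROWS = [
    (5, 48, 773, 64932, 0), (6, 56, 266, 19028, 124), (7, 64, 288, 18122, 22),
    (8, 72, 124, 6914, 30), (9, 80, 178, 8873, 27), (10, 88, 41, 1867, 19),
    (11, 96, 28, 1170, 6), (12, 104, 21, 798, 21), (13, 112, 16, 543, 33),
    (14, 120, 11, 359, 4), (15, 128, 8, 228, 20), (16, 136, 5, 141, 4),
    (17, 144, 5, 114, 26), (18, 152, 13, 295, 43), (19, 160, 6, 144, 6),
    (20, 168, 138, 3292, 20), (21, 176, 5, 96, 19), (22, 184, 4, 76, 12),
    (23, 192, 3, 43, 20), (24, 200, 3, 42, 18), (25, 208, 3, 40, 17),
    (26, 216, 3, 43, 11), (27, 224, 2, 29, 7), (28, 232, 3, 32, 19),
    (29, 240, 2, 21, 11), (30, 248, 2, 31, 1), (31, 256, 2, 21, 9),
]

POOL_SIZE = 4096       # bytes per pool, from "4096 bytes" in the summary
ARENA_SIZE = 262144    # bytes per arena, from "262144 bytes/arena"
POOL_HEADER = 32       # ASSUMPTION: inferred as 62496 header bytes / 1953 pools
NARENAS = 31
UNUSED_POOLS = 0


def summarize(rows):
    """Rebuild the byte-accounting summary from the per-class rows."""
    allocated = available = headers = quantization = 0
    npools = UNUSED_POOLS
    for _cls, size, pools, in_use, avail in rows:
        usable = POOL_SIZE - POOL_HEADER      # bytes left for blocks in one pool
        per_pool = usable // size             # whole blocks that fit in a pool
        if pools * per_pool != in_use + avail:
            raise ValueError(f"block counts don't add up for {size}-byte class")
        allocated += in_use * size
        available += avail * size
        headers += pools * POOL_HEADER
        quantization += pools * (usable % size)  # tail too small for another block
        npools += pools
    total = NARENAS * ARENA_SIZE
    return {
        "unused pools": UNUSED_POOLS * POOL_SIZE,
        "allocated blocks": allocated,
        "available blocks": available,
        "pool headers": headers,
        "quantization": quantization,
        # Whatever the pools don't cover was trimmed to pool-align the arenas.
        "arena alignment": total - npools * POOL_SIZE,
        "total": total,
    }


for name, nbytes in summarize(ROWS).items():
    print(f"{name:18} {nbytes:8}")
```

Run against the table, the allocated, available, pool-header, and quantization figures all match the dump. Arena alignment is computed as the leftover, so the total balances by construction. The interesting part is that the leftover comes to 126976 = 31 * 4096 bytes, i.e. exactly one pool's worth per arena.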
Running the Unicode tests vastly increases the number of smallest blocks
in use. The hump in the 168-byte class is due to small dicts.
Feel lightly encouraged to try calling this in your real programs now, and
strongly encouraged after the memory-API rework is complete.
Try very hard not to read too much into the test suite <wink>. All I take
from the above is that memory utilization is excellent, and fragmentation is
trivial (e.g., in the 56-byte class, 124 available blocks * 56 bytes/block =
6944 bytes, more than a whole 4096-byte pool, so in an ideal world we *could*
get away with 265 pools of this size instead of 266). The wastage due to
tossing away "the ends" of arenas to leave pool-aligned pools ("arena
alignment") is significant compared to the other kinds of pure waste in
pymalloc ("quantization" is what's lost because the bytes available for
blocks in a pool often aren't an exact multiple of the pool's block size),
but overall wastage is low. Note that there's no accounting here for what's
lost due to returning 8-byte aligned addresses (i.e., rounding each request
up to a multiple of 8 bytes).
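The 56-byte fragmentation estimate can be checked with a little arithmetic. This sketch assumes a 32-byte pool header, inferred from the dump (62496 header bytes / 1953 pools). `pool_slack` is a hypothetical helper, not part of pymalloc. It counts how many pools a size class would need if its in-use blocks were packed perfectly.

```python
POOL_SIZE = 4096
POOL_HEADER = 32  # ASSUMPTION: inferred from the dump (62496 header bytes / 1953 pools)


def pool_slack(block_size, pools, in_use):
    """Return (blocks per pool, fewest pools that could hold in_use blocks,
    pools that perfect packing would free) for one size class."""
    per_pool = (POOL_SIZE - POOL_HEADER) // block_size
    ideal = (in_use + per_pool - 1) // per_pool   # ceiling division
    return per_pool, ideal, pools - ideal


print(pool_slack(56, 266, 19028))   # the 56-byte class from the table
print(pool_slack(168, 138, 3292))   # the 168-byte "small dicts" hump
```

For the 56-byte class this gives 72 blocks per pool and an ideal of 265 pools, one fewer than the 266 actually in use, matching the "265 instead of 266" estimate. The 168-byte class, by contrast, has no spare pool at all.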