[Python-Dev] Basic pymalloc stats

Tim Peters tim.one@comcast.net
Fri, 05 Apr 2002 01:17:13 -0500


FYI, I implemented the optimizations Vladimir and I discussed here.

Next, _PyMalloc_DebugDumpStats() is an entry point you can call in a debug build (or when PYMALLOC_DEBUG is enabled in a release build) to get a snapshot of pymalloc's internal structures. Perhaps it should be enabled in a release build too without PYMALLOC_DEBUG -- as is, because PYMALLOC_DEBUG is enabled, every allocation is bumped by 16 bytes to make room for PYMALLOC_DEBUG's memory decorations.
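To make the 16-byte bump concrete, here is a rough sketch of how a request size maps to a size class with and without the debug padding. The 8-byte class spacing and the 256-byte threshold come from the sample output in this message (e.g., class 5 = 48 bytes); the function and constant names are made up for illustration and are not pymalloc identifiers, and zero-byte requests are left out.

```python
# Illustrative sketch only: these names are hypothetical, not pymalloc's own.
ALIGNMENT = 8          # size classes are spaced 8 bytes apart
DEBUG_PAD = 16         # extra bytes PYMALLOC_DEBUG adds to every request
SMALL_THRESHOLD = 256  # "Small block threshold" from the stats header


def size_class(nbytes):
    """Return (class index, block size) for a request, or None if too big."""
    if not 0 < nbytes <= SMALL_THRESHOLD:
        return None  # not served from the small-block pools in this sketch
    index = (nbytes - 1) // ALIGNMENT
    return index, (index + 1) * ALIGNMENT


def size_class_debug(nbytes):
    """Same mapping after the debug padding has been added to the request."""
    return size_class(nbytes + DEBUG_PAD)


print(size_class(32))        # (3, 32)
print(size_class_debug(32))  # (5, 48)
print(size_class_debug(1))   # (2, 24)
```

Under these assumptions even a 1-byte request lands in the 24-byte class once padded, so a debug-build dump is shifted upward relative to what a release build would show.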

Here's sample output (recently greatly improved), from near the tail end of a debug-build run of the test suite:

Small block threshold = 256, in 32 size classes.
pymalloc malloc+realloc called 4414692 times.

class   num bytes   num pools   blocks in use   avail blocks
-----   ---------   ---------   -------------   ------------
    5          48         773           64932              0
    6          56         266           19028            124
    7          64         288           18122             22
    8          72         124            6914             30
    9          80         178            8873             27
   10          88          41            1867             19
   11          96          28            1170              6
   12         104          21             798             21
   13         112          16             543             33
   14         120          11             359              4
   15         128           8             228             20
   16         136           5             141              4
   17         144           5             114             26
   18         152          13             295             43
   19         160           6             144              6
   20         168         138            3292             20
   21         176           5              96             19
   22         184           4              76             12
   23         192           3              43             20
   24         200           3              42             18
   25         208           3              40             17
   26         216           3              43             11
   27         224           2              29              7
   28         232           3              32             19
   29         240           2              21             11
   30         248           2              31              1
   31         256           2              21              9

31 arenas * 262144 bytes/arena    =    8126464

0 unused pools * 4096 bytes       =          0
bytes in allocated blocks         =    7796144
bytes in available blocks         =      69056
bytes lost to pool headers        =      62496
bytes lost to quantization        =      71792
bytes lost to arena alignment     =     126976
Total                             =    8126464
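As a quick sanity check on the accounting, the snippet below re-adds the categories and compares them with the arena total. All figures are copied from the dump; the dictionary keys and variable names are just labels chosen for this sketch.

```python
# Figures copied from the dump; names are labels for this sketch only.
ARENAS = 31
ARENA_BYTES = 262144
POOL_BYTES = 4096

accounted = {
    "unused pools": 0,
    "allocated blocks": 7796144,
    "available blocks": 69056,
    "pool headers": 62496,
    "quantization": 71792,
    "arena alignment": 126976,
}

total = ARENAS * ARENA_BYTES
# Every byte obtained for arenas shows up in exactly one category.
assert sum(accounted.values()) == total == 8126464

# Arena alignment works out to exactly one pool's worth per arena.
assert accounted["arena alignment"] == ARENAS * POOL_BYTES

# Share of arena memory sitting in allocated blocks.
utilization = accounted["allocated blocks"] / total
print(f"{utilization:.1%}")  # 95.9%
```

Keep in mind that in this debug-build run the allocated blocks include the 16 bytes of debug padding per allocation, so the percentage measures pymalloc's bookkeeping, not how much of the memory the program itself asked for.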

Running the Unicode tests vastly increases the number of the smallest blocks in use. The hump in the 168-byte class is due to small dicts.

Feel lightly encouraged to try calling this in your real programs now, and strongly encouraged after the memory-API rework is complete.

Try very hard not to read too much into the test suite. All I take from the above is that memory utilization is excellent; fragmentation is trivial (e.g., in the 56-byte class, 124 available blocks * 56 bytes/block = 6944 bytes, which is more than a 4096-byte pool, so in an ideal world we could get away with 265 pools of this size instead of 266); and the wastage due to tossing away "the ends" of arenas to leave pool-aligned pools ("arena alignment") is significant compared to the other kinds of pure waste in pymalloc, but overall wastage is low. ("Quantization" means bytes lost because the usable bytes in a pool often aren't an exact multiple of the pool's block size.) Note that there's no accounting here for what's lost due to returning 8-byte aligned addresses.
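The 56-byte example can be worked through in a few lines. This is a sketch under one assumption that is not stated in the dump: a 32-byte pool header, which is what the "pool headers" figure implies if each 256KB arena yields 63 usable 4096-byte pools (62496 / (31 * 63) = 32). The helper names are hypothetical.

```python
POOL_BYTES = 4096
# Assumption: 32-byte pool header, inferred from the dump rather than stated
# in it (62496 header bytes / (31 arenas * 63 usable pools) = 32).
POOL_HEADER = 32


def blocks_per_pool(block_size):
    """How many blocks of this size fit in a pool after the header."""
    return (POOL_BYTES - POOL_HEADER) // block_size


def quantization_loss(block_size):
    """Bytes left over at the end of a pool that can't hold another block."""
    return (POOL_BYTES - POOL_HEADER) % block_size


def ideal_pools(block_size, blocks_in_use):
    """Fewest pools that could hold the in-use blocks (ceiling division)."""
    per_pool = blocks_per_pool(block_size)
    return -(-blocks_in_use // per_pool)


# 56-byte class row: 266 pools, 19028 blocks in use, 124 available.
assert blocks_per_pool(56) * 266 == 19028 + 124

print(124 * 56)                # 6944, more than one 4096-byte pool
print(ideal_pools(56, 19028))  # 265
print(quantization_loss(56))   # 32
```

With that assumed header size, pools * blocks-per-pool equals in-use plus available blocks for the 56-byte row, which suggests the inference is at least consistent with the dump.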