Issue 17563: Excessive resizing of dicts when used as a cache (original) (raw)

If a dict is used a cache, e.g. in functools.lru_cache, the reduced resize factor in 3.3 can cause excessive resizing. This can lead to a significant performance regression.

When the the number of deletions and insertions is roughly in balance the reduced head room in the dict (compare to 3.2) causes a large increase in the number of resizes.

The reason for this above-linear increase is that with fewer dummy keys, the chance of a dummy being overwritten is reduced and is there is less overhead as well. A dictionary with 128 items will have a capacity of 256 and only 43 dummy keys. If it had a capacity of 512 (as it would have done in 3.2) then it will have 214 keys, making a resize at least 10 times less frequent.

Changing the growth function from round_up_to_power_2(used2) to round_up_to_power_2(used2+capacity/2) the desirable property of only doubling in size when growing can be preserved, yet ensuring sufficient overhead when used as a cache.

Consider a dict which grows to n items and then remains that size, with frequent deletions and insertions, using the proposed growth function:

Items Capacity Steady state Capacity on reaching n capacity under 3.2 2 8 8 8 4 8 16 16 6 16 32 32 8 16 32 32 10 16 64 64 12 32 64 64 15 32 64 64 20 32 128 128 30 64 128 128 50 128 256 256 80 128 512 512 128 256 512 512

Thanks to Raymond Hettinger for bringing this to my attention.

Patch attached.