[Python-Dev] PyBench DictCreation (was Re: Performance compares) (original) (raw)

Tim Peters tim@digicool.com
Fri, 18 May 2001 03:17:07 -0400

Previous message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Next message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[Jeremy]

I also did a profile run on CreateInstances, which has a difference of +55.54% on my machine. It's basically the same story. The instance dictionary is getting resized more often with Python 2.1+ than it did with Python 1.5.2. I wouldn't be surprised if several more tests are showing a slowdown with the same cause.

So boosting the minimum size sounds like a good thing.

I don't know. PyBench is great for showing that something changed, but it's got even less claim to "typical use" than pystone.

I don't know that the test suite is better in that respect, but it's got much more variety and everyone has it . I stuffed code in dict_dealloc() to record the ma_fill of each dict on its way to the grave (ma_fill == number of non-virgin slots). Across the test suite, here's the ranking, from most to least popular fill:

count fill %total cumulative %

146321 1 53.30 53.30 38200 0 13.91 67.21 32616 2 11.88 79.09 29648 3 10.80 89.89 9884 5 3.60 93.49 5423 4 1.98 95.47 2428 6 0.88 96.35 2016 8 0.73 97.08 1179 7 0.43 97.51 904 9 0.33 97.84 709 103 0.26 98.10 554 10 0.20 98.30 513 13 0.19 98.49 459 12 0.17 98.66 447 11 0.16 98.82 364 14 0.13 98.95 233 15 0.08 99.04 231 16 0.08 99.12 193 18 0.07 99.19 180 17 0.07 99.26 122 19 0.04 99.30 107 30 0.04 99.34 105 21 0.04 99.38 93 22 0.03 99.41 93 20 0.03 99.45 86 256 0.03 99.48 82 23 0.03 99.51 80 26 0.03 99.54 74 24 0.03 99.56 69 27 0.03 99.59 64 25 0.02 99.61 60 29 0.02 99.63 49 28 0.02 99.65 44 34 0.02 99.67 33 32 0.01 99.68 28 31 0.01 99.69 27 37 0.01 99.70 27 33 0.01 99.71 26 35 0.01 99.72 24 36 0.01 99.73 23 39 0.01 99.74 23 38 0.01 99.75 21 128 0.01 99.75 19 44 0.01 99.76 19 40 0.01 99.77 17 46 0.01 99.77 16 48 0.01 99.78 15 47 0.01 99.78 14 50 0.01 99.79 14 42 0.01 99.79

There are many more sizes, but I cut off the display here when they got too rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get away with at most one resize (a dict of size 8 is enough for a fill of 5, but not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are alive at the same time, it would be quite a waste of space to force the non-empty ones to carry 8 slots. OTOH, if those small dicts are due to things like building one- or two-element keyword argument dicts, their lifetimes rarely overlap.

A more aggressive idea is to allow denser dicts, by allowing them to become no more than 75% full. That is, change the resize test from

mp->ma_fill*3 >= mp->ma_size*2

mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue living in dicts with 4 slots, and allow about 90% of all dicts to get away with no more than one resize. The downside is that boosting the max load factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, a small boost in the expected # of compares. But the "theory" is for random hash functions with "uniform probing" (tech term that does not mean linear probing), and Python's hash functions often aren't random at all, while AFAIK no rigorous analysis of its probing strategy exists.

So, plenty of arbitrary data there upon which to flip a coin .

Previous message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Next message: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]