Message 151745 - Python tracker (original) (raw)
I wonder if the dict statistics should be exposed with extra attributes or a method on the dict; e.g. a stats attribute, something like this:
LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)
SmallDictStats(keys=3, mask=7)
Sounds a bit overkill, and it shouldn't be a public API (which methods are). Even a private API on dicts would quickly become visible, since dicts are so pervasive.
Caveats:
- no overflow handling (what happens after 2**32 modifications to a long-lived dict on a 32-bit build?) - though that's fixable.
How do you suggest to fix it?
If the dict is heading towards overflow of these counters, it's either long-lived, or huge.
Possible approaches: (a) use 64-bit counters rather than 32-bit, though that's simply delaying the inevitable
Well, even assuming one billion lookup probes per second on a single dictionary, the inevitable will happen in 584 years with a 64-bit counter (but only 4 seconds with a 32-bit counter).
A real issue, though, may be the cost of 64-bit arithmetic on 32-bit CPUs.
(b) when one of the counters gets large, divide both of them by a constant (e.g. 2). We're interested in their ratio, so dividing both by a constant preserves this.
Sounds good, although we may want to pull this outside of the critical loop.
By "a constant" do you mean from the perspective of big-O notation, or do you mean that it should be hardcoded (I was wondering if it should be a sys variable/environment variable etc?).
Hardcoded, as in your patch.
There needs to be a way to disable it: an environment variable would be the minimum IMO.
e.g. set it to 0 to enable it, set it to nonzero to set the scale factor.
0 to enable it sounds misleading. I'd say:
- 0 to disable it
- 1 to enable it and use the default scaling factor
= 2 to enable it and set the scaling factor
Any idea what to call it?
PYTHONDICTPROTECTION ? Most people should either enable or disable it, not change the scaling factor.
BTW, presumably if we do it, we should do it for sets as well?
Yeah, and use the same env var / sys function.