Message 151744 - Python tracker

Well, the old attempt was hardly robust :)

Can anyone see any vulnerabilities in this approach?

Yeah; I was mostly trying to add raw data (to help me debug the implementation).

I wonder if the dict statistics should be exposed with extra attributes or a method on the dict; e.g. a stats attribute, something like this:

LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)

SmallDictStats(keys=3, mask=7)

or somesuch. Though that's a detail, I think.
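As a sketch of what such a stats attribute might look like (the class and field names here are taken from the example reprs above; nothing like this exists on dict today, it is purely illustrative):

```python
# Hypothetical sketch: namedtuples whose reprs match the examples above.
# This is not an implemented API, just an illustration of the idea.
from collections import namedtuple

LargeDictStats = namedtuple(
    "LargeDictStats", ["keys", "mask", "insertions", "iterations"]
)
SmallDictStats = namedtuple("SmallDictStats", ["keys", "mask"])

stats = LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)
print(stats)  # LargeDictStats(keys=58, mask=127, insertions=53, iterations=1697)
```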

Caveats:

  • no overflow handling (what happens after 2**32 modifications to a long-lived dict on a 32-bit build?) - though that's fixable.

How do you suggest fixing it?

If the dict is heading towards overflow of these counters, it's either long-lived, or huge.

Possible approaches: (a) use 64-bit counters rather than 32-bit, though that's simply delaying the inevitable; (b) when one of the counters gets large, divide both of them by a constant (e.g. 2). We're interested in their ratio, so dividing both by a constant preserves it.
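A minimal sketch of approach (b), assuming 32-bit counters and a halving step; the function and threshold names are illustrative, not the actual patch:

```python
# Sketch of approach (b): when either counter nears its 32-bit limit,
# halve both of them.  Their ratio (what the check actually cares
# about) is essentially preserved.  COUNTER_LIMIT and the parameter
# names are assumptions for illustration only.
COUNTER_LIMIT = 2**32 - 1

def record_insertion(insertions, iterations, probes):
    """Record one insertion that needed `probes` probe iterations."""
    insertions += 1
    iterations += probes
    if insertions > COUNTER_LIMIT // 2 or iterations > COUNTER_LIMIT // 2:
        # Divide both by the same constant so iterations/insertions
        # is unchanged (up to integer rounding).
        insertions //= 2
        iterations //= 2
    return insertions, iterations
```

Halving both counters keeps iterations/insertions essentially unchanged, so a ratio-based check still sees the same slowdown factor after the rescale.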

By "a constant" do you mean constant in the big-O sense, or do you mean that it should be hardcoded? (I was wondering if it should be a sys variable, an environment variable, etc.)

We're interested in the actual slowdown factor, which a constant factor models adequately. It's the slowdown factor that makes a DoS attack using this technique efficient. Whether or not dict construction truly degenerates into an O(n**2) operation is less relevant.
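The ratio-based test described above could be sketched as follows; the threshold of 32 is an assumed scale factor, not a value from any patch:

```python
# Sketch of the slowdown-factor test: with a well-distributed hash,
# insertions take O(1) probe iterations each, so iterations/insertions
# stays small.  Under a collision attack the ratio grows with the
# number of keys.  SLOWDOWN_THRESHOLD is an illustrative assumption.
SLOWDOWN_THRESHOLD = 32

def looks_like_attack(insertions, iterations):
    """Return True if the average probes-per-insertion is suspicious."""
    return insertions > 0 and iterations / insertions > SLOWDOWN_THRESHOLD

print(looks_like_attack(53, 1697))  # → True (1697/53 is just over 32)
```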

OK.

There needs to be a way to disable it: an environment variable would be the minimum IMO.

e.g. set it to 0 to disable it, set it to nonzero to set the scale factor. Any idea what to call it?

PYTHONALGORITHMICCOMPLEXITYTHRESHOLD=0 would be quite a mouthful.
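A sketch of how such a knob might be read, assuming 0 switches the check off entirely and any other value overrides the built-in scale factor; the variable name is the one floated above and remains hypothetical:

```python
# Hypothetical: read the proposed environment variable.  The name and
# the 0-means-disabled convention are assumptions from the discussion,
# not an implemented CPython feature.
import os

def get_complexity_threshold(default=32):
    """Return the scale factor, or None if the check is disabled."""
    raw = os.environ.get("PYTHONALGORITHMICCOMPLEXITYTHRESHOLD")
    if raw is None:
        return default          # unset: fall back to the built-in factor
    value = int(raw)
    return None if value == 0 else value   # 0 disables the check
```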

OK

BTW, presumably if we do it, we should do it for sets as well?