Message 152125 - Python tracker (original) (raw)

On Fri, 2012-01-27 at 21:02 +0000, Martin v. Löwis wrote:

Martin v. Löwis <martin@v.loewis.de> added the comment:

But then isn't it vulnerable to Frank's first attack as exposed in http://mail.python.org/pipermail/python-dev/2012-January/115726.html ?

It would be, yes. That's sad.

That could be fixed by indeed creating trees in all cases (i.e. moving away from open addressing altogether). The memory consumption does not worry me here; however, dictionary order will change in more cases.

Compatibility could be restored by introducing a threshold for tree creation: if insertion visits more than N slots, go back to the original slot and put a tree there. I'd expect that N could be small, e.g. N==4. Lookup would then have to consider all AVL trees along the chain of visited slots, but ISTM it could also stop after visiting N slots.

Perhaps we could combine my attack-detection code from http://bugs.python.org/issue13703#msg151714 with Martin's AVL approach? Use the ma_smalltable to track stats, and when a dict detects that it's under attack, if all the keys are AVL-compatible, it could transition to full-AVL mode. [I believe that my patch successfully handles both of Frank's attacks, but I don't have the test data - I'd be very grateful to receive a copy (securely)].

[See hybrid-approach-dmalcolm-2012-01-25-002.patch for the latest version of attack-detection; I'm working on a rewrite in which I restrict it to working just on pure-str dicts. With that idea, when a dict detects that it's under attack, if all the keys satisfy this condition (new_hash(keyA) == new_hash(keyB)) iff (hash(keyA) == hash(keyB)) then all hash values get recalculated using new_hash (which is randomized), which should offer protection in many common attack scenarios, without the semantic change Alex and Antoine indicated]