[Python-Dev] Status of the fix for the hash collision vulnerability (original) (raw)

Victor Stinner victor.stinner at haypocalc.com
Tue Jan 17 12:55:02 CET 2012

Previous message: [Python-Dev] Status of the fix for the hash collision vulnerability
Next message: [Python-Dev] Status of the fix for the hash collision vulnerability
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I thought that the original problem was that with N insertions in the dictionary, by repeatedly inserting different keys generating the same hash value an attacker could arrange that the cost of finding an open slot is O(N), and thus the cost of N insertions is O(N^2).

If so, frequent resizing could make the attacker's problem much more difficult, as the distribution of secondary probes should change with each resize.

The attack creates 60,000 strings (or more) with exactly the same hash value. A dictionary uses hash(str) & DICT_MASK to compute the bucket index, where DICT_HASH is the number of buckets minus one. If all strings have the same hash value, we always start in the same bucket and the key has to be compared to all previous strings to find the next empty bucket. The attack works because a LOT of strings are compared and comparing strings is slow.

If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but hash(str1)!=hash(str2), strings are not compared (this is a common optimization in Python), and the so the attack would not be successful (it would be slow, but not as slow as comparing two strings).

Victor

Previous message: [Python-Dev] Status of the fix for the hash collision vulnerability
Next message: [Python-Dev] Status of the fix for the hash collision vulnerability
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list