[Python-Dev] Status of the fix for the hash collision vulnerability (original) (raw)

Victor Stinner victor.stinner at haypocalc.com
Fri Jan 13 10:23:45 CET 2012


Unfortunately it requires only a few seconds to compute enough 32bit collisions on one core with no precomputed data.

Are you running the hash function "backward" to generate strings with the same value, or you are more trying something like brute forcing?

And how do you get the hash secret? You need it to run an attack.

In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is possible.

My change adds also a prefix (a prefix and a suffix). I don't know if it changes anything for generating collisions.

So the question is: How difficult is it to guess the seed?

I wrote some remarks about that in the issue. For example:

(hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 1000003) & HASH_MASK) ^ ((prefix * 1000003**2) & HASH_MASK)

I suppose that you don't have directly the full output of hash(str) in practical, but hash(str) & DICT_MASK where DICT_MASK depends is the size of the internal dict array minus 1. For example, for a dictionary of 65,536 items, the mask is 0x1ffff and so cannot gives you more than 17 bits of hash(str) output. I still don't know how difficult it is to retreive hash(str) bits from repr(dict).

Victor



More information about the Python-Dev mailing list