[Python-Dev] Hash collision security issue (now public) (original) (raw)

Paul McMillan paul at mcmillan.ws
Mon Jan 2 06:55:52 CET 2012


I fixed a couple things in my proposed algorithm: https://gist.github.com/0a91e52efa74f61858b5

I had a typo, and used 21 instead of 12 for the size multiplier. We definitely don't need 2MB random data.

The initialization of r was broken. Now it is an array of ints, so there's no conversion when it's used. I've adjusted it so there's 8k of random data, broken into 2048 ints.

I added a length-based seed to the initial value of x. This prevents single-characters from being used to enumerate raw values from r. This is similar to the change proposed by Christian Heimes.

Most importantly, I moved the xor with r[x % len_r] down a line. Before, it wasn't being applied to the last character.

Christian Heimes said: We also need to special case short strings. The above routine hands over the seed to attackers, if he is able to retrieve lots of single character hashes.

The updated code always includes at least 2 lookups from r, which I believe solves the single-character enumeration problem. If we special-case part of our hash function for short strings, we may get suboptimal collisions between the two types of hashes.

I think Ruby uses FNV-1 with a salt, making it less vulnerable to this. FNV is otherwise similar to our existing hash function.

For the record, cryptographically strong hash functions are in the neighborhood of 400% slower than our existing hash function.

Terry Reedy said: I understood Alexander Klink and Julian Wälde, hashDoS at alech.de, as saying that they consider that using a random non-zero start value is sufficient to make the hash non-vulnerable.

I've been talking to them. They're happy to look at our proposed changes. They indicate that a non-zero start value is sufficient to prevent the attack, but D. J. Bernstein disagrees with them. He also has indicated a willingness to look at our solution.

-Paul



More information about the Python-Dev mailing list