[Python-Dev] Hash collision security issue (now public) (original) (raw)

Christian Heimes lists at cheimes.de
Sat Dec 31 04:24:15 CET 2011


Am 31.12.2011 03:31, schrieb Victor Stinner:

Le 29/12/2011 14:19, Christian Heimes a écrit :

Perhaps the dict code is a better place for randomization. The problem is the creation of a dict with keys all having the same hash value. The current implementation of dict uses a linked-list. Adding a new item requires to compare the new key to all existing keys (compare the value, not the hash, which is much slower). We had to change completly how dict is implemented to be able to fix this issue. I don't think that we can change the dict implementation without breaking backward compatibility or breaking applications. Change the implementation would change dict properties, and applications rely on the properties of the current implementation. Tell me if I am wrong.

You are right and I was wrong. We can't do the randomization inside the dict code. The randomization factor must used as initialization factor (IV) for the hashing algorithm. At first I thought the attack used the unique property of Python's dict implementation (perturbed hash instead of buckets for equal hashes) but I was wrong. It took me several hours of reading and digging into the math until I figured out my mistake. Sorry! :)

This means we can't implement a salted dict unless the salted dict re-implemention the hash algorithm for unicode, bytes and memoryview. I doubt that a wise idea.

I've checked my first draft of a possible solution: http://hg.python.org/features/randomhash/ . The pseudo RNG has to be replaced with something better and it's missing an option to feed a seed, too.

Christian



More information about the Python-Dev mailing list