[Python-Dev] Hash collision security issue (now public) (original) (raw)

Christian Heimes lists at cheimes.de
Thu Dec 29 14:32:21 CET 2011


Am 29.12.2011 13:57, schrieb Armin Ronacher:

Hi,

Something I should add to this now that I thought about it a bit more: Assuming this should be fixed on a language level the solution would probably be to salt hashes. The most common hash to salt here is the PyUnicode hash for obvious reasons. - Option a: Compiled in Salt + Easy to implement - Breaks unittests most likely (those were broken in the first place but that's still a very annoying change to make) - Might cause problems with interoperability of Pythons compiled with different hash salts - You're not really solving the problem because each linux distribution (besides Gentoo I guess) would have just one salt compiled in and that would be popular enough to have the same issue. - Option b: Environment variable for the salt + Easy-ish to implement + Easy to synchronize over different machines - initialization for base types happens early and unpredictive which makes it hard for embedded Python interpreters (think modwsgi and other things) to specify the salt - Option c: Random salt at runtime + Easy to implement - impossible to synchronize - breaks unittests in the same way as a compiled in salt would do

IMHO we don't have to alter the outcome of hash("some string"), hash(1) and all other related types. We just need to reduce the change the an attacker can produce collisions in the dict (and set?) code that looks up the slot (PyDictEntry). How about adding the random value in Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) / lookdict_unicode() (Python 3.x)? With this approach the hash of all our objects stay the same and just the dict code needs to be altered. The approach has also the benefit that all possible objects gain a randomized hash.

Also related: since this is a security related issue, would this be something that goes into Python 2? Does that affect how a fix would look like?

IMHO it does affect the fix. A changed and randomized hash function may break software that relies on a stable hash() function.

Christian



More information about the Python-Dev mailing list