[Python-Dev] Status of the fix for the hash collision vulnerability

Steven D'Aprano steve at pearwood.info
Sat Jan 14 03:55:22 CET 2012


On 14/01/12 12:58, Gregory P. Smith wrote:

> I do like randomly seeding the hash. +1. This is easy. It can easily be back ported to any Python version.
>
> It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken.

For the record:

steve@runes:$ python -c "print(hash('spam ham'))"
-376510515
steve@runes:$ jython -c "print(hash('spam ham'))"
2054637885

So it is already the case that Python code that assumes stable hashing is broken.
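The same instability shows up within CPython itself from 3.3 on, where string hashing is randomized by default. Below is a minimal sketch that starts fresh interpreters with different PYTHONHASHSEED values and compares the results. The helper name child_hash is hypothetical, and the sketch assumes Python 3.7+ (for subprocess.run's capture_output argument).

```python
import os
import subprocess
import sys


def child_hash(text: str, seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # A fixed seed reproduces the same value; a different seed almost certainly does not.
    print(child_hash("spam ham", "1"), child_hash("spam ham", "1"))
    print(child_hash("spam ham", "2"))
```

Setting PYTHONHASHSEED=0 disables randomization, which is the escape hatch for code that still depends on a particular hash value.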

For what it's worth, I'm not convinced that we should be overly concerned about the "poor saps" (Guido's words) who rely on accidents of implementation regarding hash(). We shouldn't break their code without a good reason, but this strikes me as a good reason. The documentation for hash() certainly makes no promise about stability, and relying on it is about as sensible as relying on the stability of error messages.

I'm also not convinced that the option to raise an exception after 1000 collisions actually solves the problem. It relies on the application being rewritten to catch the exception and recover from it (how?). Otherwise, all it does is change the attack vector from "cause an unlimited number of hash collisions" to "cause 999 hash collisions, then crash the application with an exception", which doesn't strike me as much of an improvement.
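To make the objection concrete, here is a toy model of the collision-limit option. It is a hypothetical separate-chaining table (CollisionLimitedTable, not CPython's dict and not the actual proposed patch) that raises once a bucket already holds max_collisions keys. A hash_fn that sends every key to the same bucket stands in for an attacker's precomputed colliding keys.

```python
class TooManyCollisions(Exception):
    """Raised when a bucket already holds the maximum number of colliding keys."""


class CollisionLimitedTable:
    """Toy separate-chaining hash table with a per-bucket collision cap."""

    def __init__(self, nbuckets=8, max_collisions=1000, hash_fn=hash):
        self.buckets = [[] for _ in range(nbuckets)]
        self.max_collisions = max_collisions
        self.hash_fn = hash_fn

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        # Each existing entry compared here is one collision for the new key.
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update in place, no new collision
                return
        if len(bucket) >= self.max_collisions:
            raise TooManyCollisions(f"bucket already holds {len(bucket)} colliding keys")
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


def same_bucket(key):
    """Hypothetical attacker-controlled hash: every key collides."""
    return 0


if __name__ == "__main__":
    table = CollisionLimitedTable(hash_fn=same_bucket)
    try:
        for n in range(2000):
            table.insert(n, None)
    except TooManyCollisions as exc:
        print(f"insert #{n} failed: {exc}")
```

Even in this model the attacker still forces roughly half a million key comparisons before the exception fires, and the application is left holding an exception it has no obvious way to recover from.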

+1 on random seeding. Default to on in 3.3+ and default to off in older versions, which allows people to avoid breaking their code until they're ready for it to be broken.
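For comparison, the idea behind random seeding can be sketched with a keyed hash: a secret chosen once per process feeds into every hash, so an attacker who cannot see the seed has no practical way to precompute keys that land in the same bucket. This sketch uses the key parameter of hashlib.blake2b purely for illustration; the names seeded_hash and bucket_index and the 16-byte seed are assumptions. CPython's own fix works differently: the 2012 patches added a random seed to the existing string hash, and CPython 3.4+ uses SipHash (PEP 456).

```python
import hashlib
import os

# Secret chosen once at process start-up, analogous to a random hash seed.
_PROCESS_SEED = os.urandom(16)


def seeded_hash(data: bytes, seed: bytes = _PROCESS_SEED) -> int:
    """Return a signed 64-bit hash of data, keyed by seed (BLAKE2b keyed mode)."""
    digest = hashlib.blake2b(data, digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little", signed=True)


def bucket_index(data: bytes, nbuckets: int, seed: bytes = _PROCESS_SEED) -> int:
    """Map data to a bucket; which keys share a bucket depends on the secret seed."""
    return seeded_hash(data, seed) % nbuckets


if __name__ == "__main__":
    print(seeded_hash(b"spam ham"))  # changes from one run to the next
    print(seeded_hash(b"spam ham", seed=b"fixed seed"))  # stable for a fixed seed
```

Passing an explicit fixed seed plays the same role as turning randomization off: it restores reproducible values for code that is not yet ready to be broken.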

-- Steven


