[Python-Dev] Hash collision security issue (now public) (original) (raw)

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 31 13:03:22 CET 2011


Victor Stinner writes:

Let's try to summarize this "vulnerability".

The creation of a Python dictionary has a complexity of O(n) in most cases, but O(n^2) in the worst case. The attack tries to go into the worst case. It requires to compute a set of N keys having the same hash value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to compute these keys once. It looks like it is now cheap enough in practice to compute this dataset for Python (and other languages).

I don't know the implementation issues well enough to claim it is a solution, but this hasn't been mentioned before AFAICS:

While the dictionary probe has to start with a hash for backward compatibility reasons, is there a reason the overflow strategy for insertion has to be buckets containing lists? How about double-hashing, etc?



More information about the Python-Dev mailing list