[Python-Dev] Hash collision security issue (now public) (original) (raw)

PJ Eby pje at telecommunity.com
Sat Dec 31 19:04:28 CET 2011


On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull <stephen at xemacs.org>wrote:

While the dictionary probe has to start with a hash for backward compatibility reasons, is there a reason the overflow strategy for insertion has to be buckets containing lists? How about double-hashing, etc?

This won't help, because the keys still have the same hash value. ANYTHING you do to them after they're generated will result in them still colliding.

The only thing that works is to change the hash function in such a way that the strings end up with different hashes in the first place. Otherwise, you'll still end up with (deliberate) collisions.

(Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20111231/72554d42/attachment.html>



More information about the Python-Dev mailing list