[Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html (original) (raw)
Jim Jewett jimjjewett at gmail.com
Mon Jan 2 01:37:26 CET 2012
- Previous message: [Python-Dev] Hash collision security issue (now public)
- Next message: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
In http://mail.python.org/pipermail/python-dev/2011-December/115172.html, P. J. Eby wrote:
On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote:
While the dictionary probe has to start with a hash for backward compatibility reasons, is there a reason the overflow strategy for insertion has to be buckets containing lists? How about double-hashing, etc?
This won't help, because the keys still have the same hash value. ANYTHING you do to them after they're generated will result in them still colliding.
The only thing that works is to change the hash function in such a way that the strings end up with different hashes in the first place. Otherwise, you'll still end up with (deliberate) collisions.
Well, there is nothing wrong with switching to a different hash function after N collisions, rather than "in the first place". The perturbation effectively does by shoving the high-order bits through the part of the hash that survives the mask.
(Well, technically, you could use trees or some other O log n data structure as a fallback once you have too many collisions, for some value of "too many". Seems a bit wasteful for the purpose, though.)
Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > requires using a real dictionary for compatibility; storing some of the values outside the values array would violate that. Do you consider that obsolete?
-jJ
- Previous message: [Python-Dev] Hash collision security issue (now public)
- Next message: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]