[Python-Dev] Status of the fix for the hash collision vulnerability (original) (raw)

Guido van Rossum guido at python.org
Fri Jan 13 22:22:32 CET 2012


On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson <dickinsm at gmail.com> wrote:

On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum <guido at python.org> wrote: >> How pathological do you consider the set >> >> {1 << n for n in range(2000)}_ _>> >> to be? What about the set: >> >> ieee754powersoftwo = {2.0**n for n in range(-1074, 1024)} >> >> ? The > 2000 elements of the latter set have only 61 distinct hash >> values on 64-bit machine, so there will be over 2000 total collisions >> involved in creating this set (though admittedly only around 30 >> collisions per hash value). > > Hm... So how does the collision counting work for this case?

Ah, my bad. It looks like the ieee754powersoftwo is safe---IIUC, it's the number of collisions involved in a single key-set operation that's limited. So a dictionary with keys {1<<n for n in range(2000)}_ _is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)}_ _is not:_ _>>> {1<<(n*61):True for n in range(2000)}_ _Traceback (most recent call last):_ _File "", line 1, in File "", line 1, in KeyError: 'too many hash collisions' [67961 refs] I'd still not consider this particularly pathological, though.

Really? Even though you came up with specifically to prove me wrong?

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20120113/5dea04fe/attachment.html>



More information about the Python-Dev mailing list