[Python-Dev] Counting collisions for the win (original) (raw)

Jim J. Jewett jimjjewett at gmail.com
Thu Feb 16 23:10:51 CET 2012


In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote:

Am 20.01.2012 13:08, schrieb Victor Stinner:

I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago A Python dictionary only uses the lower bits of a hash value. If your dictionary has less than 2**32 items, the dictionary order is exactly the same on 32 and 64 bits system: hash32(str)& mask == hash64(str)& mask for mask<= 2**32-1.

No, that's not true. Whenever a collision happens, other bits are mixed in very fast.

Frank

Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint.

A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even on the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order.

-jJ

--

If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ



More information about the Python-Dev mailing list