On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum <guido@python.org> wrote:
> How
> pathological the data needs to be before the collision counter triggers? I'd
> expect *very* pathological.

How pathological do you consider the set

 � {1 << n for n in range(2000)}

to be? �What about the set:

 � ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}

? �The > 2000 elements of the latter set have only 61 distinct hash
values on 64-bit machine, so there will be over 2000 total collisions
involved in creating this set (though admittedly only around 30
collisions per hash value).

Hm... So how does the collision counting work for this case?
-- 
--Guido van Rossum (python.org/~guido)
">

(original) (raw)

On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum <guido@python.org> wrote:
> How
> pathological the data needs to be before the collision counter triggers? I'd
> expect \*very\* pathological.

How pathological do you consider the set

� {1 << n for n in range(2000)}

to be? �What about the set:

� ieee754\_powers\_of\_two = {2.0\*\*n for n in range(-1074, 1024)}

? �The > 2000 elements of the latter set have only 61 distinct hash
values on 64-bit machine, so there will be over 2000 total collisions
involved in creating this set (though admittedly only around 30
collisions per hash value).

Hm... So how does the collision counting work for this case?

--
--Guido van Rossum (python.org/\~guido)