[Python-3000] Performance Notes - new hash algorithm (original) (raw)

Nicko van Someren nicko at nicko.org
Thu Sep 13 01:33:07 CEST 2007


On 10 Sep 2007, at 01:58, Jim Jewett wrote:

To spell this out a bit more: ... When adding four entries to an 8-slot table, a truly random hash would have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4 of the time. As expected, the proposed hash does have a collision for those four values (the first and fourth).

While your over-all analysis is both informative and helpful, the
pedant in me feels obliged to point out the flaw in your math. The
probability of at least one collision is 1 minus the probability of
no collision, which is in turn 8/8 * 7/8 * 6/8 * 5/8, so the correct
figure is actually that you collide about 59% of the time, not 75%.

(If your math were correct then 5 items would collide 125% of the
time, which is clearly wrong! :-)

Cheers,
    Nicko


More information about the Python-3000 mailing list