(original) (raw)
On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett <jimjjewett@gmail.com> wrote:
Well, there is nothing wrong with switching to a different hash function after N
collisions, rather than "in the first place". �The perturbation
effectively does by
shoving the high-order bits through the part of the hash that survives the mask.
Since these are true hash collisions, they will all have the same high order bits. �So, the usefulness of the perturbation is limited mainly to the common case where true collisions are rare.
> (Well, technically, you could use trees or some other O log n data
> structure as a fallback once you have too many collisions, for some value
> of "too many". �Seems a bit wasteful for the purpose, though.)
Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > requires
using a real dictionary for compatibility; storing some of the values
outside the
values array would violate that. �
When I said "use some other data structure", I was referring to the internal implementation of the dict type, not to user code. �The only user-visible difference (even at C API level) would be the order of keys() et al. �(In any case, I still assume this is too costly an implementation change compared to changing the hash function or seeding it.