[Python-Dev] Documentation Error for hash (original) (raw)

Matt Giuca matt.giuca at gmail.com
Fri Aug 29 13:03:22 CEST 2008


Being hashable is a different from being usable as dictionary key.

Dictionaries perform the lookup based on the hash value, but will then have to check for hash collisions based on an equal comparison. If an object does not define an equal comparison, then it is not usable as dictionary key.

But if an object defines neither eq or hash, then by default it is usable as a dictionary key (using the id() of the object for both default equality and hashing, which is fine, and works for all user-defined types by default).

An object defining hash but not eq is not problematic, since it still uses id() default for equality. (It just has potentially bad dictionary performance, if lots of things hash the same even though they aren't equal). This it not a problem by definition because it is officially "okay" for two objects to hash the same, even if they aren't equal, though undesirable.

So all hashable objects are usable as dictionary keys, are they not? (As far as I know it isn't possible to have an object that does not have an equality comparison, unless you explicitly override eq and have it raise a TypeError for some reason).

It's probably a good idea to implement hash for objects that

implement comparisons, but it won't always work and it is certainly not needed, unless you intend to use them as dictionary keys.

But from what I know, it is a bad idea to implement hash for any mutable object with non-reference equality (where equality depends on the mutable state), as an unbreakable rule. This is because if they are inserted into a dictionary, then mutated, they may suddenly be in the wrong bucket. This is why all the mutable types in Python with non-reference equality (eg. list, set, dict) are explicitly not hashable, while the immutable types (eg. tuple, frozenset, str) are hashable, and so are the mutable types with reference equality (eg. functions, user-defined classes by default).

> and that mutable objects should raise a TypeError in hash. That's a good idea, even though it's not needed either ;-)

So I think my above "axioms" are a better (less restrictive, and still correct) rule than this one. It's OK for a mutable object to define hash, as long as its eq doesn't depend upon its mutable state. For example, you can insert a function object into a dictionary, and mutate its closure, and it won't matter, because neither the hash nor the equality of the object is changing. It's only types like list and dict, with deep equality, where you run into this hash table problem. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20080829/10d14a53/attachment.htm>



More information about the Python-Dev mailing list