[Python-Dev] Documentation Error for hash (original) (raw)

M.-A. Lemburg mal at egenix.com
Fri Aug 29 14:43:11 CEST 2008


On 2008-08-29 13:03, Matt Giuca wrote:

Being hashable is a different from being usable as dictionary key.

Dictionaries perform the lookup based on the hash value, but will then have to check for hash collisions based on an equal comparison. If an object does not define an equal comparison, then it is not usable as dictionary key. But if an object defines neither eq or hash, then by default it is usable as a dictionary key (using the id() of the object for both default equality and hashing, which is fine, and works for all user-defined types by default). An object defining hash but not eq is not problematic, since it still uses id() default for equality. (It just has potentially bad dictionary performance, if lots of things hash the same even though they aren't equal). This it not a problem by definition because *it is officially "okay" for two objects to hash the same, even if they aren't equal, though undesirable*.

Note that only instances have the default hash value id(obj). This is not true in general. Most types don't implement the tp_hash slot and thus are not hashable. Indeed, mutable types should not implement that slot unless they know what they're doing :-)

So all hashable objects are usable as dictionary keys, are they not? (As far as I know it isn't possible to have an object that does not have an equality comparison, unless you explicitly override eq and have it raise a TypeError for some reason).

Sorry, I wasn't clear enough: with "not defining an equal comparison" I meant that an equal comparison does not succeed, ie. raises an exception or returns Py_NotImplemented (at the C level).

Due to the default of using the id(obj) as fallback for the equal comparison, this has to be explicitly coded for. If this is not the case (and that's probably the most common situation), then you're right: hashable implies usable as dictionary key.

It's probably a good idea to implement hash for objects that implement comparisons, but it won't always work and it is certainly not needed, unless you intend to use them as dictionary keys.

But from what I know, it is a bad idea to implement hash for any mutable object with non-reference equality (where equality depends on the mutable state), as an unbreakable rule. This is because if they are inserted into a dictionary, then mutated, they may suddenly be in the wrong bucket. This is why all the mutable types in Python with non-reference equality (eg. list, set, dict) are explicitly not hashable, while the immutable types (eg. tuple, frozenset, str) are hashable, and so are the mutable types with reference equality (eg. functions, user-defined classes by default).

Right.

By implementing hash in classes you have the explicit choice of either raising an exception or returning a useful hash value.

Again, the situation is better at the C level, since types don't have a default tp_hash implementation, so have to explicitly code such a slot in order for hash(obj) to work.

and that mutable objects should raise a TypeError in hash. That's a good idea, even though it's not needed either ;-)

So I think my above "axioms" are a better (less restrictive, and still correct) rule than this one. It's OK for a mutable object to define hash, as long as its eq doesn't depend upon its mutable state. For example, you can insert a function object into a dictionary, and mutate its closure, and it won't matter, because neither the hash nor the equality of the object is changing. It's only types like list and dict, with deep equality, where you run into this hash table problem.

I think the documentation needs to be changed to make the defaults explicit.

The documentation should probably say:

"If you implement cmp or eq on a class, also implement a hash method (and either have it raise an exception or return a valid non-changing hash value for the object)."

"If you implement hash on classes, you should consider implementing eq and/or cmp as well, in order to control how dictionaries use your objects."

In general, it's probably best to always implement both methods on classes, even if the application will just use one of them.

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, Aug 29 2008)

Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/


:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611



More information about the Python-Dev mailing list