[Python-Dev] gc ideas -- sparse memory (original) (raw)

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 4 15:22:54 CET 2010


Quotes are out of order.

"Martin v. Löwis" writes:

No, it's not. java.lang.Object.hashCode() is equivalent to Python's hash(). In particular, for both of these, the following requirement holds: object that compare equal must hash equal.

Aha. I see, now. That is indeed a different concept. Mutable objects retain their identities across mutations.

Two distinct objects could have the same hash code, and a further test is needed to distinguish them.

Correct. However, references with different identity hash codes will never refer to the same object.

Why is useful to expose an identity hash? AFAICS it is only useful in building an identity hash table. If so, why not just provide id() or the is operator or both and be done with it?

This is different from exposing the value hash AFAICS, where it may be possible to optimize value hashing of composite objects by computing their hashes from the hashes of subobjects, perhaps omitting some subobjects that don't affect equality (eg, Emacs strings may carry display properties that are ignored when they are compared as strings).

With identity, I don't see much point in leaving that up to the programmer. In implementations with stable object addresses, the address is a fine implementation of identity_hash() -- perfect, in fact. Otherwise, you either have to omit mutable members from the computation or store the hashcode in the object. In the former situation, you will likely get very coarse hashing, and the set of immutable members (if any ;-) can be computed by the compiler. In the latter, you may as well store an id directly and avoid the whole hash concept for that type (except as a space optimization, but that again implies coarse hashing).

[F]or the identity hash code, or Python's id function: objects that compare equal may still have different ids, or different java identity hash codes.

ISTM that's no problem, you have to do an extra comparison for identity when the codes are equal either way. Of course the identity hash code is a more precise optimization, but that doesn't make hash() unusable for this purpose (cf Sun's built-in implementation of IdentityHashCode that always returns 2).

The problem presumably lies in the fact that there are "id()-ifiable" objects that, because they are mutable, either are unhashable (Python hash) or would have an unstable value hash (Lisp sxhash).

Ie, AFAICS

def identity_hash (thing): "Wrapper for hash() suitable for building an identity hash table." try: return hash(thing) except TypeError: return 0

would be a fine implementation for Python.

Like hash(), IdentityHashCode doesn't make any promises about identity at all.

It sure does: calling identityHashCode on the very same object twice will always return the same value - even after a garbage collection.

Well, if you want to put it that way, so does hash(). I read Steven as placing more emphasis on "like hash()" than on "at all".



More information about the Python-Dev mailing list