[Python-Dev] Pre-PEP: Mutable keys in dictionaries (original) (raw)

Guido van Rossum guido@python.org
Thu, 06 Feb 2003 13:57:33 -0500


Here's a quick pre-PEP.

Allow mutable objects as dictionary keys in certain cases by adding a new protocol. dict.setitem() and contains() would work like this bit of pseudo code: def setitem(self, key, value): if not hashable(key): if hasattr(key, "immutablecopy"): key = key.immutablecopy() else: raise TypeError, "unhashable key" ...insert value with key into dict... def contains(self, key): if not hashable(key): if hasattr(key, "temporaryimmutable"): key = key.temporaryimmutable() else: raise TypeError, "unhashable key" ...test for key membership... (other methods would be similar) When implemented in C in dictobject.c, this would add no overhead for the "normal" case (when keys are directly hashable).

I have a small doubt about this claim. Inserting code into PyDict_SetItem(), even if it is normally never executed, may reduce code locality and hence cause the code to miss the cache more often than before. This may be addressed by rearranging the code, but there's a limit to that.

I have a countersuggestion.

Could you do this as a subclass of dict? A subclass in Python would be inefficient, but might still be good enough for your second use case (the ObjC bridge). If not, a subclass in Python might be feasible -- it would be a little bit more work than integrating it into dictobject.c, but you have a lot less convincing to do, and you can even get it to work with Python 2.2.

The temporaryimmutable part of the protocol is needed to avoid unneccesary copying in those cases where the key isn't actually inserted into the dict, but is only used for membership testing. It would return a wrapper around the mutable object, defining eq and hash. (This idea is stolen from the sets.py module.) If an implementer doesn't care about this efficiency, temporaryimmutable can be an alias to immutablecopy.

Use Cases: - sets.py could be written more efficiently and/or could be more easily be rewritten in C. - Again, the troubles of bridging Python to Objective-C have triggered this idea. Due to the nature of Objective-C in general, and Cocoa in particular, we can never be sure beforehand whether a string is mutable or not. We can detect mutability at runtime, but depending on this is dangerous as it exposes an implementation detail that may change between releases of any API that returns strings. Therefore code that works in one setup may break in another, or even in the the same setup after an arbitrary library upgrade, simply because an API suddenly returns a mutable string where it didn't before. Yes this sucks, but it's the way it is. Backwards compatibilty: 100%. Overhead added None Implementation There's quite a bit of code duplication in those areas of dictobject.c that matter for this proposal, so implementing it will take longer than the 10 minutes I originally estimated . Perhaps this is a good opportunity to refactor these bits into inline functions? Not being a C guru I can't judge whether this is possible or desirable with the set of compilers that Python must be built with.

You can't rely on inline. OTOH, GCC -O3 often inlines static functions without needing a hint.

--Guido van Rossum (home page: http://www.python.org/~guido/)