[Python-Dev] Saving the hash value of tuples (original) (raw)

Guido van Rossum guido at python.org
Sun Apr 2 19:38:04 CEST 2006


On 4/1/06, Noam Raphael <noamraph at gmail.com> wrote:

I've found out that the hash value of tuples isn't saved after it's calculated. With strings it's different: the hash value of a string is calculated only on the first call to hash(string), and saved in the structure for future use. Saving the value makes dict lookup of tuples an operation with an amortized cost of O(1).

Have you done any measurements to confirm that this makes any difference? I'm not particularly enamored of theoretical optimizations. This one in particular has a definite cost in space which needs to be weighed seriously.

Saving the hash value means that if an item of the tuple changes its hash value, the hash value of the tuple won't be changed. I think it's ok, since: 1. Hash value of things shouldn't change. 2. Dicts assume that too.

Sure. (But you do realize that if a tuple contains a mutable value its hash value raises an exception, right?)

I tried the change, and it turned out that I had to change cPickle a tiny bit: it uses a 2-tuple which is allocated when the module initializes to lookup tuples in a dict. I changed it to properly use PyTupleNew and PyDECREF, and now the complete test suite passes. I run testcpickle before the change and after it, and it took the same time (0.89 seconds on my computer).

Not just cPickle. I believe enumerate() also reuses a tuple.

What do you think? I see three possibilities: 1. Nothing should be done, everything is as it should be. 2. The cPickle module should be changed to not abuse the tuple, but there's no reason to add an extra word to the tuple structure and break binary backwards compatibility. 3. Both should be changed.

I'm -1 on the change. Tuples are pretty fundamental in Python and hashing them is relatively rare. I think the extra required space for all tuples isn't worth the potential savings for some cases.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list