[Python-Dev] Saving the hash value of tuples

Guido van Rossum guido at python.org
Mon Apr 3 21:45:10 CEST 2006


On 4/2/06, Noam Raphael <noamraph at gmail.com> wrote:

On 4/2/06, Guido van Rossum <guido at python.org> wrote:
> > I tried the change, and it turned out that I had to change cPickle a
> > tiny bit: it uses a 2-tuple which is allocated when the module
> > initializes to look up tuples in a dict. I changed it to properly use
> > PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> > ran test_cpickle before the change and after it, and it took the same
> > time (0.89 seconds on my computer).
>
> Not just cPickle. I believe enumerate() also reuses a tuple.

Maybe it does, but I believe that it doesn't calculate the hash value of it - otherwise, the test suite would probably have failed.

But someone else could.

> > What do you think? I see three possibilities:
> > 1. Nothing should be done, everything is as it should be.
> > 2. The cPickle module should be changed to not abuse the tuple, but
> > there's no reason to add an extra word to the tuple structure and
> > break binary backwards compatibility.
> > 3. Both should be changed.
>
> I'm -1 on the change. Tuples are pretty fundamental in Python and
> hashing them is relatively rare. I think the extra required space for
> all tuples isn't worth the potential savings for some cases.

That's fine with me. But what about option 2? Perhaps cPickle (and maybe enumerate) should properly discard their tuples, so that if someone in the future decides that saving the hash value is a good idea, he won't encounter strange bugs? At least in cPickle I didn't notice any loss of speed because of the change, and it's quite sensible, since there's a tuple-reuse mechanism anyway.

No, these are carefully considered speed-ups.
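
For readers following along, the "strange bugs" worry can be sketched in pure Python. This is only an illustration, not CPython's code: CachedHashPair and lookup_with_reused_key are hypothetical names standing in for a tuple that caches its hash and for C code that overwrites a preallocated key in place before a dict lookup.

```python
class CachedHashPair:
    """Hypothetical stand-in for a 2-tuple that caches its hash value."""

    def __init__(self, first, second):
        self.items = [first, second]
        self._hash = None  # filled in on the first hash() call, never reset

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(tuple(self.items))
        return self._hash

    def __eq__(self, other):
        return isinstance(other, CachedHashPair) and self.items == other.items


def lookup_with_reused_key(table, scratch, first, second):
    """Mimic the reuse trick: overwrite one preallocated key, then look it up."""
    scratch.items[0] = first
    scratch.items[1] = second
    return table.get(scratch)


table = {CachedHashPair("a", 1): "first", CachedHashPair("b", 2): "second"}
scratch = CachedHashPair(None, None)

# The first lookup computes and caches the hash of ("a", 1).
hit = lookup_with_reused_key(table, scratch, "a", 1)

# The second lookup changes the contents, but the cached hash is stale,
# so the dict probes the wrong slot and the key is not found.
miss = lookup_with_reused_key(table, scratch, "b", 2)

print(hit, miss)
```

Without a cached hash, the reused key is rehashed on every lookup and the in-place overwrite is harmless, which is why the trick is safe in the current tuple layout.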

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


