[Python-Dev] Saving the hash value of tuples

Noam Raphael noamraph at gmail.com
Sun Apr 2 23:54:14 CEST 2006


On 4/2/06, Guido van Rossum <guido at python.org> wrote:
> > I tried the change, and it turned out that I had to change cPickle a
> > tiny bit: it uses a 2-tuple, allocated when the module initializes,
> > as a key to look up tuples in a dict. I changed it to properly use
> > PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> > ran test_cpickle before the change and after it, and it took the same
> > time (0.89 seconds on my computer).
>
> Not just cPickle. I believe enumerate() also reuses a tuple.

Maybe it does, but I believe it never computes the hash of that
tuple; otherwise, the test suite would probably have failed.
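To make the failure mode concrete, here is a minimal Python sketch. It is not CPython code, and the names `CachedHashPair`, `lookup_with_shared_key` and `lookup_with_fresh_keys` are hypothetical. The class imitates a 2-tuple carrying the proposed cached-hash word, and `set_item()` imitates C code overwriting the items of a preallocated tuple in place. Once the hash has been cached, reusing the same object for a second lookup leaves a stale hash behind, and the dict lookup misses.

```python
class CachedHashPair:
    """Hypothetical stand-in for a 2-tuple with the proposed cached-hash word."""

    def __init__(self, a, b):
        self._items = [a, b]
        self._hash = None  # models the extra word added to the tuple struct

    def set_item(self, i, value):
        # Models C code reusing a preallocated tuple by overwriting an item.
        # Nothing clears the cached hash, so it can go stale.
        self._items[i] = value

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(tuple(self._items))  # computed once, then cached
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, CachedHashPair):
            return NotImplemented
        return self._items == other._items


def lookup_with_shared_key(table, pairs):
    """Look up every (a, b) pair through ONE reused key object (the risky pattern)."""
    shared = CachedHashPair(None, None)  # allocated once, like a module-level tuple
    results = []
    for a, b in pairs:
        shared.set_item(0, a)
        shared.set_item(1, b)
        results.append(table.get(shared))
    return results


def lookup_with_fresh_keys(table, pairs):
    """Look up every pair through a newly built key (the PyTuple_New/Py_DECREF approach)."""
    return [table.get(CachedHashPair(a, b)) for a, b in pairs]


table = {CachedHashPair(1, 2): "first", CachedHashPair(3, 4): "second"}
print(lookup_with_shared_key(table, [(1, 2), (3, 4)]))  # ['first', None]
print(lookup_with_fresh_keys(table, [(1, 2), (3, 4)]))  # ['first', 'second']
```

The second lookup through the shared key still carries the hash of `(1, 2)`, so the dict never even compares it against the `(3, 4)` entry. A module that reuses a tuple but never hashes it, as suggested for enumerate above, would not trip over this.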
>
> > What do you think? I see three possibilities:
> >   1. Nothing should be done, everything is as it should be.
> >   2. The cPickle module should be changed to not abuse the tuple, but
> > there's no reason to add an extra word to the tuple structure and
> > break binary backwards compatibility.
> >   3. Both should be changed.
>
> I'm -1 on the change. Tuples are pretty fundamental in Python and
> hashing them is relatively rare. I think the extra required space for
> all tuples isn't worth the potential savings for some cases.

That's fine with me. But what about option 2? Perhaps cPickle (and
maybe enumerate) should properly discard their tuples, so that if
someone in the future decides that saving the hash value is a good
idea, they won't run into strange bugs? At least in cPickle, I didn't
notice any loss of speed from the change, which makes sense, since
tuples are already recycled by a tuple-reuse mechanism anyway.
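The free-list argument can be sketched the same way. The names `HashSlotTuple` and `TupleFreeList` are hypothetical, and the sketch only loosely mimics a per-size free list rather than reproducing CPython's allocator. The point it illustrates: because every object handed out by the allocator has its hash slot reset, recycling through the allocator stays safe even with a cached hash, and in a tight allocate/release loop almost no new objects are actually created.

```python
class HashSlotTuple:
    """Hypothetical tuple-like object carrying the proposed cached-hash slot."""
    __slots__ = ("items", "cached_hash")

    def __init__(self, size):
        self.items = [None] * size
        self.cached_hash = None


class TupleFreeList:
    """Sketch of a per-size free list, loosely modeled on CPython's tuple free list."""

    def __init__(self, max_saved=20):
        self.max_saved = max_saved   # cap on recycled objects kept per size
        self.saved = {}              # size -> list of released objects
        self.fresh_allocations = 0   # how many objects were really created

    def new(self, size):
        # Analogue of PyTuple_New: prefer a recycled object of the right size.
        pool = self.saved.get(size)
        if pool:
            t = pool.pop()
        else:
            t = HashSlotTuple(size)
            self.fresh_allocations += 1
        # Reset state on every hand-out, so no stale hash survives recycling.
        for i in range(size):
            t.items[i] = None
        t.cached_hash = None
        return t

    def release(self, t):
        # Analogue of Py_DECREF dropping the last reference: keep it for reuse.
        pool = self.saved.setdefault(len(t.items), [])
        if len(pool) < self.max_saved:
            pool.append(t)


free_list = TupleFreeList()
for n in range(1000):
    key = free_list.new(2)                      # fresh key for each lookup
    key.items[0], key.items[1] = n, n + 1
    key.cached_hash = hash(tuple(key.items))    # hash gets cached during the lookup
    free_list.release(key)                      # released right after the lookup

print(free_list.fresh_allocations)  # 1
```

A thousand "fresh" keys cost one real allocation, and correctness comes from always going through the allocator instead of holding on to a private, manually mutated object.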

Noam

