[Python-Dev] Saving the hash value of tuples

Guido van Rossum guido at python.org
Mon Apr 3 21:45:10 CEST 2006


On 4/2/06, Noam Raphael <noamraph at gmail.com> wrote:
> On 4/2/06, Guido van Rossum <guido at python.org> wrote:
> > > I tried the change, and it turned out that I had to change cPickle a
> > > tiny bit: it uses a 2-tuple which is allocated when the module
> > > initializes to lookup tuples in a dict. I changed it to properly use
> > > PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> > > ran test_cpickle before the change and after it, and it took the same
> > > time (0.89 seconds on my computer).
> >
> > Not just cPickle. I believe enumerate() also reuses a tuple.
>
> Maybe it does, but I believe that it doesn't calculate the hash value
> of it - otherwise, the test suite would probably have failed.

But someone else could.

> > > What do you think? I see three possibilities:
> > >   1. Nothing should be done, everything is as it should be.
> > >   2. The cPickle module should be changed to not abuse the tuple, but
> > > there's no reason to add an extra word to the tuple structure and
> > > break binary backwards compatibility.
> > >   3. Both should be changed.
> >
> > I'm -1 on the change. Tuples are pretty fundamental in Python and
> > hashing them is relatively rare. I think the extra required space for
> > all tuples isn't worth the potential savings for some cases.
>
> That's fine with me. But what about option 2? Perhaps cPickle (and
> maybe enumerate) should properly discard their tuples, so that if
> someone in the future decides that saving the hash value is a good
> idea, he won't encounter strange bugs? At least in cPickle I didn't
> notice any loss of speed because of the change, and it's quite
> sensible, since there's a tuple-reuse mechanism anyway.

No, these are carefully considered speed-ups.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
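A minimal Python sketch of the hazard Noam raises, assuming a hypothetical tuple type that caches its hash. CPython tuples do not do this; it is the proposal Guido rejected above. `HashCachingTuple` and its `reuse()` method are invented for illustration. `reuse()` stands in for C code such as cPickle's preallocated 2-tuple, which overwrites a tuple's items in place. If the cached hash is not cleared, a later dict lookup uses the old hash and misses.

```python
class HashCachingTuple:
    """Hypothetical tuple-like object that caches its hash.

    Models the proposal in this thread; real CPython tuples recompute
    their hash on every call to hash().
    """

    def __init__(self, *items):
        self._items = tuple(items)
        self._hash = None  # filled in lazily by __hash__

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(self._items)  # computed once, then reused
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, HashCachingTuple):
            return NotImplemented
        return self._items == other._items

    def reuse(self, *items):
        # Hypothetical stand-in for C code that recycles one preallocated
        # tuple by overwriting its slots. It does NOT clear the cached hash.
        self._items = tuple(items)


memo = {HashCachingTuple(1, 2): "a", HashCachingTuple(3, 4): "b"}

key = HashCachingTuple(1, 2)
print(memo.get(key))  # prints: a (hash of (1, 2) is now cached on key)

key.reuse(3, 4)
# The cached hash is still that of (1, 2), so the (3, 4) entry is never matched.
print(memo.get(key))  # prints: None

fresh = HashCachingTuple(3, 4)  # a newly allocated object, as in Noam's patch
print(memo.get(fresh))  # prints: b
```

Resetting `self._hash = None` inside `reuse()` would be the other way out. Either way, the bug can only appear if tuples cache their hash, which the thread decided against.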

