[Patches] [ python-Patches-1462796 ] Save the hash value of tuples

SourceForge.net noreply at sourceforge.net
Sun Apr 2 06:17:19 CEST 2006


Patches item #1462796, was opened at 2006-04-01 12:59
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1462796&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Noam Raphael (noamr)
Assigned to: Nobody/Anonymous (nobody)
Summary: Save the hash value of tuples

Initial Comment:
(Copied from a post to python-dev):

I've found out that the hash value of tuples isn't
saved after it's calculated. With strings it's
different: the hash value of a string is calculated
only on the first call to hash(string), and saved in
the structure for future use. Saving the value makes
dict lookup of tuples an operation with an amortized
cost of O(1).

Saving the hash value means that if an item of the
tuple changes its hash value, the hash value of the
tuple won't be changed. I think it's ok, since:
 1. Hash value of things shouldn't change.
 2. Dicts assume that too.

I tried the change, and it turned out that I had to
change cPickle a tiny bit: it uses a 2-tuple which is
allocated when the module initializes to lookup tuples
in a dict. I changed it to properly use PyTuple_New and
Py_DECREF, and now the complete test suite passes. I
run test_cpickle before the change and after it, and it
took the same time (0.89 seconds on my computer).


----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2006-04-01 20:17

Message:
Logged In: YES 
user_id=341410

I should also mention that the point of ammortized analysis
is to properly bound time based on an expected number of
operations.  You are not doing this, which is why your
ammortization is incorrect.  You are right that it would be
constant after you cache the hash, but it is not constant
ammortized in general.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2006-04-01 20:15

Message:
Logged In: YES 
user_id=341410

"Saving the value makes dict lookup of tuples an operation
with an amortized cost of O(1)."

This is an incorrect statement in regards to caching.  The
initial hashing costs whatever it takes to hash the tuple (
Omega(len(tuple)) time at least), but subsequent hashes cost
O(1).  Ammortized over 2 hashes is still Omega(len(tuple))
per hash, not O(1) as you stated.  In order for
ammortization to help the bounds, you would need to hash the
tuple Omega(len(tuple)) times (assuming the contents of the
tuple are hasable in O(1) time without ammortization), at
which point you would get O(1) ammortization.

----------------------------------------------------------------------

Comment By: Noam Raphael (noamr)
Date: 2006-04-01 15:20

Message:
Logged In: YES 
user_id=679426

What about changing cPickle to avoid its dependency on
tuples not saving their hash value?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-04-01 15:00

Message:
Logged In: YES 
user_id=31435

I don't want this:  it increases the size of all tuple
objects without obvious (let alone compelling) benefit.  All
strings are hashable, but not all tuples are.  Python
internals arrange to force string interning in many
speed-sensitive cases, but only the empty tuple is interned
and hashing the empty tuple goes fast anyway; that makes it
less likely that a cached tuple hash will ever do any good.
 Finally, len(tuple) is typically small so the time needed
for hash(tuple) is also typically small (for tuples and
strings, the hash time is proportional to the number of
elements, and strings are typically "bigger" this way).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1462796&group_id=5470


More information about the Patches mailing list