[Python-Dev] Hashing proposal: change only string-only dicts

"Martin v. Löwis" martin at v.loewis.de
Tue Jan 17 21:59:28 CET 2012


I'd like to propose a different approach to seeding the string hashes:
only do so for dictionaries involving only strings, and leave the
tp_hash slot of strings unchanged.

Each string would get two hashes: the "public" hash, which is constant
across runs and bugfix releases, and the dict-hash, which is only used
by the dictionary implementation, and only if all keys to the dict are
strings. In order to allow caching of the hash, all dicts should use
the same hash (if caching wasn't necessary, each dict could use its own
seed).

There are several variants of that approach wrt. caching of the hash
1. add an additional field to all string objects, to cache the second
   hash value.
   a) variant: in 3.3, drop the extra field, and declare that hashes
   may change across runs
2. only cache the dict-hash, recomputing the public hash each time
3. on a per-string choice, cache either the dict-hash or the public
   hash, depending on which one gets computed first, and recompute
   the other one every time it's needed.

As you can see, 1 vs. 2/3 is a classical time-space-tradeoff.

What do you think?

Regards,
Martin


More information about the Python-Dev mailing list