[Python-Dev] Hashing proposal: change only string-only dicts

Victor Stinner victor.stinner at haypocalc.com
Tue Jan 17 23:06:47 CET 2012


2012/1/17 "Martin v. Löwis" <martin at v.loewis.de>:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.

The real problem is in dict (or any structure using an hash table), so
if it is possible, it would also prefer to fix the problem directly in
dict.

> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>   hash value.
>   a) variant: in 3.3, drop the extra field, and declare that hashes
>   may change across runs
> 2. only cache the dict-hash, recomputing the public hash each time
> 3. on a per-string choice, cache either the dict-hash or the public
>   hash, depending on which one gets computed first, and recompute
>   the other one every time it's needed.

There is a simpler solution:

bucket_index = (hash(str) ^ secret) & DICT_MASK.

Remark: set must also be fixed.

Victor


More information about the Python-Dev mailing list