[Python-Dev] Hashing proposal: change only string-only dicts

"Martin v. Löwis" martin at v.loewis.de
Wed Jan 18 01:58:42 CET 2012


Am 17.01.2012 22:26, schrieb Antoine Pitrou:
> On Tue, 17 Jan 2012 21:59:28 +0100
> "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I'd like to propose a different approach to seeding the string hashes:
>> only do so for dictionaries involving only strings, and leave the
>> tp_hash slot of strings unchanged.
> 
> I think Python 3 would be better with a clean fix (all hashes
> randomized).
> Now for Python 2... The problem with this idea is that it only
> addresses str dicts. Unicode dicts, and any other dicts, are left
> vulnerable.

No, you misunderstood. I meant to propose that this applies to both
kinds of string (unicode and byte strings); for 2.x also dictionaries
including a mix of them.

> Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> could cache a "hash perturbation" computed from the string and the
> random bits:
> 
> - hash() would use ob_shash
> - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> 
> This way, you cache almost all computations, adding only a computation
> and a couple logical ops when looking up a string in a dict.

That's a good idea. For Unicode, it might be best to add another slot
into the object, even though this increases the object size.

Regards,
Martin


More information about the Python-Dev mailing list