[Python-Dev] Hashing proposal: change only string-only dicts

martin at v.loewis.de martin at v.loewis.de
Wed Jan 18 01:30:59 CET 2012


Quoting Victor Stinner <victor.stinner at haypocalc.com>:

>> Each string would get two hashes: the "public" hash, which is constant
>> across runs and bugfix releases, and the dict-hash, which is only used
>> by the dictionary implementation, and only if all keys to the dict are
>> strings.
>
> The distinction between secret (private, secure) and "public" hash
> (deterministic) is not clear to me.

It's not about privacy or security. It's about compatibility. The
dict-hash is only used in the dict implementation, and never exposed,
leaving the tp_hash unmodified.
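
A minimal sketch of the idea in pure Python, not the actual C patch.
All names here (public_hash, dict_hash, SketchDict, _SEED) are
hypothetical, the FNV-1a-style function merely stands in for the real
string hash, and the per-process seed is an assumption about how the
dict-hash could be made unpredictable. The table uses the dict-hash
while every key is a string and rebuilds itself with the public hash
once a non-string key appears.

```python
import os

_MASK = (1 << 64) - 1
# Hypothetical per-process seed; the real source of the seed is not specified here.
_SEED = int.from_bytes(os.urandom(8), "little")


def _fnv1a(s: str, start: int) -> int:
    """FNV-1a-style mixing over the UTF-8 bytes of s, starting from `start`."""
    h = start & _MASK
    for b in s.encode("utf-8"):
        h = ((h ^ b) * 0x100000001B3) & _MASK
    return h


def public_hash(s: str) -> int:
    """Stand-in for the unchanged tp_hash: constant across runs."""
    return _fnv1a(s, 0xCBF29CE484222325)


def dict_hash(s: str, seed: int = _SEED) -> int:
    """Seeded variant, used only inside the table and never returned to callers."""
    return _fnv1a(s, 0xCBF29CE484222325 ^ seed)


class SketchDict:
    """Toy chained hash table that picks its hash function by key type."""

    def __init__(self, nbuckets: int = 8):
        self._buckets = [[] for _ in range(nbuckets)]
        self._strings_only = True

    def _index(self, key) -> int:
        if self._strings_only:
            h = dict_hash(key)
        elif isinstance(key, str):
            h = public_hash(key)
        else:
            h = hash(key)
        return h % len(self._buckets)

    def _switch_to_public_hash(self) -> None:
        # First non-string key: rebuild every bucket with the public hash.
        items = [kv for bucket in self._buckets for kv in bucket]
        self._strings_only = False
        self._buckets = [[] for _ in self._buckets]
        for k, v in items:
            self._buckets[self._index(k)].append((k, v))

    def __setitem__(self, key, value) -> None:
        if self._strings_only and not isinstance(key, str):
            self._switch_to_public_hash()
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # Toy simplification: a non-string lookup on a strings-only table misses.
        if self._strings_only and not isinstance(key, str):
            raise KeyError(key)
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

Nothing outside SketchDict ever sees the value of dict_hash, which is
why the public hash can stay exactly as it is.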

> Example: collections.UserDict implements __hash__() using
> hash(self.data).

Are you sure? I only see that used for UserString, not UserDict.
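
A quick check, assuming a current CPython 3 where both classes live in
the collections module (the sources under discussion at the time may
have differed):

```python
from collections import UserDict, UserString

s = UserString("spam")
# UserString delegates __hash__ to the wrapped str.
print(hash(s) == hash(s.data))

try:
    hash(UserDict({"spam": 1}))
except TypeError as exc:
    # UserDict is a mutable mapping and defines no __hash__ of its own.
    print("UserDict is unhashable:", exc)
```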

> collections.abc.Set computes its hash using hash(x) of each item. Same
> question.

The hash of the Set should most certainly use the elements' tp_hash.
That *is* the hash of the objects, and for strings it may still
collide there because of the vulnerability.
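
To illustrate, here is a small sketch using the _hash() mixin method
that collections.abc.Set provides. The ListSet class is hypothetical
and exists only for this example. Its hash is derived from the
ordinary hash(x) of each element, so it does not depend on insertion
order and never consults any dict-specific hash.

```python
from collections.abc import Set


class ListSet(Set):
    """Hypothetical list-backed set, only here to exercise Set._hash."""

    def __init__(self, items=()):
        self._items = []
        for x in items:
            if x not in self._items:
                self._items.append(x)

    def __contains__(self, x):
        return x in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    # Set deliberately leaves __hash__ undefined; opt in to the mixin's algorithm.
    __hash__ = Set._hash


a = ListSet(["spam", "eggs"])
b = ListSet(["eggs", "spam"])
print(a == b, hash(a) == hash(b))
```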

> If we need to use the secret hash, it should be exposed in Python.

It's not secret, just specific. I don't mind it being exposed. However,
that would be a new feature, which cannot be added in a security fix
or bug fix release.

> Which function/method would be used? I suppose that we cannot add
> anything to stable releases like 2.7.

Right. Nor do I see any need to expose it. The dict-hash fixes the
vulnerability just fine without being exposed.

Regards,
Martin


