[Python-Dev] Status of the fix for the hash collision vulnerability

Hrvoje Niksic hrvoje.niksic at avl.com
Wed Jan 18 11:15:49 CET 2012


On 01/17/2012 09:29 PM, "Martin v. Löwis" wrote:
>     I(0) = H&  MASK
>     PERTURB(0) = H
>     I(n+1) = (5*I(n) + 1 + PERTURB(n))&  MASK
>     PERTURN(n+1) = PERTURB(n)>>  5
>
> So if two objects O1 and O2 have the same hash value H, the sequence of
> probed indices is the same for any MASK value. It will be a different
> sequence, yes, but they will still collide on each and every slot.
>
> This is the very nature of open addressing.

Open addressing can still deploy a collision resolution mechanism 
without this property. For example, double hashing uses a different hash 
function (applied to the key) to calculate PERTURB(0). To defeat it, the 
attacker would have to produce keys that hash the same using both hash 
functions.

Double hashing is not a good general solution for Python dicts because 
it complicates the interface of hash tables that support arbitrary keys. 
Still, it could be considered for dicts with known key types (built-ins 
could hardcode the alternative hash function) or for SafeDicts, if they 
are still considered.

Hrvoje


More information about the Python-Dev mailing list