[Python-Dev] Status of the fix for the hash collision vulnerability

Tue Jan 17 21:29:51 CET 2012

> I thought that the original problem was that with N insertions in the
> dictionary, by repeatedly inserting different keys generating the same
> hash value an attacker could arrange that the cost of finding an open
> slot is O(N), and thus the cost of N insertions is O(N^2).
> 
> If so, frequent resizing could make the attacker's problem much more
> difficult, as the distribution of secondary probes should change with
> each resize.

Not sure what you mean by "distribution of secondary probes".

Let H be the initial hash value, and let MASK be the current size
of the dictionary. Then I(n), the sequence of dictionary indices
being probed, is computed as

   I(0) = H & MASK
   PERTURB(0) = H
   I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
   PERTURN(n+1) = PERTURB(n) >> 5

So if two objects O1 and O2 have the same hash value H, the sequence of
probed indices is the same for any MASK value. It will be a different
sequence, yes, but they will still collide on each and every slot.

This is the very nature of open addressing. If it *wouldn't* try all
indices in the probe sequence, it may not be possible to perform
the lookup for a key correctly.

Regards,
Martin