[Python-Dev] Status of the fix for the hash collision vulnerability

Sat Jan 14 09:05:24 CET 2012

Jack Diederich writes:
 > On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at python.org> wrote:
 > > Hm... I started out as a big fan of the randomized hash, but thinking more
 > > about it, I actually believe that the chances of some legitimate app having
 > >>1000 collisions are way smaller than the chances that somebody's code will
 > > break due to the variable hashing.
 > 
 > Python's dicts are designed to avoid hash conflicts by resizing and
 > keeping the available slots bountiful.  1000 conflicts sounds like a
 > number that couldn't be hit accidentally

I may be missing something, but AIUI, with the resize, the search for
an unused slot after collision will be looking in a different series
of slots, so the N counter for the N^2 behavior resets on resize.  If
not, you can delete this message now.

If so, since (a) in the error-on-many-collisions approach we're adding
a test here for collision count anyway and (b) we think this is almost
never gonna happen, can't we defuse the exploit by just resizing the
dict after 1000 collisions, with strictly better performance than the
error approach, and almost current performance for "normal" input?

In order to prevent attackers from exploiting every 1000th collision
to force out-of-memory, the expansion factor for collision-induced
resizing could be "very small".  (I don't know if that's possible in
the Python dict implementation, if the algorithm requires something
like doubling the dict size on every resize this is right out, of
course.)

Or, since this is an error/rare path anyway, offer the user a choice
of an error or a resize on hitting 1000 collisions?