[Python-Dev] Status of the fix for the hash collision vulnerability

Jack Diederich jackdied at gmail.com
Sat Jan 14 07:24:54 CET 2012


On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at python.org> wrote:
> Hm... I started out as a big fan of the randomized hash, but thinking more
> about it, I actually believe that the chances of some legitimate app having
>>1000 collisions are way smaller than the chances that somebody's code will
> break due to the variable hashing.

Python's dicts are designed to avoid hash conflicts by resizing and
keeping the available slots bountiful.  1000 conflicts sounds like a
number that couldn't be hit accidentally unless you had a single dict
using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're
good).   The hashes also look to exploit cache locality but that is
very unlikely to get one thousand conflicts by chance.  If you get
that many there is an attack.

> This is depending on how the counting is done (I didn't look at MAL's
> patch), and assuming that increasing the hash table size will generally
> reduce collisions if items collide but their hashes are different.

The patch counts conflicts on an individual insert and not lifetime
conflicts.  Looks sane to me.

> That said, even with collision counting I'd like a way to disable it without
> changing the code, e.g. a flag or environment variable.

Agreed.  Paranoid people can turn the behavior off and if it ever were
to become a problem in practice we could point people to a solution.

-Jack


More information about the Python-Dev mailing list