[Python-Dev] Hash collision security issue (now public)

Jim Jewett jimjjewett at gmail.com
Sat Dec 31 02:04:39 CET 2011


In http://mail.python.org/pipermail/python-dev/2011-December/115138.html,
Christian Heimes
pointed out that

> ... we don't have to alter the outcome of hash ... We just need to reduce the chance that
> an attacker can produce collisions in the dict (and set?)

I'll state it more strongly.  hash probably should not change (at
least for this), but we may
want to consider a different conflict resolution strategy when the
first slot is already filled.

Remember that there was a fair amount of thought and timing effort put
into selecting the
current strategy; it is deliberately sub-optimal for random input, in
order to do better with
typical input.      <
http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt >


If there is a change, it would currently be needed in three places for
each of set and dict
(the lookdict functions and insertdict_clean).  It may be worth adding
some macros just to
keep those six in sync. Once those macros are in place, that allows a
compile-time switch.

My personal opinion is that accepting *and parsing* enough data for
this to be a problem
is enough of an edge case that I don't want normal dicts slowed down
at all for this; I would
therefore prefer that the change be restricted to such a compile-time
switch, with current
behavior the default.


http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571

   583    for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) {
   584         i = (i << 2) + i + perturb + 1;

PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6 perform
almost as well.  Someone worried can easily make that change today,
and be protected
from "generic" anti-python attacks.

I believe the salt suggestions have equivalent to replacing
perturb = hash;
with something like    perturb = hash + salt;

Changing     i = (i << 2) + i + perturb + 1;    would allow
effectively replacing the initial hash,
but risks spoiling performance in the non-adversary case.

Would there be objections to replacing those two lines with something like:

    for (perturb = FIRST_PERTURB(hash, key);
         ep->me_key != NULL;
         perturb = NEXT_PERTURB(hash, key, perturb)) {
        i = NEXT_SLOT(i, perturb);


The default macro definitions should keep things as they are

    #define FIRST_PERTURB(hash, key)    hash
    #define NEXT_PERTURB(hash, key, perturb)    perturb >> PERTURB_SHIFT
    #define NEXT_SLOT(i, perturb)    (i << 2) + i + perturb + 1

while allowing #ifdefs for (slower but) safer things like adding a
salt, or even using
alternative hashes.

-jJ


More information about the Python-Dev mailing list