[Python-Dev] Hash collision security issue (now public)

Thu Dec 29 15:14:33 CET 2011

On Thu, 29 Dec 2011 14:32:21 +0100
Christian Heimes <lists at cheimes.de> wrote:
> Am 29.12.2011 13:57, schrieb Armin Ronacher:
> > Hi,
> > 
> > Something I should add to this now that I thought about it a bit more:
> > 
> > Assuming this should be fixed on a language level the solution would
> > probably be to salt hashes.  The most common hash to salt here is the
> > PyUnicode hash for obvious reasons.
> > 
> > - Option a: Compiled in Salt
> >   + Easy to implement
> >   - Breaks unittests most likely (those were broken in the first place
> >     but that's still a very annoying change to make)
> >   - Might cause problems with interoperability of Pythons compiled with
> >     different hash salts
> >   - You're not really solving the problem because each linux
> >     distribution (besides Gentoo I guess) would have just one salt
> >     compiled in and that would be popular enough to have the same
> >     issue.
> > 
> > - Option b: Environment variable for the salt
> >   + Easy-ish to implement
> >   + Easy to synchronize over different machines
> >   - initialization for base types happens early and unpredictive which
> >     makes it hard for embedded Python interpreters (think mod_wsgi and
> >     other things) to specify the salt
> > 
> > - Option c: Random salt at runtime
> >   + Easy to implement
> >   - impossible to synchronize
> >   - breaks unittests in the same way as a compiled in salt would do
> 
> - Option d: Don't change __hash__ but only use randomized hash for
> PyDictEntry lookup
>   + Easy to implement
>   - breaks only software to relies on a fixed order of dict keys
>   - breaks only a few to no unit tests
> 
> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the change the an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stay the same and just the dict code needs to be altered. The
> approach has also the benefit that all possible objects gain a
> randomized hash.

I basically agree with your proposal. The only downside is that custom
hashed containers (such as _pickle.c's memotable) don't
automatically benefit. That said, I think it would be difficult to
craft an attack against the aforementioned memotable (you would have
to basically choose the addresses of pickled objects).

Regards

Antoine.