[Python-Dev] Hash collision security issue (now public)

Antoine Pitrou solipsis at pitrou.net
Thu Dec 29 15:14:33 CET 2011


On Thu, 29 Dec 2011 14:32:21 +0100
Christian Heimes <lists at cheimes.de> wrote:
> Am 29.12.2011 13:57, schrieb Armin Ronacher:
> > Hi,
> > 
> > Something I should add to this now that I thought about it a bit more:
> > 
> > Assuming this should be fixed on a language level the solution would
> > probably be to salt hashes.  The most common hash to salt here is the
> > PyUnicode hash for obvious reasons.
> > 
> > - Option a: Compiled in Salt
> >   + Easy to implement
> >   - Breaks unittests most likely (those were broken in the first place
> >     but that's still a very annoying change to make)
> >   - Might cause problems with interoperability of Pythons compiled with
> >     different hash salts
> >   - You're not really solving the problem because each linux
> >     distribution (besides Gentoo I guess) would have just one salt
> >     compiled in and that would be popular enough to have the same
> >     issue.
> > 
> > - Option b: Environment variable for the salt
> >   + Easy-ish to implement
> >   + Easy to synchronize over different machines
> >   - initialization for base types happens early and unpredictive which
> >     makes it hard for embedded Python interpreters (think mod_wsgi and
> >     other things) to specify the salt
> > 
> > - Option c: Random salt at runtime
> >   + Easy to implement
> >   - impossible to synchronize
> >   - breaks unittests in the same way as a compiled in salt would do
> 
> - Option d: Don't change __hash__ but only use randomized hash for
> PyDictEntry lookup
>   + Easy to implement
>   - breaks only software to relies on a fixed order of dict keys
>   - breaks only a few to no unit tests
> 
> IMHO we don't have to alter the outcome of hash("some string"), hash(1)
> and all other related types. We just need to reduce the change the an
> attacker can produce collisions in the dict (and set?) code that looks
> up the slot (PyDictEntry). How about adding the random value in
> Object/dictobject.c:lookdict() and lookdict_str() (Python 2.x) /
> lookdict_unicode() (Python 3.x)? With this approach the hash of all our
> objects stay the same and just the dict code needs to be altered. The
> approach has also the benefit that all possible objects gain a
> randomized hash.

I basically agree with your proposal. The only downside is that custom
hashed containers (such as _pickle.c's memotable) don't
automatically benefit. That said, I think it would be difficult to
craft an attack against the aforementioned memotable (you would have
to basically choose the addresses of pickled objects).

Regards

Antoine.




More information about the Python-Dev mailing list