[issue13703] Hash collision security issue

Wed Jan 11 10:56:12 CET 2012

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

>  * it is exceedingly complex

Which part exactly? For hash(str), it just add two extra XOR.

>  * the method would need to be implemented for all hashable Python types

It was already discussed, and it was said that only hash(str) need to
be modified.

>  * it causes startup time to increase (you need urandom data for
>   every single hashable Python data type)

My patch reads 8 or 16 bytes from /dev/urandom which doesn't block. Do
you have a benchmark showing a difference?

I didn't try my patch on Windows yet.

>  * it causes run-time to increase due to changes in the hash
>   algorithm (more operations in the tight loop)

I posted a micro-benchmark on hash(str) on python-dev: the overhead is
nul. Did you have numbers showing that the overhead is not nul?

>  * causes different processes in a multi-process setup to use different
>   hashes for the same object

Correct. If you need to get the same hash, you can disable the
randomized hash (PYTHONHASHSEED=0) or use a fixed seed (e.g.
PYTHONHASHSEED=42).

>  * doesn't appear to work well in embedded interpreters that
>   regularly restarted interpreters (AFAIK, some objects persist across
>   restarts and those will have wrong hash values in the newly started
>   instances)

test_capi runs _testembed which restarts a embedded interpreters 3
times, and the test pass (with my patch version 5). Can you write a
script showing the problem if there is a real problem?

In an older version of my patch, the hash secret was recreated at each
initiliazation. I changed my patch to only generate the secret once.

> The most important issue, though, is that it doesn't really
> protect Python against the attack - it only makes it less
> likely that an adversary will find the init vector (or a way
> around having to find it via crypt analysis).

I agree that the patch is not perfect. As written in the patch, it
just makes the attack more complex. I consider that it is enough.

Perl has a simpler protection than the one proposed in my patch. Is
Perl vulnerable to the hash collision vulnerability?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________