[Python-Dev] Counting collisions for the win

Fri Jan 20 03:47:13 CET 2012

On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner <
victor.stinner at haypocalc.com> wrote:

> Hi,
>
> I'm working on the hash collision issue since 2 or 3 weeks. I
> evaluated all solutions and I think that I have now a good knowledge
> of the problem and how it should be solved. The major issue is to have
> a minor or no impact on applications (don't break backward
> compatibility). I saw three major solutions:
>
>  - use a randomized hash
>  - use two hashes, a randomized hash and the actual hash kept for
> backward compatibility
>  - count collisions on dictionary lookup
>
> Using a randomized hash does break a lot of tests (e.g. tests relying
> on the representation of a dictionary). The patch is huge, too big to
> backport it directly on stable versions. Using a randomized hash may
> also break (indirectly) real applications because the application
> output is also somehow "randomized". For example, in the Django test
> suite, the HTML output is different at each run. Web browsers may
> render the web page differently, or crash, or ... I don't think that
> Django would like to sort attributes of each HTML tag, just because we
> wanted to fix a vulnerability.
>
> Randomized hash has also a major issue: if the attacker is able to
> compute the secret, (s)he can easily compute collisions and exploit
> the hash collision vulnerability again. I don't know exactly how
> complex it is to compute the secret, but our hash function is weak (it
> is far from being cryptographic, it is really simple to run it
> backward). If someone writes a fast function to compute the secret, we
> will go back to the same point.
>
> IMO using two hashes has the same disavantages of the randomized hash
> solution, whereas it is more complex to implement.
>
> The last solution is very simple: count collision and raise an
> exception if it hits a limit. The path is something like 10 lines
> whereas the randomized hash is more close to 500 lines, add a new
> file, change Visual Studio project file, etc. First I thaught that it
> would break more applications than the randomized hash, but I tried on
> Django: the test suite fails with a limit of 20 collisions, but not
> with a limit of 50 collisions, whereas the patch uses a limit of 1000
> collisions. According to my basic tests, a limit of 35 collisions
> requires a dictionary with more than 10,000,000 integer keys to raise
> an error. I am not talking about the attack, but valid data.
>
> More details about my tests on the Django test suite:
> http://bugs.python.org/issue13703#msg151620
>
> --
>
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisons
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.
>

+1

> I just have some requests on Marc Andre Lemburg patch:
>
>  - the limit should be configurable: a new function in the sys module
> should be enough. It may be private (or replaced by an environment
> variable?) in stable versions
>  - the set type should also be patched (I didn't check if it is
> vulnerable or not using the patch)
>  - the patch has no test! (a class with a fixed hash should be enough
> to write a test)
>  - the limit must be documented somwhere
>  - the exception type should be different than KeyError
>
> Victor
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120119/a0c9f1a0/attachment.html>