[Python-Dev] Counting collisions for the win

Guido van Rossum guido at python.org
Fri Jan 20 19:20:08 CET 2012


This is the first objection I have seen against collision-counting that
might stand.

On Fri, Jan 20, 2012 at 7:55 AM, Frank Sievertsen <pydev at sievertsen.de>wrote:

> Hello,
>
> I still see at least two ways to create a DOS attack even with the
> collison-counting-patch.
>
> I assumed that it's possible to send ~500KB of payload to the
> application.
>
> 1. It's fully deterministic which slots the dict will lookup.
> Since we don't count slot-collisions, but only hash-value-collisions
> this can be exploited easily by creating strings with the hash-values
> along the lookup-way of an arbitrary (short) string.
>
> So first we pick an arbitrary string. Then calculate which slots it will
> visit on the way to the first empty slot. Then we create strings with
> hash-values for these slots.
>
> This attack first injects the strings to fill all the slots that the
> one short string will want to visit. Then it adds THE SAME string
> again and again. Since the entry is already there, nothing will be added
> and no additional collisions happen, no exception raised.
>
> $ ls -l super.txt
> -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
> $ tail -n3 super.txt
> FX5
> FX5
> FX5
> $ wc -l super.txt
> 90000 super.txt
> $ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("super.txt"))'
> real    0m52.724s
> user    0m51.543s
> sys    0m0.028s
>
> 2. The second attack actually attacks that 1000 allowed string
> comparisons are still a lot of work.
> First I added 999 strings that collide with a one-byte string "a". In
> some applications a zero-byte string might work even better. Then I
> can add a many thousand of the "a"'s, just like the first attack.
>
> $ ls -l 1000.txt
> -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
> $ head -n 3 1000.txt
> 7hLci00
> 4wVFm10
> _rZJU50
> $ wc -l 1000.txt
> 247000 1000.txt
> $ tail -n 3 1000.txt
> a
> a
> a
> $ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("1000.txt"))'
> real    0m17.408s
> user    0m15.897s
> sys    0m0.008s
>
> Of course the first attack is far more efficient. One could argue
> that 16 seconds is not enough for an attack. But maybe it's possible
> to send 1MB, have zero-bytes strings, and since for example django
> does 5 lookups per query-string this will keep it busy for ~80 seconds on
> my pc.
>
> What to do now?
> I think it's not smart to reduce the number of allowed collisions
> dramatically
> AND count all slot-collisions at the same time.
>
> Frank
>
> ______________________________**_________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev>
> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/**
> guido%40python.org<http://mail.python.org/mailman/options/python-dev/guido%40python.org>
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120120/5ecaa6de/attachment.html>


More information about the Python-Dev mailing list