[Python-Dev] Counting collisions for the win

Frank Sievertsen pydev at sievertsen.de
Fri Jan 20 16:55:32 CET 2012


Hello,

I still see at least two ways to create a DOS attack even with the
collison-counting-patch.

I assumed that it's possible to send ~500KB of payload to the
application.

1. It's fully deterministic which slots the dict will lookup.
Since we don't count slot-collisions, but only hash-value-collisions
this can be exploited easily by creating strings with the hash-values
along the lookup-way of an arbitrary (short) string.

So first we pick an arbitrary string. Then calculate which slots it will
visit on the way to the first empty slot. Then we create strings with
hash-values for these slots.

This attack first injects the strings to fill all the slots that the
one short string will want to visit. Then it adds THE SAME string
again and again. Since the entry is already there, nothing will be added
and no additional collisions happen, no exception raised.

$ ls -l super.txt
-rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
$ tail -n3 super.txt
FX5
FX5
FX5
$ wc -l super.txt
90000 super.txt
$ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("super.txt"))'
real    0m52.724s
user    0m51.543s
sys    0m0.028s

2. The second attack actually attacks that 1000 allowed string
comparisons are still a lot of work.
First I added 999 strings that collide with a one-byte string "a". In
some applications a zero-byte string might work even better. Then I
can add a many thousand of the "a"'s, just like the first attack.

$ ls -l 1000.txt
-rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
$ head -n 3 1000.txt
7hLci00
4wVFm10
_rZJU50
$ wc -l 1000.txt
247000 1000.txt
$ tail -n 3 1000.txt
a
a
a
$ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("1000.txt"))'
real    0m17.408s
user    0m15.897s
sys    0m0.008s

Of course the first attack is far more efficient. One could argue
that 16 seconds is not enough for an attack. But maybe it's possible
to send 1MB, have zero-bytes strings, and since for example django
does 5 lookups per query-string this will keep it busy for ~80 seconds on
my pc.

What to do now?
I think it's not smart to reduce the number of allowed collisions 
dramatically
AND count all slot-collisions at the same time.

Frank


More information about the Python-Dev mailing list