[Python-Dev] Hash collision security issue (now public)

Victor Stinner victor.stinner at haypocalc.com
Fri Jan 6 12:42:44 CET 2012


Using my patch (random-2.patch), the overhead is 0%. I cannot see a
difference with and without my patch.

Numbers:
---
unpatched:
== 3 characters ==
1 loops, best of 3: 459 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop

patched:

== 3 characters ==
1 loops, best of 3: 458 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop
---
(the patched version looks faster just because the timer is not
reliable enough for such fast test)

Script:
---
echo "== 3 characters =="
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo "== 10 characters =="
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo "== 500 characters =="
./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%0500i" % x)
---
(Take the smallest timing for each test)

"-n 1" is needed because the hash value is only computed once (is
cached). I may be possible to have more reliable results by disabling
completly the hash cache (comment "PyUnicode_HASH(self) = x;" line).

Victor


More information about the Python-Dev mailing list