[Python-Dev] Hash collision security issue (now public)

Thu Dec 29 23:19:58 CET 2011

The talk was a presentation yesterday by Alexander Klink and Julian 
Wälde at the Chaos Communication Congress in Germany hashDoS at alech.de
I read the non-technical summary at
http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars 

and watched the video of the talk at
https://www.youtube.com/watch?feature=player_embedded&v=_EEhviEO1Vo#

My summary: hash table creation with N keys changes from amortized O(N)
to O(N**2) time if the hash values of all the keys are the same. This 
should only happen for large N if done intentionally. This is easy to 
accomplish with a linear multiply and add hash function, such as used in 
PHP4 (but nowhere else that the authors found). A nonlinear multiply and 
xor hash function, used in one form or another by everything else, is 
much harder to break. It is *theoretically* vulnerable to brute-force 
search and this has been known for years. With a more cleaver 
meet-in-the-middle strategy, that builds a dict of suffixes and then 
searches for matching prefixes, 32-bit hashes are *practically* 
vulnerable. The attack depends on, for instance, 2**16 (64K) being 1/64K 
of 2**32. (I did not hear when this strategy was developed, but it is 
certainly more practical on a desktop now than even 8 years ago.)

[64-bit hashes are much, much less vulnerable to attack, at least for 
now. So it seems to me that anyone who hashes potential attack data can 
avoid the problem by using 64-bit Python with 64-bit hash values. If I 
understood Antoine, that should be all 64-bit builds.]

More summary: Perl added an #define option to start the hash calculation 
with non-zero value instead of 0 years ago to "avoid algorithmic 
complexity attacks". The patch is at 47:20 in the video. The authors 
believe all should do similarly.

[The change amounts to adding a char, unknown to attackers, to the 
beginning of every string before hashing. So it adds a small bit of 
time. The code patch shown did not show the source of the non-zero seed 
or the timing and scope of any randomization. As the discussion here has 
shown, this is an important issue to applications. So 'do the same' is 
inadequate and over-simplified advice. I believe Armin's patch is 
similar to the Perl patch.]

Since the authors sent out CERT alert about Nov 1, PHP has added to PHP5 
a new function to limit the number of vars hashed. Microsoft will do 
something similar now with hash randomization to follow (maybe?). JRuby 
is going to do something. Java does not think it needs to change Java 
itself, but will leave all to the frameworks.

[The discussion here suggests that this is an inadequate response for 
32-bit systems like Java since one person/group may not control all the 
pieces of a server system. However, a person or group can run all pieces 
on a system Python with an option turned on.]

-- 
Terry Jan Reedy