[Python-Dev] Status of the fix for the hash collision vulnerability

Sat Jan 14 03:55:22 CET 2012

On 14/01/12 12:58, Gregory P. Smith wrote:

> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
> back ported to any Python version.
>
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken.

For the record:

steve@runes:~$ python -c "print(hash('spam ham'))"
-376510515
steve@runes:~$ jython -c "print(hash('spam ham'))"
2054637885

So it is already the case that Python code that assumes stable hashing is broken.
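
Code that really does need the same value on every implementation and every run can compute one explicitly instead of leaning on hash(). A minimal sketch, assuming a truncated SHA-256 digest is acceptable; the name stable_hash is made up for illustration and is not anything in the standard library:

```python
import hashlib


def stable_hash(text):
    """Return a 64-bit integer derived from SHA-256 of the UTF-8 text.

    Unlike the built-in hash(), this does not depend on the interpreter
    or on any per-process seed. (Hypothetical helper, for illustration.)
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes so the result fits in 64 bits.
    return int.from_bytes(digest[:8], "big")


print(stable_hash("spam ham"))
```

This is much slower than hash() and is no use for dict internals; it only shows that stability is something you ask for, not something hash() promises.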

For what it's worth, I'm not convinced that we should be overly concerned about
"poor saps" (Guido's words) who rely on accidents of implementation regarding
hash. We shouldn't break their code unless we have a good reason, but this
strikes me as a good reason. The documentation for hash certainly makes no
promise about stability, and relying on it strikes me as about as sensible as
relying on the stability of error messages.

I'm also not convinced that the option to raise an exception after 1000 
collisions actually solves the problem. That relies on the application being 
re-written to catch the exception and recover from it (how?). Otherwise, all 
it does is change the attack vector from "cause an indefinite number of hash 
collisions" to "cause 999 hash collisions followed by crashing the application 
with an exception", which doesn't strike me as much of an improvement.
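
To make that concrete, here is a toy sketch of the idea. It is not the proposed patch: LimitedTable, CollisionLimitExceeded, CollidingKey and handle_request are all names invented for this example, and the table is a simple chained one rather than CPython's dict. The keys are forced to collide by giving them a constant __hash__.

```python
class CollisionLimitExceeded(Exception):
    """Hypothetical exception, invented for this sketch."""


class CollidingKey:
    """Key whose hash is always 0, standing in for attacker-chosen strings."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.n == other.n


class LimitedTable:
    """Toy chained hash table that raises once a bucket holds `limit` keys."""

    def __init__(self, buckets=8, limit=1000):
        self.buckets = [[] for _ in range(buckets)]
        self.limit = limit

    def insert(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # overwrite, no new collision
                return
        if len(chain) >= self.limit:
            raise CollisionLimitExceeded("too many keys in one bucket")
        chain.append((key, value))


def handle_request(params, limit=1000):
    """Stand-in for an application that never expected the exception."""
    table = LimitedTable(limit=limit)
    for key in params:
        table.insert(key, None)
    return "ok"


attack = [CollidingKey(i) for i in range(1001)]
try:
    handle_request(attack)
except CollisionLimitExceeded as exc:
    print("request died:", exc)
```

The try/except at the bottom is exactly the part that existing applications do not have. Without it, the attacker sends one more key than the limit and the request fails anyway.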

+1 on random seeding. Default to on in 3.3+ and default to off in older 
versions, which allows people to avoid breaking their code until they're ready 
for it to be broken.
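
For anyone unsure what "random seeding" means here, a rough sketch of the idea follows. This is a toy multiply-and-xor hash, not CPython's string hash and not the actual patch; make_seeded_hash is a made-up name, and I make no claim that this particular function resists a determined attacker. The point is only that the seed is chosen once per process, so the values an attacker computed offline no longer match.

```python
import os

MASK = (1 << 64) - 1  # keep the running value within 64 bits


def make_seeded_hash(seed=None):
    """Return a string hash function mixed with a per-process seed.

    If no seed is given, one is drawn from the OS random source.
    (Hypothetical helper for illustration only.)
    """
    if seed is None:
        seed = int.from_bytes(os.urandom(8), "big")

    def seeded_hash(text):
        h = seed & MASK
        for byte in text.encode("utf-8"):
            h = ((h * 1000003) ^ byte) & MASK
        return h

    return seeded_hash


fixed = make_seeded_hash(seed=12345)     # reproducible, like "seeding off"
random_a = make_seeded_hash()            # fresh seed, like "seeding on"
print(fixed("spam ham") == fixed("spam ham"))
print(random_a("spam ham"))
```

With a fixed seed the result is repeatable, which is what the "default to off" setting would preserve for older versions. With a fresh seed each run, anything that depended on a particular hash value or dict ordering stops working, which is the breakage being discussed.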

-- 
Steven