[Python-Dev] Status of the fix for the hash collision vulnerability
Steven D'Aprano
steve at pearwood.info
Sat Jan 14 03:55:22 CET 2012
On 14/01/12 12:58, Gregory P. Smith wrote:
> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
> back ported to any Python version.
>
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken.
For the record:
steve@runes:~$ python -c "print(hash('spam ham'))"
-376510515
steve@runes:~$ jython -c "print(hash('spam ham'))"
2054637885
So Python code that assumes hash values are stable across implementations is already broken.
For what it's worth, I'm not convinced that we should be overly concerned about
"poor saps" (Guido's words) who rely on accidents of implementation regarding
hash. We shouldn't break their code unless we have a good reason, but this
strikes me as a good reason. The documentation for hash certainly makes no
promise about stability, and relying on it strikes me as about as sensible as
relying on the stability of error messages.
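
A minimal sketch of what code can do instead of relying on hash() or on
set/dict ordering, assuming it needs values that are reproducible across runs
and implementations. The helper names stable_digest and stable_order are
hypothetical, chosen for illustration only.

```python
import hashlib


def stable_digest(text):
    """Return a digest that depends only on the UTF-8 bytes of `text`.

    Unlike the built-in hash(), this does not vary between interpreters
    or between runs.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stable_order(items):
    """Order items explicitly instead of trusting set/dict iteration order."""
    return sorted(items)


if __name__ == "__main__":
    print(stable_digest("spam ham"))
    print(stable_order({"spam", "ham", "eggs"}))
```

Sorting costs more than iterating a set directly, but the result no longer
depends on how the hash table happens to be laid out.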
I'm also not convinced that the option to raise an exception after 1000
collisions actually solves the problem. That relies on the application being
rewritten to catch the exception and recover from it (how?). Otherwise, all
it does is change the attack vector from "cause an indefinite number of hash
collisions" to "cause 999 hash collisions followed by crashing the application
with an exception", which doesn't strike me as much of an improvement.
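
To make the objection concrete, here is a toy sketch of a collision-limited
table. It is not the proposed patch: LimitedTable, TooManyCollisions,
Colliding and handle_request are all hypothetical names, and the chained
layout is an assumption made to keep the example short.

```python
class TooManyCollisions(Exception):
    """Raised when one insert collides with `limit` existing keys."""


class LimitedTable:
    """Toy chained hash table that gives up after `limit` collisions."""

    def __init__(self, limit=1000, buckets=8):
        self.limit = limit
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        # Every entry already in the chain is one collision for this key.
        if len(chain) >= self.limit:
            raise TooManyCollisions("limit of %d reached" % self.limit)
        chain.append((key, value))


class Colliding:
    """Attacker-style key: distinct values that all share one hash."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n


def handle_request(keys, limit=1000):
    """Stand-in for application code that does not catch the exception."""
    table = LimitedTable(limit=limit)
    for key in keys:
        table.insert(key, None)
    return "ok"
```

In this sketch a request with 1000 colliding keys is still processed in full
(at quadratic cost), and one more key makes handle_request fail with an
unhandled exception.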
+1 on random seeding. Enable it by default in 3.3+ and leave it off by default
in older versions, which allows people to avoid breaking their code until
they're ready for it to be broken.
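
For illustration only, a toy sketch of the idea of seeding a string hash once
per process. This is an FNV-1a-style function with the seed mixed into the
initial state. It is not CPython's algorithm and not a vetted defence, and
seeded_hash and new_seed are hypothetical names. A seed of 0 stands in for
"seeding off" here.

```python
import os

MASK64 = 0xFFFFFFFFFFFFFFFF
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3


def seeded_hash(text, seed):
    """Toy 64-bit FNV-1a hash whose starting state is mixed with `seed`."""
    h = (FNV_OFFSET ^ seed) & MASK64
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def new_seed(randomize=True):
    """Pick a per-process seed; return 0 when randomization is off."""
    if not randomize:
        return 0
    return int.from_bytes(os.urandom(8), "little")


if __name__ == "__main__":
    seed = new_seed()
    # The value printed changes from run to run when seeding is on.
    print(seeded_hash("spam ham", seed))
```

With the seed fixed at 0 the function behaves like an unseeded hash, which is
the "default to off" case. With a fresh seed per process, hash values are no
longer the same from one run to the next.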
--
Steven