[Python-Dev] Hash collision security issue (now public)

Thu Dec 29 12:29:53 CET 2011

Hi,

Just some extra thoughts on the whole topic in light of web
applications running on Python (since this was hinted at in the talk):

   Yes, you can limit the maximum number of parameters allowed in POST
   data, but there are so many places where data is parsed into hashing
   containers that it is a rather futile task.  Here is a very brief
   list of things usually parsed into a dict or set, and where it
   happens:

   - URL parameters and URL-encoded form data
     Generally this happens somewhere in a framework but typically also
     in utility libraries that deal with URLs.  For instance the
     stdlib's cgi.parse_qs or urllib.parse.parse_qs on Python 3 do
     just that and that code is used left and right.

     Even if a framework started limiting its own URL parsing, there
     is still a lot of code that does not, and that includes the
     stdlib.

     With form data it's worse because you have multipart headers that
     need parsing, and that is usually abstracted so far away from the
     user that they never limit it themselves.  Many frameworks just use
     the cgi module's parsing functions, which also feed directly into
     a dictionary.

   - HTTP headers.
     There is nothing a WSGI framework can do about that since the
     headers are parsed into a dictionary by the WSGI server.

   - Incoming JSON data.
     Again outside of what the framework can do for the most part.
     simplejson can be made to stop parsing via its hooks, but nobody
     does that, and since users invoke simplejson's parsing routines
     themselves most webapps would still be vulnerable even if all
     frameworks fixed the problem.

   - Hidden dict parameters.
     Things like the parameter part of content-type or the
     content-disposition headers are generally also just parsed into a
     dictionary.  Likewise many frameworks parse some headers into sets
     (for instance incoming ETags).  The cookie header is usually parsed
     into a dictionary as well.
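
To make the first item concrete, here is a minimal sketch of capping
the number of fields before a query string is hashed into a dict.  The
helper parse_qs_limited and the exception TooManyFields are made-up
names for illustration, not stdlib API, and the limit of 100 is an
arbitrary assumption.  It guards only this one entry point; every
other parser in the list above would need its own guard.

```python
from urllib.parse import parse_qs


class TooManyFields(ValueError):
    """Raised when a query string carries more fields than allowed."""


def parse_qs_limited(query, max_fields=100):
    # Hypothetical helper, not part of the stdlib: count the fields
    # cheaply before anything is hashed into a dictionary.
    if query.count("&") + 1 > max_fields:
        raise TooManyFields("more than %d fields" % max_fields)
    # parse_qs returns a dict mapping each name to a list of values.
    return parse_qs(query)
```

Calling parse_qs_limited("a=1&b=2&a=3") behaves like plain parse_qs,
while a query string with more than max_fields fields is rejected
before any dictionary is built.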

The issue is nothing new, and at least my POV on this topic so far was
that your server should be guarded and kill request handlers that go
rogue.  Dictionaries are not the only thing with worst-case performance
that can be triggered by user input.
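
A rough sketch of the guard idea, assuming a Unix system and a handler
that runs in the main thread.  It uses an in-process signal timer,
which is a simplification: a real deployment would more likely have
the server kill the whole worker process.  The names watchdog and
HandlerTimeout are made up for illustration.

```python
import signal
from contextlib import contextmanager


class HandlerTimeout(Exception):
    """Raised inside a handler that exceeded its time budget."""


@contextmanager
def watchdog(seconds):
    # Assumption: Unix only, and called from the main thread, because
    # that is where Python delivers signals.
    def on_alarm(signum, frame):
        raise HandlerTimeout("handler ran longer than %.2fs" % seconds)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore the old handler in every case.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

A request handler would be wrapped as "with watchdog(5.0): ...".  Since
Python runs signal handlers between bytecodes, a long-running call
inside a C extension is only interrupted once it returns, which is one
reason a process-level watchdog is the more robust choice.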

That said, considering how many different places parse input of close
to arbitrary length into a dictionary or other hashing structure, it is
hard for a web application developer or framework to protect against
this.
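
For anyone who wants to see the worst case itself, the following
sketch counts key comparisons while filling a dict.  It does not
reproduce the actual attack (which crafts colliding strings); instead
it assumes a made-up Key class whose hash value is chosen by the
caller, so that collisions can be forced on purpose.

```python
class Key:
    """Hypothetical key type whose hash is controlled by the caller."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_comparisons(n, collide):
    # Insert n distinct keys; with collide=True they all share hash 0,
    # so each insert has to be compared against the keys before it.
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, 0 if collide else i)] = None
    return Key.eq_calls
```

Comparing count_comparisons(n, True) with count_comparisons(n, False)
for growing n shows the colliding case growing roughly quadratically,
while the well-distributed case stays close to flat.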

In case the watchdog is not as viable a solution as I had assumed, I
think it's more reasonable to indeed consider adding a flag to Python
that optionally enables randomization of hashes at startup.

However, as was said earlier, the attack is so much more complex to
carry out in a 64-bit environment that it's probably (as it stands
right now!) safe to ignore.

The main problem, however, is not that it's a new attack but that some
dickheads could now launch prebaked attacks to disrupt websites, which
might cause some negative publicity.  In general though there are so
many more ways to DDoS a website than this that I would rate the whole
issue very low.

Regards,
Armin