[Python-Dev] Hash collision security issue (now public)

Armin Ronacher armin.ronacher at active-4.com
Thu Dec 29 12:29:53 CET 2011

Previous message: [Python-Dev] Hash collision security issue (now public)
Next message: [Python-Dev] Hash collision security issue (now public)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

Just some extra thoughts on the whole topic from the perspective of web applications running on Python (since this was hinted at in the talk):

Yes, you can limit the maximum number of parameters allowed in POST data, but there are so many places where data is parsed into hashing containers that it's quite a worthless task. Here is a very brief list of things that are usually parsed into a dict or set, and where that happens:

- URL parameters and URL-encoded form data. Generally this happens somewhere in a framework, but typically also in utility libraries that deal with URLs. For instance, the stdlib's cgi.parse_qs (or urllib.parse.parse_qs on Python 3) does just that, and that code is used left and right.

  Even if a framework started limiting its own URL parsing, there is still a lot of other code that parses URLs without any such limit, the stdlib included.

  With form data it's worse, because multipart headers need parsing too, and that is usually abstracted away so far from the user that they never get the chance to limit it. Many frameworks just use the cgi module's parsing functions, which also feed directly into a dictionary.

- HTTP headers. There is nothing a WSGI framework can do about those, since the headers are parsed into a dictionary by the WSGI server.
- Incoming JSON data. Again, this is mostly outside of what the framework can do. simplejson can be made to stop parsing by using its hooks, but nobody does that, and since users invoke simplejson's parsing routines themselves, most web apps would still be vulnerable even if every framework fixed the problem.
- Hidden dict parameters. Things like the parameter part of the Content-Type or Content-Disposition headers are generally also just parsed into a dictionary. Likewise, many frameworks parse some headers into sets (for instance incoming ETags). The Cookie header is usually parsed into a dictionary as well.
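To illustrate why a limit in one parser does not go far, here is a minimal sketch using a modern stdlib: urllib.parse.parse_qs gained a max_num_fields argument in later Python versions (3.7.2+), but json.loads has no comparable knob, so JSON the application parses itself still lands in an unbounded dict. MAX_FIELDS and parse_form are hypothetical names chosen for the illustration.

```python
import json
from urllib.parse import parse_qs

# Hypothetical per-request limit a framework might pick; not a standard value.
MAX_FIELDS = 100


def parse_form(body: str) -> dict:
    """Parse URL-encoded data, refusing more than MAX_FIELDS fields.

    parse_qs raises ValueError when max_num_fields is exceeded.
    """
    return parse_qs(body, max_num_fields=MAX_FIELDS)


# A query string with far more fields than the limit allows.
many_fields = "&".join(f"k{i}=v" for i in range(1000))

try:
    parse_form(many_fields)
    form_rejected = False
except ValueError:
    form_rejected = True

# The same number of keys sent as JSON goes straight into a dict:
# json.loads has no field limit, so the application's own call is unguarded.
payload = json.dumps({f"k{i}": "v" for i in range(1000)})
data = json.loads(payload)

print(form_rejected)  # True
print(len(data))      # 1000
```

The guarded parser only protects the one code path that calls it; every other place that builds a dict from request data needs its own limit.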

The issue is nothing new, and at least my POV on this topic so far was that your server should be guarded by a watchdog that kills request handlers going rogue. Dictionaries are not the only thing whose worst-case performance can be triggered by user input.
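A minimal in-process sketch of such a guard, assuming a Unix system and a single-threaded worker: the hypothetical watchdog context manager arms a SIGALRM timer and raises HandlerTimeout inside a handler that runs too long. Real servers more commonly do this at the process level (killing and replacing the worker), which also covers code stuck inside a single C call.

```python
import signal
import time
from contextlib import contextmanager


class HandlerTimeout(Exception):
    """Raised inside a request handler that exceeded its time budget."""


@contextmanager
def watchdog(seconds: float):
    """Abort the enclosed block if it runs longer than `seconds`.

    Unix-only and main-thread-only: it relies on SIGALRM, and Python only
    runs the signal handler between bytecodes, so one long C-level call
    is not interrupted until it returns.
    """
    def on_alarm(signum, frame):
        raise HandlerTimeout(f"handler ran longer than {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def rogue_handler():
    """Hypothetical handler that never finishes on its own."""
    while True:
        pass


start = time.monotonic()
try:
    with watchdog(0.2):
        rogue_handler()
    outcome = "finished"
except HandlerTimeout:
    outcome = "killed"
elapsed = time.monotonic() - start
print(outcome)  # killed
```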

That said, considering how many different places parse nearly arbitrarily long input into a dictionary or other hashing structure, it's hard for a web application developer or framework to protect against this.
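The underlying cost can be reproduced without any HTTP code. In this sketch a hypothetical Key class lets us force every key to the same hash, standing in for attacker-chosen strings that collide under the real string hash. Counting __eq__ calls shows that building the dict goes from no comparisons at all to roughly n²/2, because each new colliding key has to be compared against every key already in its probe chain.

```python
class Key:
    """Hypothetical dict key whose hash can be forced, to simulate
    attacker-chosen inputs that all collide."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def comparisons_to_build(keys):
    """Insert keys into a fresh dict and return how many __eq__ calls it took."""
    Key.eq_calls = 0
    d = {}
    for k in keys:
        d[k] = None
    return Key.eq_calls


n = 1000
spread = [Key(i, i) for i in range(n)]      # distinct hashes
colliding = [Key(i, 42) for i in range(n)]  # every key hashes to 42

spread_calls = comparisons_to_build(spread)
colliding_calls = comparisons_to_build(colliding)
print(spread_calls)     # 0: a hash mismatch skips __eq__ entirely
print(colliding_calls)  # at least n*(n-1)/2: quadratic in n
```

Doubling n roughly quadruples the colliding count, which is why nearly arbitrarily long input is the dangerous part.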

If the watchdog is not the viable solution I had assumed it was, I think it's more reasonable to indeed consider adding a flag to Python, set before startup, that optionally enables hash randomization.
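For reference, later CPython releases did add this kind of switch (the -R option and the PYTHONHASHSEED environment variable), and since Python 3.3 string hash randomization is on by default. The sketch below assumes Python 3.7+ (for capture_output) and uses a hypothetical helper, string_hash_with_seed, that runs a fresh interpreter to show that a string's hash depends on the seed.

```python
import os
import subprocess
import sys


def string_hash_with_seed(text: str, seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


same_a = string_hash_with_seed("attack", "1")
same_b = string_hash_with_seed("attack", "1")
other = string_hash_with_seed("attack", "2")
print(same_a == same_b)  # True: a fixed seed gives reproducible hashes
print(same_a == other)   # a different seed gives a different hash
```

With a per-process random seed, an attacker can no longer precompute one set of colliding keys that works against every server.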

However, as was said earlier, the attack is so much more complex to carry out in a 64-bit environment that it's probably (as it stands right now!) safe to ignore.
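To see both why such attacks can be prebaked and why the word size matters, here is a sketch of the multiply-and-XOR string hash CPython 2 used before randomization, reconstructed in Python as an illustration rather than a reference implementation; legacy_string_hash and its bits parameter are hypothetical names. It takes no secret input, so colliding keys can be computed offline once, but its result depends on the width of a C long, so a collision set found for 32-bit builds does not automatically work on 64-bit ones.

```python
def legacy_string_hash(s: bytes, bits: int = 64) -> int:
    """Sketch of the pre-randomization CPython 2 string hash.

    Emulates C `long` arithmetic of the given width by masking after
    each step and reinterpreting the result as a signed value.
    """
    mask = (1 << bits) - 1
    if not s:
        return 0  # the C loop never runs and the length is 0
    x = (s[0] << 7) & mask
    for byte in s:
        x = ((1000003 * x) ^ byte) & mask
    x ^= len(s)
    # Reinterpret as a signed C long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    if x == -1:
        x = -2  # -1 is reserved as an error marker in CPython
    return x


print(legacy_string_hash(b"a", 64))  # 12416037344
print(legacy_string_hash(b"a", 32))  # -468864544
```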

The main problem there, however, is not that it's a new attack but that some dickheads could now build prebaked attacks to disrupt websites, which might cause some negative publicity. In general, though, there are so many more ways to DDoS a website than this that I would rate the whole issue very low.

Regards, Armin

