[Python-Dev] Counting collisions for the win
Ivan Kozik ivan at ludios.org
Fri Jan 20 04:32:13 CET 2012
On Fri, Jan 20, 2012 at 00:48, Victor Stinner <victor.stinner at haypocalc.com> wrote:
I propose to solve the hash collision vulnerability by counting collisions, because it does fix the vulnerability with minor or no impact on applications or backward compatibility. I don't see why we should use a different fix for Python 3.3. If counting collisions solves the issue for stable versions, it is also enough for Python 3.3. We now know all issues of the randomized hash solution, and I think that there are more drawbacks than advantages. IMO the randomized hash is overkill to fix the hash collision issue.
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall for a dict full of 1000 colliding keys, and 100 such objects inside a parent object (perhaps JSON), you can stall a server for 2.2+ seconds. Going with the raise-at-1000 approach doesn't solve the problem for everyone.
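Here's a minimal sketch of what I mean, not the original attack payload: a constant __hash__ forces every key into the same bucket, so building each dict costs O(n^2) key comparisons, and a per-dict collision cap does nothing to bound the cost of sending many such dicts.

    # Constant __hash__ forces worst-case probing in dict inserts.
    import time

    class Collider:
        __slots__ = ("n",)

        def __init__(self, n):
            self.n = n

        def __hash__(self):
            return 42  # every key lands in the same bucket

        def __eq__(self, other):
            return isinstance(other, Collider) and self.n == other.n

    def build_colliding_dict(n_keys):
        d = {}
        for i in range(n_keys):
            d[Collider(i)] = None  # probes (on average) half the earlier keys
        return d

    # The 22 ms/dict figure above measured real colliding strings compared
    # in C; Python-level __eq__ is far slower, so these constants are kept
    # small for a quick demonstration.
    N_DICTS, N_KEYS = 10, 1000
    start = time.perf_counter()
    payload = [build_colliding_dict(N_KEYS) for _ in range(N_DICTS)]
    print("%d dicts of %d colliding keys: %.2fs"
          % (N_DICTS, N_KEYS, time.perf_counter() - start))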
In addition, because the raise-at-N-collisions approach raises an exception, everyone who wants to handle this error condition properly has to change their code to catch a previously-unexpected exception. (I know they're usually still better off with the fix, but why force many people to change code when you can actually fix the hashing problem?)
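To illustrate the code-change burden: the exception name below is made up (the thread hasn't settled on one), and stock CPython never raises it, so the except clause only shows the new handling every caller filling dicts from untrusted input would have to add.

    # TooManyHashCollisions is a hypothetical stand-in for whatever a
    # raise-at-N dict would raise; it does not exist in CPython.
    class TooManyHashCollisions(RuntimeError):
        """Stand-in for the proposed too-many-collisions exception."""

    def parse_form(pairs):
        form = {}
        for key, value in pairs:
            try:
                form[key] = value
            except TooManyHashCollisions:
                # New, previously-unexpected error path: reject the input
                # instead of letting the exception escape the handler.
                raise ValueError("rejected input: too many colliding keys")
        return form

    print(parse_form([("a", 1), ("b", 2)]))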
Another issue is that even with a configurable limit, different modules can't have their own limits. One module might want a relatively safe raise-at-100, and another module creating massive dicts might want raise-at-1000. How does a developer know whether they can raise or lower the limit, given that they use a bunch of different modules?
I actually went with this stop-at-N-collisions approach by patching my CPython a few years ago, limiting dictobject's and setobject's critical for loop to 100 iterations (I realize this might handle fewer than 100 collisions). This worked fine until I tried to compile PyPy, where the translator blew up due to a massive dict. This, combined with the second problem (needing to catch an exception), led me to abandon this approach and write Securetypes, which has a securedict that uses SHA-1. Not that I like this either; I think I'm happy with the randomize-hash() approach.
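For the curious, here's a minimal sketch of the securedict idea, not Securetypes' actual API: string keys are wrapped so their hash comes from SHA-1 rather than the predictable built-in string hash. The per-process random salt is an assumption on my part; without some secret, an attacker could still precompute SHA-1-based collisions offline.

    import hashlib
    import os

    _SALT = os.urandom(16)  # per-process secret (assumption, see above)

    class _Sha1Key:
        __slots__ = ("key",)

        def __init__(self, key):
            self.key = key

        def __hash__(self):
            digest = hashlib.sha1(_SALT + self.key.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big")

        def __eq__(self, other):
            return isinstance(other, _Sha1Key) and self.key == other.key

    class SecureDict(dict):
        """dict with str keys hashed via salted SHA-1; this sketch covers
        only the two methods needed to show the technique."""

        def __setitem__(self, key, value):
            super().__setitem__(_Sha1Key(key), value)

        def __getitem__(self, key):
            return super().__getitem__(_Sha1Key(key))

    d = SecureDict()
    d["user"] = "ivan"
    print(d["user"])  # -> ivan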
Ivan