[Python-Dev] Counting collisions for the win
Ivan Kozik ivan at ludios.org
Fri Jan 20 04:32:13 CET 2012
On Fri, Jan 20, 2012 at 00:48, Victor Stinner <victor.stinner at haypocalc.com> wrote:
I propose to solve the hash collision vulnerability by counting collisions, because it does fix the vulnerability with minor or no impact on applications or backward compatibility. I don't see why we should use a different fix for Python 3.3. If counting collisions solves the issue for stable versions, it is also enough for Python 3.3. We now know all issues of the randomized hash solution, and I think that there are more drawbacks than advantages. IMO the randomized hash is overkill to fix the hash collision issue.
I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall for a dict full of 1000 colliding keys, and 100 such objects inside a parent object (perhaps JSON), you can stall a server for 2.2+ seconds. Going with the raise-at-1000 approach doesn't solve the problem for everyone.
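Here's a minimal sketch of what I mean, not the original attack payload: a constant __hash__ forces every key into the same bucket, so building each dict costs O(n^2) key comparisons, and a per-dict collision cap does nothing to bound the cost of sending many such dicts.

    # Constant __hash__ forces worst-case probing in dict inserts.
    import time

    class Collider:
        __slots__ = ("n",)

        def __init__(self, n):
            self.n = n

        def __hash__(self):
            return 42  # every key lands in the same bucket

        def __eq__(self, other):
            return isinstance(other, Collider) and self.n == other.n

    def build_colliding_dict(n_keys):
        d = {}
        for i in range(n_keys):
            d[Collider(i)] = None  # probes (on average) half the earlier keys
        return d

    # The 22 ms/dict figure above measured real colliding strings compared
    # in C; Python-level __eq__ is far slower, so these constants are kept
    # small for a quick demonstration.
    N_DICTS, N_KEYS = 10, 1000
    start = time.perf_counter()
    payload = [build_colliding_dict(N_KEYS) for _ in range(N_DICTS)]
    print("%d dicts of %d colliding keys: %.2fs"
          % (N_DICTS, N_KEYS, time.perf_counter() - start))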
In addition, because the raise-at-N-collisions approach raises an exception, everyone who wants to handle this error condition properly has to change their code to catch a previously-unexpected exception. (I know they're usually still better off with the fix, but why force many people to change code when you can actually fix the hashing problem?)
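To illustrate the code-change burden: the exception name below is made up (the thread hasn't settled on one), and stock CPython never raises it, so the except clause only shows the new handling every caller filling dicts from untrusted input would have to add.

    # TooManyHashCollisions is a hypothetical stand-in for whatever a
    # raise-at-N dict would raise; it does not exist in CPython.
    class TooManyHashCollisions(RuntimeError):
        """Stand-in for the proposed too-many-collisions exception."""

    def parse_form(pairs):
        form = {}
        for key, value in pairs:
            try:
                form[key] = value
            except TooManyHashCollisions:
                # New, previously-unexpected error path: reject the input
                # instead of letting the exception escape the handler.
                raise ValueError("rejected input: too many colliding keys")
        return form

    print(parse_form([("a", 1), ("b", 2)]))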
Another issue is that even with a configurable limit, different modules can't have their own limits. One module might want a relatively safe raise-at-100, and another module creating massive dicts might want raise-at-1000. How does a developer know whether they can raise or lower the limit, given that they use a bunch of different modules?
I actually went with this stop-at-N-collisions approach by patching my CPython a few years ago, limiting dictobject's and setobject's critical for loop to 100 iterations (I realize this might handle fewer than 100 collisions). This worked fine until I tried to compile PyPy, where the translator blew up due to a massive dict. This, combined with the second problem (needing to catch an exception), led me to abandon this approach and write Securetypes, which has a securedict that uses SHA-1. Not that I like this either; I think I'm happy with the randomize-hash() approach.
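For the curious, here's a minimal sketch of the securedict idea, not Securetypes' actual API: string keys are wrapped so their hash comes from SHA-1 rather than the predictable built-in string hash. The per-process random salt is an assumption on my part; without some secret, an attacker could still precompute SHA-1-based collisions offline.

    import hashlib
    import os

    _SALT = os.urandom(16)  # per-process secret (assumption, see above)

    class _Sha1Key:
        __slots__ = ("key",)

        def __init__(self, key):
            self.key = key

        def __hash__(self):
            digest = hashlib.sha1(_SALT + self.key.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big")

        def __eq__(self, other):
            return isinstance(other, _Sha1Key) and self.key == other.key

    class SecureDict(dict):
        """dict with str keys hashed via salted SHA-1; this sketch covers
        only the two methods needed to show the technique."""

        def __setitem__(self, key, value):
            super().__setitem__(_Sha1Key(key), value)

        def __getitem__(self, key):
            return super().__getitem__(_Sha1Key(key))

    d = SecureDict()
    d["user"] = "ivan"
    print(d["user"])  # -> ivan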
Ivan