[Python-Dev] Counting collisions w/ no need for a fatal exception (original) (raw)
Gregory P. Smith greg at krypto.org
Wed Jan 25 06:24:31 CET 2012
- Previous message: [Python-Dev] Status of Mac buildbots
- Next message: [Python-Dev] io module types
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sun, Jan 22, 2012 at 10:41 PM, Tim Delaney <timothy.c.delaney at gmail.com> wrote:
On 23 January 2012 16:49, Lennart Regebro <regebro at gmail.com> wrote:
On Mon, Jan 23, 2012 at 06:02, Paul McMillan <paul at mcmillan.ws> wrote: >> We may use a different salt per dictionary. > > If we're willing to re-hash everything on a per-dictionary basis. That > doesn't seem reasonable given our existing usage. Well, if we get crazy amounts of collisions, re-hashing with a new salt to get rid of those collisions seems quite reasonable to me... Actually, this looks like it has the seed of a solution in it. I haven't scrutinised the following beyond "it sounds like it could work" - it could well contain nasty flaws. Assumption: We only get an excessive number of collisions during an attack (directly or indirectly). Assumption: Introducing a salt into hashes will change those hashes sufficiently to mitigate the attack (all discussion of randomising hashes makes this assumption). 1. Keep the current hashing (for all dictionaries) i.e. just using hash(key). 2. Count collisions. 3. If any key hits X collisions change that dictionary to use a random salt for hashes (at least for str and unicode keys). This salt would be remembered for the dictionary. Consequence: The dictionary would need to be rebuilt when an attack was detected. Consequence: Hash caching would no longer occur for this dictionary, making most operations more expensive. Consequence: Anything relying on the iteration order of a dictionary which has suffered excessive conflicts would fail.
+1
I like this! The dictionary would still be O(n) but the constant cost in front of that just went up. When you are dealing with keys coming in from outside of the process, those are unlikely to already have any hash values so the constant cost at insertion time has really not changed at all because they would need hashing anyways. Their cost at non-iteration lookup time will be a constant factor greater but I do not see that as being a problem given that known keys being looked up in a
This approach also allows for the dictionary hashing mode switch to occur after a lower number of collisions than the previous investigations into raising a MemoryError or similar were asking for (because they wanted to avoid false hard failures). It prevents that case from breaking in favor of a brief performance hiccup.
I would combine this with a per process/interpreter-instance seed in 3.3 and later for added impact (less need for this code path to ever be triggered). For the purposes of backporting as a security fix, that part would be disabled by default but #1-3 would be enabled by default.
Question A: Does the dictionary get rebuilt -again- with a new dict-salt if a large number of collisions occurs after a dict-salt has already been established?
Question B: Is there a size of dictionary in which we refuse to rebuild & rehash it because it would simply be too costly? obviously if we lack the ram to malloc a new table, when else? ever?
Suggestion: Would there be any benefit to making the number of collisions threshold on when to rebuild & rehash a log function of the dictionary's current size rather than a constant for all dicts?
4. (Optional) in 3.3, provide a way to get a dictionary with random salt (i.e. not wait for attack).
I don't like #4 as a documented public API as I'm not sure how well that'd play with other VMs (I suppose they could ignore it) but it would be useful for dict implementation testing purposes and easier studying of the behavior. If this is added it should be a method on the dict such as ._set_hash_salt() or something and for testing purposes it would be good to allow a dictionary to be queried to see if they are using their own salt or not (perhaps just ._get_hash_salt() returning non 0 means they are?)
-gps
- Previous message: [Python-Dev] Status of Mac buildbots
- Next message: [Python-Dev] io module types
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]