Message 152051 - Python tracker
I'm sorry then, but I'm a little confused. I think we pretty clearly established earlier that requiring users to make changes anywhere they store user data would be dangerous, because these locations are often in libraries or other places where the code creating and modifying the dictionary has no idea that user data is in it.
I don't consider that established for the specific case of string-like objects. Users can easily determine whether they use string-like objects, and if so, in what places, and what data gets put into them.
The proposed AVL solution fails if it requires users to fundamentally restructure their data depending on its origin.
It doesn't fail at all. Users don't have to restructure their code, let alone fundamentally. Their code may currently be vulnerable, yet not use string-like objects at all. With the proposed solution, such code will be fixed for good.
It's true that the solution does not fix all cases of the vulnerability, but neither does any other proposed solution.
We have a solution that is known to work in all cases: hash randomization.
Well, you believe that it fixes the problem, even though it actually may not, assuming an attacker can somehow reproduce the hash function.
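As a concrete illustration of why a reproducible hash function matters: CPython's integer hash, for instance, is a simple and publicly documented function (reduction modulo 2**61 - 1 on 64-bit builds), so anyone can construct arbitrarily many distinct keys that collide. A minimal sketch, assuming a 64-bit CPython build:

```python
# On 64-bit CPython builds, integers are hashed by reducing modulo
# 2**61 - 1, so keys that differ by a multiple of that modulus collide.
M = 2**61 - 1

colliding_keys = [1 + k * M for k in range(5)]

# All five keys produce the same hash value...
assert len({hash(k) for k in colliding_keys}) == 1

# ...yet they are distinct dictionary keys, so every insertion must walk
# the same collision chain -- the mechanism behind the DoS attack.
d = {k: None for k in colliding_keys}
assert len(d) == 5
```

The same construction applies to any hash function an attacker can reproduce; randomization works precisely by making the function unpredictable.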
There were three discussed issues with it:
a) Code assuming a stable ordering to dictionaries.
b) Code assuming hashes were stable across runs.
c) Code reimplementing the hashing algorithm of a core datatype that is now randomized.
I don't think any of these are realistic issues.
I'm fairly certain that code will break in massive ways, despite any argumentation that it should not. The question really is:
Do we break code in a massive way, or do we fix the vulnerability for most users with no code breakage?
I clearly value compatibility much higher than 100% protection against a DoS-style attack (against which many other forms of protection are also available).
(a) was never a documented or intended property; indeed, it breaks all the time if you insert keys in a different order, use a different platform, or change anything else.
Still, a lot of code relies on dictionary order, and successfully so, in practice. Practicality beats purity.
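The point that equal dictionaries need not iterate in the same order is easy to demonstrate. In modern CPython (3.7+) the observed order happens to be insertion order rather than hash order, but the lesson is the same: code that depends on one particular iteration order is relying on an accident. A small sketch under that assumption:

```python
# Two dicts that are equal as mappings...
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"c": 3, "b": 2, "a": 1}
assert d1 == d2

# ...but whose iteration order differs, because it follows insertion
# order (and, in older interpreters, followed the hash function).
assert list(d1) == ["a", "b", "c"]
assert list(d2) == ["c", "b", "a"]
```

Under hash randomization, code of the second kind would have started failing intermittently between runs rather than deterministically.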
(b) For the same reasons, code relying on (b) only worked if you didn't change anything.
That's not true. You cannot practically change the way string hashing works other than by changing the interpreter source. Hashes are currently stable across runs.
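This per-run stability is easy to probe directly. In the Python releases that eventually shipped randomization, the seed is controlled by the PYTHONHASHSEED environment variable; pinning it reproduces hashes across runs, while varying it changes them. A hedged sketch (the helper name is mine, not from the tracker discussion):

```python
import os
import subprocess
import sys

def string_hash_in_fresh_interpreter(seed):
    """Return hash('example') as computed by a newly started interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('example'))"], env=env
    )
    return int(out)

# The same seed reproduces the same hash in a fresh process...
assert string_hash_in_fresh_interpreter(1) == string_hash_in_fresh_interpreter(1)

# ...while different seeds (almost certainly) give different hashes,
# which is exactly what breaks code that assumed stability across runs.
assert string_hash_in_fresh_interpreter(1) != string_hash_in_fresh_interpreter(2)
```

Without an explicit seed, each randomized interpreter run picks its own, so two unrelated runs generally disagree on hash('example').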
and in practice I'm convinced neither of these was ever common (if they existed at all).
Are you willing to bet the trust people have in Python's bug fix policies on that? I'm not.
In summary, I think the case against hash randomization has been seriously overstated, and it is in no way more dangerous than a solution that fails to solve the problem comprehensively. Further, I think it is imperative that we reach a consensus on this quickly.
Well, I cannot be part of a consensus that involves massive code breakage in a bug fix release. Lacking consensus, either the release managers or the BDFL will have to pronounce.