>> Still, it would represent an effort for the attacker of a much greater
>> magnitude than the current attack. It's all a trade-off -- at some point
>> it'll just be easier for the attacker to use some other vulnerability. Also
>> the class of vulnerable servers would be greatly reduced.

> I agree that doing anything is better than doing nothing. If we use
> the earlier suggestion and prepend everything with a fixed-length
> seed, we need quite a bit of entropy (and so a fairly long string) to
> make a lasting difference.

Ah, but the effect of that long string is summarized in a single
(32- or 64-bit) integer.
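
To spell out why (a rough sketch in pure Python using the 2.x
string_hash constants; a 64-bit build is assumed): the per-character
loop is x = f(x, c), so once the seed prefix has been consumed, its
entire contribution is the current value of x -- one machine word, no
matter how long the seed string was.

    import os

    MULT, MASK = 1000003, (1 << 64) - 1

    def mix(x, s):
        # the 2.x string_hash inner loop: x = f(x, c) for each c
        for c in s:
            x = ((MULT * x) ^ ord(c)) & MASK
        return x

    seed = os.urandom(100)                   # however long we like...
    X0 = mix(0, seed.decode('latin-1'))      # ...it boils down to this int

    # hashing seed + s is the same as resuming the loop from X0:
    assert mix(0, (seed + b'spam').decode('latin-1')) == mix(X0, 'spam')
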
>> I'm not sure I understand this. What's the worry about "a different class of
>> hash functions"? (It may be clear that I do not have a deep mathematical
>> understanding of hash functions.)

> This was mostly in reference to earlier suggestions of switching to
> cityhash, or using btrees, or other more invasive changes. Python 2.X
> is pretty stable and making large changes like that to the codebase
> can have unpredictable effects.

Agreed.
> We know that the current hash function
> works well (except for this specific problem), so it seems like the
> best fix will be as minimal a modification as possible, to avoid
> introducing bugs.

Yup.
>> I forget -- what do we do on systems without urandom()? (E.g. Windows?)

> Windows has CryptGenRandom which is approximately equivalent.

>> Let's worry about timing attacks another time okay?

> Agreed. As long as there isn't a gaping hole, I'm fine with that.
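
(For the record, os.urandom() has wrapped CryptGenRandom on Windows
since Python 2.4, so fetching the seed can look the same everywhere;
the 8-byte width here is just an example:)

    import os, struct

    # /dev/urandom on POSIX, CryptGenRandom on Windows
    HASH_SEED = struct.unpack('<Q', os.urandom(8))[0]
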
>> Hm. I'm not sure I like the idea of extra arithmetic for every character
>> being hashed.

> From a performance standpoint, this may still be better than adding 8
> or 10 characters to every single hash operation, since most hashes are
> over short strings.

But how about precomputing the intermediate value (x)? The hash is
(mostly) doing x = f(x, c) for each c in the input.
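
For instance (a rough sketch with the same toy constants as above; the
real loop lives in C): the secret is folded into the start value once,
at interpreter startup, so the per-character work stays exactly as it is
and no extra characters are hashed per call.

    import os, struct

    MULT, MASK = 1000003, (1 << 64) - 1

    # computed once at startup: the secret lives entirely in the start value
    X0 = struct.unpack('<Q', os.urandom(8))[0]

    def randomized_hash(s):
        if not s:
            return 0
        x = X0 ^ (ord(s[0]) << 7)     # fold the secret in before the loop
        for c in s:                   # per-character work is unchanged
            x = ((MULT * x) ^ ord(c)) & MASK
        return x ^ len(s)
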
> It is important that this function touches every
> character - if it only interacts with a subset of them, an attacker
> can fix that subset and vary the rest.

I sort of see your point, but I still think that if we could add as
little per-character overhead as possible it would be best -- sometimes
people *do* hash very long strings.
>> But I like the idea of a bigger random seed from which we
>> deterministically pick some part.

> Yeah. This makes it much harder to attack, since it very solidly
> places the attacker outside the realm of "just brute force the key".
>> How about just initializing x to some
>> subsequence of the seed determined by e.g. the length of the hashed string
>> plus a few characters from it?

> We did consider this, and if performance is absolutely the prime
> directive, this (or a variant) may be the best option. Unfortunately,
> the collision generator doesn't necessarily vary the length of the
> string. Additionally, if we don't vary based on all the letters in the
> string, an attacker can fix the characters that we do use and generate
> colliding strings around them.

Still, much more work for the attacker.
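
In sketch form, with the table size and the index formula invented here
purely for illustration, including the weakness described above:

    import os, struct

    MULT, MASK = 1000003, (1 << 64) - 1

    # the "bigger random seed": a table of random words, filled at startup
    SEED = [struct.unpack('<Q', os.urandom(8))[0] for _ in range(256)]

    def hash_subseq_keyed(s):
        if not s:
            return 0
        # start value chosen by the length plus a few characters; if the
        # attacker holds len(s), s[0] and s[-1] fixed, the start value is
        # fixed too, and the middle can be varied freely
        x = SEED[(len(s) ^ ord(s[0]) ^ ord(s[-1])) % len(SEED)]
        for c in s:
            x = ((MULT * x) ^ ord(c)) & MASK
        return x ^ len(s)
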
> Another option to consider would be to apply this change to some but
> not all of the rounds. If we added the seed lookup xor operation for
> only the first and last 5 values of x, we would still retain much of
> the benefit without adding much computational overhead for very long
> strings.

I like that.
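
Something like this, if I follow (again only a sketch, with the same
toy constants and seed table as above):

    import os, struct

    MULT, MASK = 1000003, (1 << 64) - 1
    SEED = [struct.unpack('<Q', os.urandom(8))[0] for _ in range(256)]

    def hash_first_last_5(s):
        if not s:
            return 0
        n = len(s)
        x = ord(s[0]) << 7
        for i, c in enumerate(s):
            x = ((MULT * x) ^ ord(c)) & MASK
            if i < 5 or i >= n - 5:
                # seed lookup xor only on the first and last 5 rounds
                x ^= SEED[x % len(SEED)]
        return x ^ n
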
> We could also consider a less computationally expensive operation than
> the modulo for calculating the lookup index, like simply truncating to
> the correct number of bits.

Sure. Thanks for thinking about all the details here!!
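
(E.g., with a power-of-two table the truncation is a single AND, and it
agrees with the modulo for any non-negative x:)

    SEED_SIZE = 256                    # power of two

    for x in range(100000):
        assert x % SEED_SIZE == x & (SEED_SIZE - 1)
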
--
--Guido van Rossum (python.org/~guido)