[Python-Dev] Hash collision security issue (now public) (original) (raw)

Terry Reedy tjreedy at udel.edu
Thu Dec 29 23:19:58 CET 2011


The talk was a presentation yesterday by Alexander Klink and Julian Wälde at the Chaos Communication Congress in Germany hashDoS at alech.de I read the non-technical summary at http://arstechnica.com/business/news/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack.ars

and watched the video of the talk at https://www.youtube.com/watch?feature=player_embedded&v=_EEhviEO1Vo#

My summary: hash table creation with N keys changes from amortized O(N) to O(N2) time if the hash values of all the keys are the same. This should only happen for large N if done intentionally. This is easy to accomplish with a linear multiply and add hash function, such as used in PHP4 (but nowhere else that the authors found). A nonlinear multiply and xor hash function, used in one form or another by everything else, is much harder to break. It is theoretically vulnerable to brute-force search and this has been known for years. With a more cleaver meet-in-the-middle strategy, that builds a dict of suffixes and then searches for matching prefixes, 32-bit hashes are practically vulnerable. The attack depends on, for instance, 216 (64K) being 1/64K of 2**32. (I did not hear when this strategy was developed, but it is certainly more practical on a desktop now than even 8 years ago.)

[64-bit hashes are much, much less vulnerable to attack, at least for now. So it seems to me that anyone who hashes potential attack data can avoid the problem by using 64-bit Python with 64-bit hash values. If I understood Antoine, that should be all 64-bit builds.]

More summary: Perl added an #define option to start the hash calculation with non-zero value instead of 0 years ago to "avoid algorithmic complexity attacks". The patch is at 47:20 in the video. The authors believe all should do similarly.

[The change amounts to adding a char, unknown to attackers, to the beginning of every string before hashing. So it adds a small bit of time. The code patch shown did not show the source of the non-zero seed or the timing and scope of any randomization. As the discussion here has shown, this is an important issue to applications. So 'do the same' is inadequate and over-simplified advice. I believe Armin's patch is similar to the Perl patch.]

Since the authors sent out CERT alert about Nov 1, PHP has added to PHP5 a new function to limit the number of vars hashed. Microsoft will do something similar now with hash randomization to follow (maybe?). JRuby is going to do something. Java does not think it needs to change Java itself, but will leave all to the frameworks.

[The discussion here suggests that this is an inadequate response for 32-bit systems like Java since one person/group may not control all the pieces of a server system. However, a person or group can run all pieces on a system Python with an option turned on.]

-- Terry Jan Reedy



More information about the Python-Dev mailing list