[Python-Dev] Algoritmic Complexity Attack on Python (original) (raw)

Scott A Crosby scrosby@cs.rice.edu
30 May 2003 11🔞14 -0500

Previous message: [Python-Dev] Algoritmic Complexity Attack on Python
Next message: [Python-Dev] Algoritmic Complexity Attack on Python
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 30 May 2003 07:39:18 -0400, Guido van Rossum <guido@python.org> writes:

[Tim Peters] > > I'm uninterested in trying to "do something" about these. If > > resource-hogging is a serious potential problem in some context, then > > resource limitation is an operating system's job, and any use of Python (or > > Perl, etc) in such a context should be under the watchful eyes of OS > > subsystems that track actual resource usage.

[Scott Crosby] > I disagree. Changing the hash function eliminates these attacks on > hash tables. At what cost for Python? 99.99% of all Python programs are not vulnerable to this kind of attack, because they don't take huge amounts of arbitrary input from an untrusted source. If the hash function you propose is even a teensy bit slower than the one we've got now (and from your description I'm sure it has to be), everybody

We included several benchmarks in our paper:

On here, http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/CacheEffects2.png

when we compare hash functions as the working set changes, we notice that a single L2 cache miss far exceeds hashing time for all algorithms.

http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/LengthEffects.png

UHASH exceeds the performance of perl's hash function, which is simpler than your own.

Even for small strings, UHASH is only about half the speed of perl's hash function, and your function already performs a multiplication per byte:

#define HASH(hi,ho,c) ho = (1000003*hi) ^ c #define HASH0(ho,c) ho = ((c << 7)*1000003) ^ c

The difference between this and CW12 is one 32 bit modulo operation. (Please note that CW12 is currently broken. Fortunately it didn't affect the benchmarking on x86.)

would be paying for the solution to a problem they don't have. You keep insisting that you don't know Python. Hashing is used an awful lot in Python -- as an interpreted language, most variable lookups and all method and instance variable lookups use hashing. So this would affect every Python program.

Have you done benchmarking to prove that string_hash is in fact an important hotspot in the python interpreter? If so, and doing one modulo operation per string is unacceptable, then you may wish to consider Jenkin's hash. The linux kernel people are switching to using a keyed veriant of Jenkin's hash. However, Jenkin's, AFAIK, has no proofs that it is in fact universal. It, however, probably is safe.

It is not unlikely that if you went that route you'd be somewhat safer, and faster, but if you want full safety, you'd need to go with a universal hash.

Scott

Previous message: [Python-Dev] Algoritmic Complexity Attack on Python
Next message: [Python-Dev] Algoritmic Complexity Attack on Python
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]