[Python-Dev] Counting collisions for the win (original) (raw)

Frank Sievertsen pydev at sievertsen.de
Fri Jan 20 16:55:32 CET 2012


Hello,

I still see at least two ways to create a DOS attack even with the collison-counting-patch.

I assumed that it's possible to send ~500KB of payload to the application.

  1. It's fully deterministic which slots the dict will lookup. Since we don't count slot-collisions, but only hash-value-collisions this can be exploited easily by creating strings with the hash-values along the lookup-way of an arbitrary (short) string.

So first we pick an arbitrary string. Then calculate which slots it will visit on the way to the first empty slot. Then we create strings with hash-values for these slots.

This attack first injects the strings to fill all the slots that the one short string will want to visit. Then it adds THE SAME string again and again. Since the entry is already there, nothing will be added and no additional collisions happen, no exception raised.

$ ls -l super.txt -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt $ tail -n3 super.txt FX5 FX5 FX5 $ wc -l super.txt 90000 super.txt $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))' real 0m52.724s user 0m51.543s sys 0m0.028s

  1. The second attack actually attacks that 1000 allowed string comparisons are still a lot of work. First I added 999 strings that collide with a one-byte string "a". In some applications a zero-byte string might work even better. Then I can add a many thousand of the "a"'s, just like the first attack.

$ ls -l 1000.txt -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt $ head -n 3 1000.txt 7hLci00 4wVFm10 _rZJU50 $ wc -l 1000.txt 247000 1000.txt $ tail -n 3 1000.txt a a a $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))' real 0m17.408s user 0m15.897s sys 0m0.008s

Of course the first attack is far more efficient. One could argue that 16 seconds is not enough for an attack. But maybe it's possible to send 1MB, have zero-bytes strings, and since for example django does 5 lookups per query-string this will keep it busy for ~80 seconds on my pc.

What to do now? I think it's not smart to reduce the number of allowed collisions dramatically AND count all slot-collisions at the same time.

Frank



More information about the Python-Dev mailing list