[Python-Dev] Counting collisions for the win (original) (raw)

Guido van Rossum guido at python.org
Fri Jan 20 19:20:08 CET 2012


This is the first objection I have seen against collision-counting that might stand.

On Fri, Jan 20, 2012 at 7:55 AM, Frank Sievertsen <pydev at sievertsen.de>wrote:

Hello,

I still see at least two ways to create a DOS attack even with the collison-counting-patch. I assumed that it's possible to send ~500KB of payload to the application. 1. It's fully deterministic which slots the dict will lookup. Since we don't count slot-collisions, but only hash-value-collisions this can be exploited easily by creating strings with the hash-values along the lookup-way of an arbitrary (short) string. So first we pick an arbitrary string. Then calculate which slots it will visit on the way to the first empty slot. Then we create strings with hash-values for these slots. This attack first injects the strings to fill all the slots that the one short string will want to visit. Then it adds THE SAME string again and again. Since the entry is already there, nothing will be added and no additional collisions happen, no exception raised. $ ls -l super.txt -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt $ tail -n3 super.txt FX5 FX5 FX5 $ wc -l super.txt 90000 super.txt $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))' real 0m52.724s user 0m51.543s sys 0m0.028s 2. The second attack actually attacks that 1000 allowed string comparisons are still a lot of work. First I added 999 strings that collide with a one-byte string "a". In some applications a zero-byte string might work even better. Then I can add a many thousand of the "a"'s, just like the first attack. $ ls -l 1000.txt -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt $ head -n 3 1000.txt 7hLci00 4wVFm10 rZJU50 $ wc -l 1000.txt 247000 1000.txt $ tail -n 3 1000.txt a a a $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))' real 0m17.408s user 0m15.897s sys 0m0.008s Of course the first attack is far more efficient. One could argue that 16 seconds is not enough for an attack. But maybe it's possible to send 1MB, have zero-bytes strings, and since for example django does 5 lookups per query-string this will keep it busy for ~80 seconds on my pc. What to do now? I think it's not smart to reduce the number of allowed collisions dramatically AND count all slot-collisions at the same time. Frank _______** Python-Dev mailing list Python-Dev at python.org http://mail.python.org/**mailman/listinfo/python-dev<http://mail.python.org/mailman/listinfo/python-dev> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ guido%40python.org<http://mail.python.org/mailman/options/python-dev/guido%40python.org>

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20120120/5ecaa6de/attachment.html>



More information about the Python-Dev mailing list