[Python-Dev] Python-3.0, unicode, and os.environ

Glenn Linderman v+python at g.nevcal.com
Mon Dec 8 03:17:04 CET 2008


On approximately 12/7/2008 10:56 AM, came the following characters from the keyboard of Adam Olsen:

You might receive a UTF-8 encoded file name from a malicious user, check if it contains something dangerous (like "../../../../../etc/password"), then decode it. If your decoder isn't compliant (i.e., doesn't check for overly long sequences) then a b'\xC0\xAF' gets translated into u'/', bypassing your previous check.

You might indeed.

But if you are interested in checking for security issues, shouldn't you first decode into some canonical form, specifying which kinds of Unicode strictness (such as rejecting overlong sequences) to enforce during the decode process, and only then, once the string is in canonical form, check for various attacks, such as the ../ sequence you mention?

And with that order of operations, even if you don't reject overlong sequences, you have canonicalized them, and can recognize the resulting characters as good or bad.
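
A minimal sketch of that ordering in Python 3, for illustration only. The helper name is_safe_relative_path and its specific checks are hypothetical, not anything from the standard library, and a real application would need a fuller policy. It assumes the name arrives as UTF-8 bytes and relies on Python 3's built-in UTF-8 codec rejecting overlong sequences such as b'\xC0\xAF' when decoding strictly:

```python
def is_safe_relative_path(raw: bytes) -> bool:
    """Hypothetical helper: decode first, then validate the decoded text."""
    try:
        # Strict decoding; Python 3's UTF-8 codec refuses overlong forms.
        name = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    # All checks run on the decoded string, never on the raw bytes.
    if name.startswith("/"):
        return False
    return ".." not in name.split("/")


print(is_safe_relative_path(b"docs/readme.txt"))           # True
print(is_safe_relative_path(b"../../etc/passwd"))          # False
print(is_safe_relative_path(b"..\xc0\xaf..\xc0\xafetc"))   # False
```

Because the checks run after decoding, a lenient decoder that turned b'\xC0\xAF' into '/' would still hand the check a string containing "..", so the traversal attempt would be caught either way.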

-- Glenn -- http://nevcal.com/

A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking


