[Python-Dev] Python-3.0, unicode, and os.environ

Adam Olsen rhamph at gmail.com
Mon Dec 8 06:11:21 CET 2008


On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:

On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull:

Glenn Linderman writes:

> But if you are interested in checking for security issues, shouldn't you
> first decode into some canonical form,

Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but that is what we (the minority of strict Unicode adherents, that is) want.

I have no problem with having strict validation available. But doesn't validation take significantly longer than decoding? So I think it should be logically decoupled... do validation when/where it is needed for security reasons, and allow internal [de]coding to be faster.

I'd like to see benchmarks of such a claim.

I'm mostly indifferent about which should be the default... maybe there shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and the "fUTF-8" decoder for the faster, non-validating version. Or something like that. With appropriate documentation. Of course, "UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... but it could be deprecated.

You didn't address the issue that if the decoding to a canonical form is done first, many of the insecurities just go away, so why throw errors?

Unicode is intended to allow interaction between various bits of software. It may be that a library checked the data while it was still UTF-8, then passed it to Python. It would be nice if the library validated too, but a major advantage of UTF-8 is that older libraries (or protocols!) intended for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their security checks continue to work, so long as nobody downstream introduces problems with a non-validating decoder.
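
To illustrate that last point, here is a minimal sketch (not part of the original thread). The function `naive_decode_2byte` and the sample `payload` are hypothetical, written only to mimic what a non-validating decoder might do with an overlong sequence; no real library is claimed to behave this way. The bytes 0xC0 0xAF are an overlong encoding of "/", so an ASCII-era byte check finds no slash, Python's built-in strict decoder rejects the input, and the lenient decoder quietly produces one.

```python
def naive_decode_2byte(data: bytes) -> str:
    """Hypothetical non-validating decoder (illustration only).

    Handles ASCII and 2-byte sequences, and does NOT reject overlong forms.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b < 0xE0 and i + 1 < len(data):
            # Combine the payload bits without checking that the result
            # actually needed two bytes (the missing validation step).
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


# Assumed example input: 0xC0 0xAF is an overlong encoding of "/".
payload = b"..\xc0\xafetc"

# A byte-level check written for ASCII sees no slash in the raw bytes.
byte_check_passes = b"/" not in payload

# Python's built-in UTF-8 decoder is strict and refuses the input.
try:
    payload.decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True

# The lenient sketch decodes the overlong form into a real slash.
lenient_result = naive_decode_2byte(payload)
```

With strict decoding the earlier byte-level check stays meaningful, because the only way to get "/" out of the decoder is to have the byte 0x2F going in.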

-- Adam Olsen, aka Rhamphoryncus


