[Python-Dev] PEP 383 update: utf8b is now the error handler

Stephen J. Turnbull stephen at xemacs.org
Wed May 6 07:35:30 CEST 2009


Lino Mastrodomenico writes:

> 2009/5/5 Stephen J. Turnbull <stephen at xemacs.org>:
>
> > Third, it is not clear to me why non-decodable ASCII should be an error.
>
> The PEP originally allowed the conversion to U+DCxx of bytes below 128 that cannot be decoded by the encoding used, but this creates potential security problems.
>
> See: <http://mail.python.org/pipermail/python-dev/2009-April/089102.html>
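For readers following along, here is a minimal sketch of the restricted behaviour Lino describes. It assumes a current CPython, where the handler called 'utf8b' in this thread is available under the name 'surrogateescape'; the byte string is a made-up filename, not one from the discussion.

```python
# Sketch only: 'surrogateescape' is the name under which the 'utf8b'
# handler discussed here is available in current CPython.
raw = b"caf\xe9/etc"  # hypothetical filename bytes: Latin-1, not valid UTF-8

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9;
# the ASCII bytes, including '/', decode normally.
text = raw.decode("utf-8", errors="surrogateescape")

# Collect the escaped code points (all fall in U+DC80..U+DCFF).
escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")

# ascii() is used because printing a lone surrogate directly may fail.
print(ascii(text), escaped, round_trip == raw)
```

Only bytes of 128 and above are escaped this way, which is the restriction under debate in this message.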

Yeah, yeah, this is the same old same old from PEP 3131. Anything that handles the various attacks based on ASCII-alike characters should at least rule out invalid Unicode, too!

And where is this U+DC2F supposed to be coming from, anyway? The user's local environment or the user's local filesystem! Codecs not using 'utf8b' can't produce it, so the only other cases are chr() and \u literals in the local process, or an already broken module in your code. I really can't imagine that any sane programmer these days would be using 'utf8b' on bytes received from the Internet!
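To make the U+DC2F case concrete, here is a rough sketch, again assuming a current CPython where the handler goes by 'surrogateescape'. It shows the restricted behaviour rather than the optional one argued for in this message: a U+DC2F built with chr() is refused on encoding, while a surrogate in U+DC80..U+DCFF maps back to its byte. The helper name try_encode is made up for illustration.

```python
# Sketch only: behaviour of the 'surrogateescape' handler in current CPython.
fake_slash = chr(0xDC2F)  # lone surrogate built in-process, not by a codec


def try_encode(s):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return s.encode("utf-8", errors="surrogateescape")
    except UnicodeEncodeError:
        return None


# U+DC2F lies outside U+DC80..U+DCFF, so it is not turned into b"/".
result_fake = try_encode("a" + fake_slash + "b")

# U+DCAF lies inside the range and maps back to the single byte 0xAF.
result_real = try_encode("a" + chr(0xDCAF) + "b")

print(result_fake, result_real)
```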

Of course I can't prove that there's no vector for an exploit here (in fact, I'm sure there is one with sufficiently careless handling of input), but I think "consenting adults" covers the Shift JIS use case. Make it an option, but it should be explicitly part of the PEP.


