[Python-Dev] PEP 383 update: utf8b is now the error handler

MRAB google at mrabarnett.plus.com
Tue May 5 19:45:45 CEST 2009


Stephen J. Turnbull wrote:

> MRAB writes:
>
> > > I don't think "people shouldn't be using non-ASCII-compatible
> > > encodings for locale encodings" is a sufficient rationale for a hard
> > > error here. I mean, of course they should be using UTF-8. Maybe
> > > Python 3.1 should just go ahead and error on any other encoding on
> > > POSIX platforms?
> > >
> > I don't see why the error handler couldn't in principle be used with
> > encodings other than UTF-8, although in that case all of the low
> > surrogates should be open to use.
>
> I should have been more clear here, I guess. The error handler can,
> and in the PEP will be by default, used with all "sane" locale
> encodings on POSIX. It occurs to me that the PEP maybe should say
> that it is an error to have your POSIX locale set to UTF-16 or
> something like that.
>
> What "sane" means in this context is
>
> 1. ASCII NUL is the bytearray terminator, and can't be used as a byte
>    in a file name. This rules out UTF-16, UTF-32, and widechar EUC
>    encodings, as well as some very rare ones.
>
[snip]

It might be slightly OT, but sometimes strict UTF-8 encoding is violated
by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as
a terminator. I think I read that Microsoft sometimes does this.
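
A minimal sketch of the 0xC0 0x80 point, assuming a Python version (3.1 or later) in which the PEP 383 handler is available under the name `surrogateescape`, which is the name it eventually shipped with rather than the `utf8b` used in this thread. The helper names `strict_decode_fails` and `roundtrip` and the sample file name are made up for illustration. It shows that a strict UTF-8 decoder rejects the overlong two-byte form of U+0000, while the error handler maps each offending byte to a lone low surrogate and restores the bytes on encoding.

```python
# Overlong two-byte form of U+0000 (0xC0 0x80), which strict UTF-8 forbids.
overlong_nul = b"\xc0\x80"


def strict_decode_fails(data: bytes) -> bool:
    """Return True if strict UTF-8 decoding rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def roundtrip(data: bytes) -> bytes:
    """Decode with the PEP 383 handler, then encode back to bytes."""
    text = data.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical file name containing the overlong sequence.
name = b"file" + overlong_nul + b".txt"

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
escaped = name.decode("utf-8", errors="surrogateescape")

print(strict_decode_fails(overlong_nul))
print(ascii(escaped))
print(roundtrip(name) == name)
```

Note that the decoded string never contains U+0000 here: the two bytes are escaped individually instead of being interpreted as an encoded NUL.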


