[Python-Dev] PEP 263 - default encoding
Guido van Rossum guido@python.org
Fri, 15 Mar 2002 14:39:05 -0500
> a. Does this really make sense for UTF-16? It looks to me like a great way to induce bugs of the form "write a unicode literal containing 0x0A, then translate it to raw form by stripping the u prefix."
Of course not. I don't expect anyone to put UTF-16 in their source encoding cookie. But should we bother making a list of encodings that shouldn't be used?
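The concern in a. can be made concrete. In UTF-16, a single character can contribute a 0x0A byte, which is the same byte as an ASCII newline. The minimal sketch below is written in Python 3 syntax and assumes UTF-16-LE. It encodes a one-line unicode literal containing U+0A05, a character chosen arbitrarily because its code point contains 0x0A. It then shows that a byte-oriented line splitter cuts the literal in two.

```python
# U+0A05 (GURMUKHI LETTER A) is used only because its code point contains 0x0A.
line = 'u"\u0a05"'

encoded = line.encode("utf-16-le")
print(encoded)  # b'u\x00"\x00\x05\n"\x00'

# A tokenizer that splits raw bytes on ASCII newlines sees two "lines" here,
# and every ASCII character is also interleaved with a 0x00 byte.
pieces = encoded.split(b"\n")
print(pieces)  # [b'u\x00"\x00\x05', b'"\x00']
```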
> b. No editor is likely to implement correct display to distinguish between u"" and just "".
That's fine. Given phase 2, the editor should display the entire file using the encoding given in the cookie, even though phase 1 only applies the encoding to u"" literals. The rest of the file is supposed to be ASCII, and if it isn't, that's the user's problem.
> c. This definitely breaks Emacs coding cookie semantics. Emacs applies the coding cookie to the whole buffer. I don't see a way to lose offhand, but this is sufficiently subtle that I don't want to break my head trying to prove that you can't lose, either.
I wouldn't worry about that, see above.
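Points b. and c. both assume the editor and Python read the same cookie and apply it to the whole file. The sketch below locates such a cookie. It is illustrative only. The regular expression comes from the final text of PEP 263, which was accepted after this thread, so it is not necessarily the rule under debate here. The function name `find_coding_cookie` is hypothetical. An Emacs-style `-*- coding: ... -*-` line satisfies the pattern, which is what lets one comment serve both tools.

```python
import re
from typing import Optional

# Cookie pattern from the final text of PEP 263 (accepted after this thread);
# shown for illustration, not as the rule being discussed in this message.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding_cookie(source: bytes) -> Optional[str]:
    """Return the encoding named by a cookie on line 1 or 2, or None."""
    for raw_line in source.splitlines()[:2]:
        # latin-1 maps every byte to one character, so decoding never fails;
        # the pattern itself only inspects ASCII characters.
        match = COOKIE_RE.match(raw_line.decode("latin-1"))
        if match:
            return match.group(1)
    return None


# An Emacs-style cookie on line 2 satisfies both Emacs and the PEP pattern.
example = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = u'\xe9'\n"
print(find_coding_cookie(example))  # latin-1
```

In current CPython, the equivalent logic lives in `tokenize.detect_encoding`. That function also recognizes a UTF-8 BOM and normalizes names such as `latin-1`.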
> d. You probably have to deprecate ISO 2022 7-bit coding systems, too, because people will try to get the representation of a string by inputting a raw string in coded form. This might contain a quote character.
Good point. This sounds like a documentation issue at worst.
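The quote-character problem in d. arises because 7-bit ISO 2022 encodings build multibyte characters entirely out of printable ASCII bytes. The sketch below uses Python's `iso2022_jp` codec as one example of such an encoding. The "naive" scanner is a hypothetical stand-in for any byte-level tokenizer that ignores escape sequences. Such a scanner ends the string literal in the middle of a character.

```python
# U+3042 (HIRAGANA LETTER A) is JIS X 0208 code 0x2422. ISO-2022-JP emits it as
# ESC $ B (switch to JIS X 0208), the bytes 0x24 0x22, then ESC ( B (back to
# ASCII). Byte 0x22 is the ASCII double quote.
source_text = 's = "\u3042"\n'
encoded = source_text.encode("iso2022_jp")

open_quote = encoded.index(b'"')
# A scanner that looks for the next 0x22 byte stops inside the character...
naive_close = encoded.index(b'"', open_quote + 1)
# ...while the real closing quote comes after the escape back to ASCII.
true_close = encoded.rindex(b'"')

print(encoded)                              # b's = "\x1b$B$"\x1b(B"\n'
print(encoded[open_quote + 1:naive_close])  # b'\x1b$B$'
print(naive_close, true_close)              # 9 13
```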
> e. This causes problems for UTF-8 transition, since people will want to put arbitrary byte strings in a raw string.
I'm not sure I understand. What do you mean by a raw string? Do you mean an r"" literal? Why would people want to use that for arbitrary binary data? Arbitrary binary data should always be encoded using \xDD hex or \OOO octal escapes.
> But these will not be legal UTF-8 files, even though they have a UTF-8 coding cookie. People who are trying to do the right thing will have the rules changed again later, most likely.
If you're trying to do the right thing you shouldn't be putting arbitrary binary data in any string literal.
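The reply to e. rests on one property: a literal spelled entirely with \xDD or \OOO escapes is pure ASCII. Such a file therefore stays valid UTF-8 whatever bytes it represents. The sketch below assumes Python 3 `b""` literals, whereas in 2002 plain `""` strings held bytes. The helper names are hypothetical. It checks both spellings and compares them with pasting the raw bytes.

```python
import ast


def hex_escaped_literal(data: bytes) -> str:
    """Spell bytes as a b"..." literal using only \\xDD escapes."""
    return 'b"' + "".join(f"\\x{byte:02x}" for byte in data) + '"'


def octal_escaped_literal(data: bytes) -> str:
    """Spell bytes as a b"..." literal using only \\OOO escapes."""
    return 'b"' + "".join(f"\\{byte:03o}" for byte in data) + '"'


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


payload = bytes([0x00, 0x22, 0xFF])  # NUL, a double quote, and a byte never valid in UTF-8

hex_lit = hex_escaped_literal(payload)
oct_lit = octal_escaped_literal(payload)
print(hex_lit)  # b"\x00\x22\xff"
print(oct_lit)  # b"\000\042\377"

# Pasting the raw bytes into a UTF-8 file would make it invalid UTF-8...
print(is_valid_utf8(payload))                  # False
# ...whereas the escaped spellings are pure ASCII and evaluate back to the data.
print(is_valid_utf8(hex_lit.encode("ascii")))  # True
print(ast.literal_eval(hex_lit) == payload == ast.literal_eval(oct_lit))  # True
```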
> This means that until editors reliably implement b. and similar features, developers must change coding systems to type raw strings and Unicode strings.
Sounds like a YAGNI to me.
--Guido van Rossum (home page: http://www.python.org/~guido/)