[Python-3000] PEP 3112
Guido van Rossum guido at python.org
Mon May 7 19:45:40 CEST 2007
On 5/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I just read PEP 3112, and I believe it contains a flaw/underspecification.
> It says
>
> # Each shortstringchar or longstringchar must be a character between 1
> # and 127 inclusive, regardless of any encoding declaration [2] in the
> # source file.
>
> What does that mean? In particular, what is "a character between 1
> and 127"? Assuming this refers to ordinal values in some encoding:
> what encoding? It's particularly puzzling that it says "regardless
> of any encoding declaration of the source file".
>
> I fear (but hope that I'm wrong) that this was meant to mean
> "use the bytes as they are stored on disk in the source file".
> If so: is the attached file valid Python? In case your editor
> can't render it: it reads
>
> #! -*- coding: iso-2022-jp -*-
> a = b"Питон"
>
> But if you look at the file with a hex editor, you see it contains
> only bytes between 1 and 127.
>
> I would hope that this code is indeed ill-formed (i.e. that
> the byte representation on disk is irrelevant, and only the
> Unicode ordinals of the source characters matter).
>
> If so, can the specification please be updated to clarify that
> 1. in Grammar changes: Each shortstringchar or longstringchar must
>    be a character whose Unicode ordinal value is between 1 and
>    127 inclusive.
> 2. in Semantics: The bytes in the new object are obtained as if
>    encoding a string literal with "iso-8859-1"
Sounds like a good fix to me; I agree that bytes literals, like Unicode literals, should not vary depending on the source encoding. In step 2, can't you use "ascii" as the encoding?
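
To make the two readings concrete, here is a small sketch in present-day Python 3. It is an illustration added for this thread, not text from the PEP. It assumes the standard "iso2022_jp" codec can encode the Cyrillic sample (it covers JIS X 0208, which includes Cyrillic), and bytes_literal_value is a hypothetical helper that only mimics the proposed rule; it is not how any compiler implements it.

```python
# Illustration only: compares "bytes on disk" with "Unicode ordinals"
# for the sample literal, then mimics the proposed rule.
text = "Питон"

# Reading 1: look at the bytes as stored in an ISO-2022-JP source file.
disk_bytes = text.encode("iso2022_jp")
disk_is_7bit = all(1 <= b <= 127 for b in disk_bytes)

# Reading 2: look at the Unicode ordinals of the source characters.
ordinals_in_range = all(1 <= ord(ch) <= 127 for ch in text)


def bytes_literal_value(chars: str) -> bytes:
    """Hypothetical helper mimicking the proposed rule (not a real API).

    Rejects any character whose Unicode ordinal is outside 1..127,
    then encodes the remaining characters as ASCII.
    """
    for ch in chars:
        if not 1 <= ord(ch) <= 127:
            raise ValueError(f"character {ch!r} not allowed in bytes literal")
    return chars.encode("ascii")


# For ordinals 1..127, "ascii" and "iso-8859-1" produce the same bytes,
# so either name describes the same result in step 2.
seven_bit = "".join(chr(i) for i in range(1, 128))
encodings_agree = seven_bit.encode("ascii") == seven_bit.encode("iso-8859-1")

print(disk_is_7bit, ordinals_in_range, encodings_agree)
```

Under the on-disk reading the sample would pass, since every stored byte is 7-bit; under the ordinal reading it is rejected, which is the behaviour the proposed wording asks for. Because the grammar change already limits ordinals to 1..127, the choice between "ascii" and "iso-8859-1" in step 2 does not change the resulting bytes.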

--
--Guido van Rossum (home page: http://www.python.org/~guido/)