[Python-3000] PEP 3112

"Martin v. Löwis" martin at v.loewis.de
Sun May 6 09:47:12 CEST 2007


I just read PEP 3112, and I believe it contains a flaw/underspecification.

It says

    Each shortstringchar or longstringchar must be a character between 1
    and 127 inclusive, regardless of any encoding declaration [2] in the
    source file.

What does that mean? In particular, what is "a character between 1 and 127"?

Assuming this refers to ordinal values in some encoding: what encoding? It's particularly puzzling that it says "regardless of any encoding declaration of the source file".

I fear (but hope that I'm wrong) that this was intended to mean "use the bytes as they are stored on disk in the source file". If so: is the attached file valid Python? In case your editor can't render it, it reads:

    #! -*- coding: iso-2022-jp -*-
    a = b"Питон"

But if you look at the file with a hex editor, you see it contains only bytes between 1 and 127.
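The observation above can be checked with a short sketch. It assumes Python's standard `iso-2022-jp` codec and only compares the two possible readings of "between 1 and 127"; the variable names are illustrative and not taken from the PEP.

```python
# Sketch: compare on-disk byte values with Unicode ordinals for the
# literal's characters. Variable names here are illustrative only.
source_text = "Питон"

# How the characters are stored on disk under the declared encoding.
# ISO-2022-JP is a 7-bit encoding that switches character sets with
# escape sequences, so every stored byte stays below 128.
on_disk = source_text.encode("iso-2022-jp")

# The Unicode ordinals of the same characters.
ordinals = [ord(ch) for ch in source_text]

bytes_in_range = all(1 <= b <= 127 for b in on_disk)
ordinals_in_range = all(1 <= o <= 127 for o in ordinals)

print("on-disk bytes:", on_disk)
print("all bytes in 1..127:", bytes_in_range)
print("ordinals:", [hex(o) for o in ordinals])
print("all ordinals in 1..127:", ordinals_in_range)
```

Under the "bytes on disk" reading the literal would pass the range check, while under the "Unicode ordinals" reading it would be rejected.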

I would hope that this code is indeed ill-formed (i.e., that the byte representation on disk is irrelevant, and only the Unicode ordinals of the source characters matter).

If so, can the specification please be updated to clarify that

  1. in Grammar changes: Each shortstringchar or longstringchar must be a character whose Unicode ordinal value is between 1 and 127 inclusive.
  2. in Semantics: The bytes in the new object are obtained as if encoding a string literal with "iso-8859-1"
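As a rough sketch of how the two proposed clarifications would combine: the helper below is hypothetical (`bytes_literal_value` is not a name from the PEP or from CPython), takes the already-decoded characters of a literal, and ignores escape sequences for simplicity.

```python
def bytes_literal_value(chars: str) -> bytes:
    """Hypothetical helper modelling the proposed rules.

    `chars` holds the literal's characters after the source file has
    been decoded to Unicode; escape sequences are not handled here.
    """
    # Proposed grammar rule: each character's Unicode ordinal must be
    # between 1 and 127 inclusive.
    for ch in chars:
        if not 1 <= ord(ch) <= 127:
            raise SyntaxError(
                f"character {ch!r} (U+{ord(ch):04X}) not allowed in bytes literal"
            )
    # Proposed semantics: obtain the bytes as if encoding with iso-8859-1.
    # For ordinals in 1..127 this maps each character to the byte with
    # the same value.
    return chars.encode("iso-8859-1")


print(bytes_literal_value("abc"))

try:
    bytes_literal_value("Питон")
except SyntaxError as exc:
    print("rejected:", exc)
```

With this reading, the result no longer depends on the source file's encoding declaration, because the check is applied after decoding.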

Regards, Martin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: a.py
Type: text/x-python
Size: 55 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070506/c0269ce4/attachment.py


