[Python-Dev] bytes type discussion

Adam Olsen rhamph at gmail.com
Wed Feb 15 09:39:10 CET 2006

On 2/15/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Adam Olsen wrote:
> > (I wonder if maybe they should be an error in 2.x as well. Source encoding is for unicode literals, not str literals.)
>
> Source encoding applies to the entire source code, including (byte) string literals, comments, identifiers, and keywords. IOW, if you declare your source encoding is utf-8, the keyword "print" must be represented with the bytes that represent the Unicode letters for "p","r","i","n", and "t" in UTF-8.

Although it does apply to the entire source file, I think this is more for convenience (try telling an editor that only a single line is Shift_JIS!) than to allow 8-bit (or 16-bit?!) str literals. Indeed, you could have arbitrary 8-bit str literals long before the source encoding was added. Keywords and identifiers continue to be limited to ASCII characters (even if they make a roundtrip through other encodings), and comments continue to be ignored.

Source encoding exists so that you can write u"123" with the encoding stated once at the top of the file, rather than "123".decode('utf-8') with the encoding repeated everywhere.
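A minimal sketch of that equivalence, written for Python 3 rather than the 2.x under discussion (so a plain "..." literal here plays the role of u"..."). It assumes that compile() honours a PEP 263 coding declaration when handed source as bytes; the helper name literal_from_source and the sample text are made up for illustration.

```python
# Python 3 sketch: the coding declaration, stated once at the top of the
# source, decides how the raw bytes of a text literal are decoded.
TEXT = "caf\u00e9"  # sample text with one non-ASCII character (hypothetical)


def literal_from_source(encoding):
    """Build a tiny module as bytes in `encoding`, compile it, return its literal."""
    source = '# -*- coding: {} -*-\nvalue = "{}"\n'.format(encoding, TEXT)
    namespace = {}
    # Passing bytes makes the compiler read the coding declaration itself.
    exec(compile(source.encode(encoding), "<demo>", "exec"), namespace)
    return namespace["value"]


# Same text, different bytes on disk, same resulting literal.
from_utf8 = literal_from_source("utf-8")
from_latin1 = literal_from_source("latin-1")

# The spelling without a declaration: decode the raw bytes by hand,
# repeating the encoding name at every use.
explicit = TEXT.encode("utf-8").decode("utf-8")
```

The two encoded files differ by a byte (the non-ASCII character takes two bytes in UTF-8 and one in Latin-1), yet both yield the same text once the declaration is taken into account.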

Making it an error to have 8-bit str literals in 2.x would help educate the user that they will change behavior in 3.0 and not be 8-bit str literals anymore.
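For comparison, here is a sketch of how released Python 3 treats the same situation. It shows Python 3 behaviour only, not anything 2.x did or the exact form of the error proposed above; the helper name compiles is made up for illustration.

```python
# Python 3 sketch: a bytes literal may contain only ASCII characters in the
# source text, while a plain "..." literal is text and accepts non-ASCII.
def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


ascii_bytes_ok = compiles('data = b"abc"')         # plain ASCII bytes literal
escaped_bytes_ok = compiles('data = b"\\xe9"')     # 8-bit value via an escape
raw_8bit_bytes_ok = compiles('data = b"\u00e9"')   # raw non-ASCII character
text_literal_ok = compiles('text = "\u00e9"')      # text literal, not bytes
```

Only the third case is rejected: an 8-bit value has to be spelled with an escape in a bytes literal, and a raw non-ASCII character is accepted only in a text literal.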

-- Adam Olsen, aka Rhamphoryncus
