[Python-Dev] PEP 263 considered faulty (for some Japanese)

M.-A. Lemburg mal@lemburg.com
Tue, 12 Mar 2002 10:02:52 +0100

Previous message: [Python-Dev] PEP 263 considered faulty (for some Japanese)
Next message: [Python-Dev] Fw: [boost] GDTL - Time Library (Version 052) -- Request for review
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

SUZUKI Hisao wrote:

> I am a Japanese fan/developer/user of Python for years. I have recently read the PEP 263 --- Defining Python Source Code Encodings. I have been discussing about it on the Japanese mailing list of Python last week, and I and others found a severe fault in it. I have also read the Parade of the PEPs and know that it is very close to being checked in, so I am writing this message to you in English in a hurry. The PEP 263, as is, will damage the usability of Python in Japan.

I certainly hope not, since the PEP was specifically invented to address those parts of the world which do not use ASCII or Latin-1 as their common encoding.

Reading your comments, though, I believe that the PEP actually does help in your case too:

All you have to do is be explicit in the coding header of a source file rather than not using such a header at all.

So in the end, you have to change one line per Python source script, telling the interpreter what encoding the file uses, and you're done.

Even though this requires a bit of work, in the end, I believe that it is a net win, since you no longer have to maintain magic data about the file via some other means.
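To make the "one line per script" point concrete, here is a minimal sketch of a coding header at work. It assumes a current Python 3 interpreter (where `compile()` accepts bytes and the fallback source encoding is UTF-8, not the ASCII default discussed in this thread); the EUC-JP choice and the variable names are illustrative only.

```python
import io
import tokenize

# "konnichiwa" in hiragana, written with escapes so this demo itself stays ASCII.
text = "\u3053\u3093\u306b\u3061\u306f"

# A tiny module saved as EUC-JP, with the encoding declared on its first line.
source_bytes = (
    "# -*- coding: euc-jp -*-\n"
    f"greeting = '{text}'\n"
).encode("euc-jp")

# The declaration can be read from the raw bytes before anything is decoded.
declared, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# compile() accepts bytes and honours the declaration when decoding them.
namespace = {}
exec(compile(source_bytes, "<demo>", "exec"), namespace)
```

With the header in place, the file carries its own encoding information, so nothing has to be configured outside the file to read it correctly.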

> The PEP says, "Just as in coercion of strings to Unicode, Python will default to the interpreter's default encoding (which is ASCII in standard Python installations) as standard encoding if no other encoding hints are given." This will let many English people free from writing the magic comment to their scripts explicitly. However, many Japanese set the default encoding other than ASCII (we use multi-byte encodings for daily use, not as luxury), and some Japanese set it, say, "utf-16".

This only applies if the interpreter does not find a coding header.

Strangely enough, I changed the above lines in the PEP to meet the demands of a Japanese Python user who uses two Japanese encodings on two different platforms. They have the problem that they use CVS for the code and thus can only have one coding header. One solution was to not use the encoding header and set the default encoding depending on the platform they run the code on. Another solution involved a magic codec which determines its encoding on a per-platform basis -- luckily the Python codec registry is easily extensible, so this doesn't pose much of a problem.
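The "magic codec" idea can be sketched with the standard `codecs.register()` hook. Everything specific here is an assumption made for illustration: the codec name `platform_japanese` is hypothetical, and so is the mapping of Windows to Shift_JIS and everything else to EUC-JP. The sketch only shows the registry part, not how such a codec would be wired into a coding header.

```python
import codecs
import sys

# Hypothetical codec name; the original message does not name one.
MAGIC_NAME = "platform_japanese"


def platform_encoding(platform=None):
    """Pick a real codec for the platform (assumed mapping, for illustration)."""
    platform = sys.platform if platform is None else platform
    return "shift_jis" if platform.startswith("win") else "euc_jp"


def search(name):
    """Codec search function: resolve the magic name, ignore everything else."""
    if name.replace("-", "_") == MAGIC_NAME:
        return codecs.lookup(platform_encoding())
    return None


codecs.register(search)

sample = "\u65e5\u672c\u8a9e"  # "Japanese language" in kanji
encoded = sample.encode(MAGIC_NAME)
```

The same text now encodes to different bytes depending on where the code runs, while the name used to refer to the encoding stays identical on every platform.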

BTW, using UTF-16 as the default is a particularly bad choice... you might as well stick to all Unicode then, since Python uses UCS-2 as its internal storage format on narrow builds.
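One further reason, not spelled out above, why UTF-16 sits badly with source files: it is not ASCII-compatible, so a coding comment stored in UTF-16 does not appear as plain ASCII bytes in the file. A small sketch of the difference, using EUC-JP as the ASCII-compatible comparison (variable names are illustrative):

```python
header = "# coding: utf-16\n"

ascii_bytes = header.encode("ascii")
eucjp_bytes = header.encode("euc-jp")   # ASCII-compatible multi-byte encoding
utf16_bytes = header.encode("utf-16")   # BOM plus two bytes per character

# In an ASCII-compatible encoding the comment is byte-for-byte plain ASCII.
eucjp_is_ascii = eucjp_bytes == ascii_bytes

# In UTF-16 the file starts with a byte order mark and is full of NUL bytes,
# so a byte-level scan for "# coding:" on the first lines finds nothing.
utf16_starts_with_hash = utf16_bytes.startswith(b"#")
utf16_has_nul = b"\x00" in utf16_bytes
```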

I hope this clarifies your concerns.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
