[Python-Dev] PEP 263 considered faulty (for some Japanese)

Guido van Rossum guido@python.org
Tue, 12 Mar 2002 08:27:16 -0500


Guido> I think I can propose a compromise though: there may be two
Guido> default encodings, one used for Python source code, and one
Guido> for data.

[Stephen J. Turnbull]

Why go in this direction? It's better to allow each individual stream to specify a codec to be implicitly applied, I think. Consider Emacs, for example, which allows specification of default codecs for (1) file contents (2) names of file system objects (3) process I/O (but not I and O and E separately, which has caused problems!) (4) console input and (5) console output. All of those are plausible candidates for having separate defaults in Python as well.

For example, in Japan it's easy to imagine a program with local file contents defaulting to UTF-8 (for cross-system portability) needing to access the Windows 9x console and file system in Shift JIS, while process (eg, network) I/O might be EUC-JP if the server were Unix. (Yes, I'm straining, but not much.) But if you allow codecs for each stream, people who want to have different defaults for certain classes of stream would just derive classes which initialized the default codec appropriately.
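The "derive a class which initializes the default codec" idea might look roughly like the sketch below. It is written against present-day Python's io module, not the Python of 2002, and the class names CodecStream, ConsoleStream and NetworkStream are hypothetical, chosen only to mirror the Shift JIS console and EUC-JP network cases in the example above.

```python
import io


class CodecStream(io.TextIOWrapper):
    """Text stream whose codec defaults to a per-class value (hypothetical)."""

    default_encoding = "utf-8"  # assumed default for local file contents

    def __init__(self, buffer, encoding=None, **kwargs):
        # An explicit encoding still wins; otherwise use the class default.
        super().__init__(buffer, encoding=encoding or self.default_encoding, **kwargs)


class ConsoleStream(CodecStream):
    default_encoding = "shift_jis"  # e.g. a Windows 9x console


class NetworkStream(CodecStream):
    default_encoding = "euc_jp"  # e.g. a Unix server on the other end


text = "\u65e5\u672c\u8a9e"  # three kanji, written as escapes to keep the source ASCII

# Writing through the console class encodes as Shift JIS without naming the codec.
console_raw = io.BytesIO()
console = ConsoleStream(console_raw)
console.write(text)
console.flush()
console_bytes = console_raw.getvalue()

# Reading through the network class decodes EUC-JP without naming the codec.
network = NetworkStream(io.BytesIO(text.encode("euc_jp")))
received = network.read()
```

Each class of stream carries its own default, and a caller can still override it per stream by passing encoding explicitly.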

Attaching codecs to streams is currently pretty painful AFAICT (I've never tried it :-), but I think your idea has merit: there are sufficiently many different contexts where an encoding must be specified that it makes sense to allow setting different defaults for the different contexts. The issue of filename encoding is one with which we (well, some of us) have struggled recently.
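For reference, one way to attach a codec to an existing byte stream is the codecs module's getreader() and getwriter() factories. The following is only a minimal sketch: the in-memory BytesIO objects stand in for a real file, socket or console, and the choice of EUC-JP on input and Shift JIS on output is an assumption borrowed from the example above.

```python
import codecs
import io

text_in = "\u65e5\u672c\u8a9e"  # three kanji, written as escapes to keep the source ASCII

# Stand-in for a byte stream arriving from an EUC-JP source.
raw_in = io.BytesIO(text_in.encode("euc_jp"))

# getreader() returns a StreamReader class; calling it wraps the byte stream.
reader = codecs.getreader("euc_jp")(raw_in)
text = reader.read()  # decoded text, no explicit decode() call needed

# Stand-in for a byte stream headed to a Shift JIS console.
raw_out = io.BytesIO()
writer = codecs.getwriter("shift_jis")(raw_out)
writer.write(text)  # encoded to Shift JIS on the way out
shift_jis_bytes = raw_out.getvalue()
```

The codec has to be named at every wrapping site here, which is the repetition that per-context defaults would remove.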

We'd have to think more about which contexts exactly to consider; for now I can come up with:

I see your proposal as a possible future generalization of my two-encodings proposal, not as an incompatible alternative.

In the light of the post by Atsuo Ishimoto and the responses from both Marc-Andre Lemburg and Martin von Loewis, however, I'm not sure whether Suzuki Hisao's response represents the Japanese community, and it's possible that nothing needs to be done.

--Guido van Rossum (home page: http://www.python.org/~guido/)