[Python-Dev] PEP 263 considered faulty (for some Japanese)

M.-A. Lemburg mal@lemburg.com
Tue, 12 Mar 2002 14:44:27 +0100


Guido van Rossum wrote:

> > Guido> I think I can propose a compromise though: there may be two
> > Guido> default encodings, one used for Python source code, and one
> > Guido> for data.
>
> [Stephen J. Turnbull]
> > Why go in this direction?  It's better to allow each individual stream
> > to specify a codec to be implicitly applied, I think.  Consider Emacs,
> > for example, which allows specification of default codecs for (1) file
> > contents (2) names of file system objects (3) process I/O (but not I
> > and O and E separately, which has caused problems!) (4) console input
> > and (5) console output.  All of those are plausible candidates for
> > having separate defaults in Python as well.
> >
> > For example, in Japan it's easy to imagine a program with local file
> > contents defaulting to UTF-8 (for cross-system portability) needing to
> > access the Windows 9x console and file system in Shift JIS, while
> > process (eg, network) I/O might be EUC-JP if the server were Unix.
> > (Yes, I'm straining, but not much.)
> >
> > But if you allow codecs for each stream, people who want to have
> > different defaults for certain classes of stream would just derive
> > classes which initialized the default codec appropriately.
>
> Attaching codecs to streams is currently pretty painful AFAICT (I've
> never tried it :-), but I think your idea has merit: there are
> sufficiently many different contexts where an encoding must be
> specified that it makes sense to allow setting different defaults for
> the different contexts.  The issue of filename encoding is one with
> which we (well, some of us) have struggled recently.
>
> We'd have to think more about which contexts exactly to consider; for
> now I can come up with:
>
> - file I/O;
> - OS filenames;
> - implicit mixing of 8-bit and Unicode strings;
> - invocation of unicode(s) or u.decode() without an encoding.
>
> I see your proposal as a possible future generalization of my
> two-encodings proposal, not as an incompatible alternative.

My position on this is not to introduce more defaults -- explicit is better than implicit and in this particular case (encodings) it'll result in a net win.
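To make the "explicit rather than default" position concrete, here is a minimal sketch of naming the codec on each stream instead of relying on a process-wide default. It assumes a modern Python 3 interpreter (not the 2.2-era one under discussion); the sample text and the choice of EUC-JP and Shift JIS are purely illustrative.

```python
import codecs
import io

# Hypothetical sample text; any non-ASCII string would do.
text = "日本語"

# Wrap a byte stream with an explicitly named codec for writing.
raw = io.BytesIO()
writer = codecs.getwriter("euc_jp")(raw)
writer.write(text)
encoded = raw.getvalue()

# Reading back names the codec explicitly as well.
reader = codecs.getreader("euc_jp")(io.BytesIO(encoded))
decoded = reader.read()

# The same text yields different bytes under a different codec, which is
# why a silently applied default can corrupt data.
sjis_bytes = text.encode("shift_jis")
```

Because every stream names its own codec, nothing in the sketch depends on what the interpreter's default encoding happens to be.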

> In the light of the post by Atsuo Ishimoto and the responses from both
> Marc-Andre Lemburg and Martin von Loewis, however, I'm not sure
> whether Suzuki Hisao's response represents the Japanese community, and
> it's possible that nothing needs to be done.

Well, users who put non-ASCII characters in their source files should start to be explicit about the encoding (in phase 1 they'll get a warning that makes them aware of the problem), but other than that, I don't see a need for changes to the strategy.
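For illustration, being explicit about the source encoding means putting a PEP 263 declaration comment on the first or second line of the file. The sketch below assumes a modern Python 3 interpreter, whose standard `tokenize.detect_encoding` reads such a declaration; the module contents are hypothetical.

```python
import io
import tokenize

# Hypothetical module source with a PEP 263 declaration on line 1.
source_text = '# -*- coding: euc-jp -*-\ngreeting = "こんにちは"\n'

# Simulate the file on disk: bytes in the declared encoding.
source_bytes = source_text.encode("euc_jp")

# detect_encoding() takes a readline callable over the raw bytes and
# returns the declared encoding plus the lines it had to read.
declared, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# With the declaration known, the bytes decode without guessing.
decoded = source_bytes.decode(declared)
```

Without the declaration line, a reader of the file would have to guess between EUC-JP, Shift JIS, and UTF-8, which is the ambiguity the warning is meant to flush out.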

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/