[Python-Dev] transform() and untransform() methods, and the codec registry

Stephen J. Turnbull stephen at xemacs.org
Sat Dec 4 09:31:04 CET 2010


Alexander Belopolsky writes:

> In fact, once the language moratorium is over, I will argue that str.encode() and bytes.decode() should deprecate the encoding argument and just do UTF-8 encoding/decoding. Hopefully by that time most people will forget that other encodings exist. (I can dream, right?)

It's just a dream. There's a pile of archival material, often on R/O media, out there that won't be transcoded any more quickly than the inscriptions on Tutankhamun's tomb.
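To illustrate why the encoding argument stays useful for archival data, here is a minimal sketch of a transcoding step. The helper name transcode_to_utf8 and the sample record are hypothetical, chosen only for illustration; the sketch assumes the source encoding of the legacy data is already known.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode legacy bytes with an explicit codec, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical archival record stored as Latin-2 (ISO 8859-2).
legacy = "Łódź".encode("iso8859_2")
converted = transcode_to_utf8(legacy, "iso8859_2")

# Without the explicit codec, the legacy bytes are not valid UTF-8.
try:
    legacy.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
```

Reading the legacy bytes still requires naming the legacy codec, which is the step a UTF-8-only decode() could not perform.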

Remember, Python is a language used to implement such translations. It's not an application. I think it would be reasonable to make UTF-8 the default encoding on all platforms, except for internal OS functions, where Windows will presumably continue to use UTF-16 and *nix distros will probably continue to agree to disagree about whether the on-disk format is NFD or NFC (and the Python language as yet doesn't know about NFC vs. NFD, although the library does).
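The NFC/NFD distinction can be sketched with the standard unicodedata module. This is only an illustration of the point that string comparison in the language works on code points while normalization lives in the library; the variable names are made up for the example.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point (NFC form)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (NFD form)

# The language compares code points, so the two spellings are unequal.
same_as_written = composed == decomposed

# The library can bring both to a common form before comparing.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Both forms are valid UTF-8, but the byte sequences differ.
utf8_nfc = composed.encode("utf-8")
utf8_nfd = decomposed.encode("utf-8")
```

So even with UTF-8 fixed as the encoding, two file names that look identical may differ on disk unless one normalization form is agreed upon.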

In the discussion of PEP 263, I proposed that the external encoding of Python scripts themselves be fixed as UTF-8, and other encodings would have to be pretranslated by an appropriate codec. That was shouted down by the European contingent, who wanted to continue using Latin-1 and Latin-2 without codecs or a wrapper to call them transparently. However, this time around you might get a more sympathetic hearing.
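As a rough sketch of the two approaches, the snippet below compiles the same one-line script twice: once as Latin-1 bytes carrying a PEP 263 coding declaration, and once pretranslated to UTF-8 with no declaration. It assumes Python 3, where UTF-8 is the default source encoding; the helper run and the sample script are hypothetical.

```python
body = "name = 'café'\n"

# Latin-1 source that relies on a PEP 263 coding declaration.
latin1_source = ("# -*- coding: latin-1 -*-\n" + body).encode("latin-1")

# The same script pretranslated to UTF-8, so no declaration is needed.
utf8_source = body.encode("utf-8")


def run(source: bytes) -> dict:
    """Compile and execute source bytes, returning the resulting namespace."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace


from_latin1 = run(latin1_source)["name"]
from_utf8 = run(utf8_source)["name"]
```

Both routes produce the same string; the difference is whether the translation happens inside the compiler via the declaration or beforehand via a codec.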


