[Python-Dev] transform() and untransform() methods, and the codec registry

Victor Stinner victor.stinner at haypocalc.com
Sun Dec 5 23:25:27 CET 2010


On Saturday 04 December 2010 09:31:04 you wrote:

Alexander Belopolsky writes:
> In fact, once the language moratorium is over, I will argue that
> str.encode() and byte.decode() should deprecate encoding argument and
> just do UTF-8 encoding/decoding. Hopefully by that time most people
> will forget that other encodings exist. (I can dream, right?)

It's just a dream. There's a pile of archival material, often on R/O media, out there that won't be transcoded any more quickly than the inscriptions on Tutankhamun's tomb.

Not only that: many libraries expect bytes arguments encoded in a specific encoding (e.g. the locale encoding). Put differently, only a few libraries written in C accept wchar_t* strings.
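To illustrate the point about libraries expecting a specific encoding, here is a minimal sketch (not from the original thread). It assumes a char*-based C library that wants text in the locale encoding; the helper name to_locale_bytes is hypothetical.

```python
import locale


def to_locale_bytes(text: str) -> bytes:
    """Encode text as a char*-based C library would typically expect it.

    Hypothetical helper for illustration only.
    """
    # With do_setlocale=False this only reports the preferred encoding;
    # it does not modify the process locale.
    encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


# The same character becomes different bytes in different encodings,
# so the caller has to know which one the library expects.
utf8_bytes = "\u00e9".encode("utf-8")      # two bytes
latin1_bytes = "\u00e9".encode("latin-1")  # one byte
```

If encode() always produced UTF-8, a library expecting Latin-1 input would receive the two-byte form and misread it.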

The Linux kernel (like many, or all, UNIX/BSD kernels) only manipulates byte strings. The libc only accepts wide characters for a few operations. I don't know how to open a file with a Unicode path with the Linux libc: you have to encode it...
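The encoding step described above can be sketched in Python as follows. This is only an illustration, assuming Python 3.2 or later, where os.fsencode() is available; the helper names read_via_bytes_path and demo are hypothetical.

```python
import os
import tempfile


def read_via_bytes_path(path: str) -> bytes:
    """Read a file by first encoding its str path to bytes.

    Hypothetical helper for illustration only.
    """
    # os.fsencode() converts a str path to bytes using the filesystem
    # encoding, which is the form the kernel ultimately receives on UNIX.
    raw_path = os.fsencode(path)
    with open(raw_path, "rb") as f:
        return f.read()


def demo() -> bytes:
    """Write a small file, then read it back through a bytes path."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.txt")
        with open(path, "wb") as f:
            f.write(b"hello")
        return read_via_bytes_path(path)
```

Passing a str path to open() performs the same encoding implicitly; the sketch just makes that step visible.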

Alexander: you should first patch all UNIX/BSD kernels to use Unicode everywhere, then patch all libc implementations, and then all libraries (written in C). After that, you can take a break.

Victor


