[Python-Dev] transform() and untransform() methods, and the codec registry
Victor Stinner victor.stinner at haypocalc.com
Sun Dec 5 23:25:27 CET 2010
On Saturday 04 December 2010 09:31:04 you wrote:
Alexander Belopolsky writes:

> In fact, once the language moratorium is over, I will argue that
> str.encode() and byte.decode() should deprecate encoding argument and
> just do UTF-8 encoding/decoding. Hopefully by that time most people
> will forget that other encodings exist. (I can dream, right?)
It's just a dream. There's a pile of archival material, often on R/O media, out there that won't be transcoded any more quickly than the inscriptions on Tutankhamun's tomb.
Not only that: many libraries expect bytes arguments in a specific encoding (e.g. the locale encoding). Put differently, only a few libraries written in C accept wchar_t* strings.
The Linux kernel (like many, or all, UNIX/BSD kernels) only manipulates byte strings. The libc only accepts wide characters for a few operations. I don't know how to open a file with a Unicode path using the Linux libc: you have to encode it...
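To make the "you have to encode it" point concrete, here is a minimal sketch of what that looks like from Python 3.2 or later. It assumes a filesystem encoding that can represent the sample name (UTF-8 on most current systems); the file name and the helper names open_via_bytes_path() and demo() are made up for illustration.

```python
import os
import tempfile


def open_via_bytes_path(text_path):
    """Encode a str path to bytes, then open the file through the bytes path."""
    # os.fsencode() applies the filesystem encoding
    # (see sys.getfilesystemencoding()) to turn the str into bytes.
    raw_path = os.fsencode(text_path)
    # os.open() accepts the already-encoded bytes path.
    fd = os.open(raw_path, os.O_RDONLY)
    return raw_path, fd


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical non-ASCII file name, used only for illustration.
        text_path = os.path.join(tmp, "caf\u00e9.txt")
        with open(text_path, "w", encoding="utf-8") as f:
            f.write("hello")
        raw_path, fd = open_via_bytes_path(text_path)
        try:
            data = os.read(fd, 16)
        finally:
            os.close(fd)
        # os.fsdecode() is the inverse of os.fsencode().
        return raw_path, data, os.fsdecode(raw_path) == text_path
```

Calling open(text_path) directly does the same encoding step internally on POSIX systems; the sketch only spells it out so the bytes boundary is visible.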
Alexander: you should first patch all UNIX/BSD kernels to use Unicode everywhere, then patch all libc implementations, and then all libraries written in C. After that, you can take a break.
Victor