[Python-Dev] transform() and untransform() methods, and the codec registry

Nick Coghlan ncoghlan at gmail.com
Mon Dec 6 05:25:30 CET 2010


On Mon, Dec 6, 2010 at 8:25 AM, Victor Stinner <victor.stinner at haypocalc.com> wrote:

Not only that: many libraries expect bytes arguments encoded in a specific encoding (e.g. the locale encoding). Said differently, only a few libraries written in C accept wchar_t* strings.

The Linux kernel (or many, or all, UNIX/BSD kernels) only manipulates byte strings. The libc only accepts wide characters for a few operations. I don't know how to open a file with a Unicode path with the Linux libc: you have to encode it... Alexander: you should first patch all UNIX/BSD kernels to use Unicode everywhere, then patch all libc implementations, and then all libraries (written in C). After that, you can have a break.

Slightly less ambitious is to get them all to agree to standardise on UTF-8 as the encoding mechanism (which is actually in the process of happening, it just has a long way to go).

However, as a glue language, it is part of Python's role to help manage the transition from legacy encodings to UTF-8, so it will be a very long time before the idea of removing support for the encoding argument becomes even remotely feasible.
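As a rough illustration of that glue role, here is a minimal sketch, assuming a current Python 3 and a made-up Latin-1 sample string. The helper name transcode_to_utf8 and the file name legacy.txt are hypothetical; str.encode(), bytes.decode(), the encoding argument of open(), and os.fsencode() are standard library calls.

```python
import os
import tempfile

text = "café"

# Hypothetical legacy data: the same text stored as Latin-1 bytes.
legacy_bytes = text.encode("latin-1")


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


utf8_bytes = transcode_to_utf8(legacy_bytes, "latin-1")

# The encoding argument of open() does the same job at the file boundary.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.txt")
    with open(path, "w", encoding="latin-1") as f:
        f.write(text)
    with open(path, "r", encoding="latin-1") as f:
        round_tripped = f.read()
    # File names cross the same str/bytes boundary: os.fsencode() applies
    # the filesystem encoding before the path reaches the C library.
    encoded_path = os.fsencode(path)
```

Without an explicit encoding argument, the caller would have no way to tell Python which legacy encoding the existing bytes are in.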

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


