[Python-3000] Unicode and OS strings

Stephen J. Turnbull stephen at xemacs.org
Wed Sep 19 07:00:51 CEST 2007


James Y Knight writes:

> iso-2022 or some other abomination. This has upsides (simple, doesn't
> trample on PUA codepoints, only needs one new codec, never throws an
> exception in the above example, and really is correct much of the
> time), and downsides (if the system locale is iso-2022, and all the
> filenames you're dealing with really are also properly encoded in
> iso-2022, it might be nice if they decoded into the sensible unicode
> string, instead of a non-sensical (but still round-trippable) one).

ISO 2022, like Unicode, is an extensible standard. Corporate character sets in Asia extend it, and although they often conflict with each other, they are not easy to tell apart. They're not "proper" in the sense that they abuse the registered final bytes of the national standards they're based on, but it's also not reasonable for those of us who live there to ignore them.
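To make "registered final bytes" concrete, here is a minimal sketch using Python's `iso2022_jp` codec, one ISO 2022 profile. Character sets are switched by escape sequences, and the last (final) byte of each sequence names the registered set being designated: `ESC $ B` selects JIS X 0208-1983 and `ESC ( B` selects ASCII. The `designations` helper is hypothetical and deliberately simplified: it only recognizes three-byte sequences, while full ISO 2022 allows longer ones (for example `ESC $ ( D`).

```python
ESC = 0x1B


def designations(data: bytes) -> list[bytes]:
    """Return the escape sequences found in an ISO 2022 byte string.

    Hypothetical, simplified helper: it assumes every escape sequence is
    exactly three bytes (ESC, intermediate byte, final byte), which covers
    the ESC ( F and ESC $ F forms used by plain ISO-2022-JP.
    """
    found = []
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 2 < len(data):
            found.append(data[i:i + 3])
            i += 3
        else:
            i += 1
    return found


encoded = "日本".encode("iso2022_jp")
print(encoded)
for seq in designations(encoded):
    # The final byte identifies the registered character set:
    # ESC $ B -> JIS X 0208-1983, ESC ( B -> ASCII.
    print(seq, "final byte:", chr(seq[-1]))
```

Every byte of the ISO-2022-JP output is 7-bit; the meaning of each byte pair depends on which set the most recent escape sequence designated. A vendor set that reuses a registered final byte therefore produces bytes that cannot be told apart from the national standard.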

> I think the advantages outweigh the disadvantages, but in the world I
> live in, using anything other than UTF8 or ASCII is grounds for entry
> into an insane asylum. ;)

You're very fortunate. In the world I live in, Shift JIS, which isn't even ISO 2022 compatible, is mandated by a power higher even than the Borg of Redmond: the telephone company.
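As a rough illustration of why Shift JIS falls outside the ISO 2022 framework, this sketch encodes the same text with Python's `iso2022_jp` and `shift_jis` codecs. Shift JIS uses no escape sequences at all. It is an 8-bit encoding whose two-byte characters start with a lead byte of 0x81 or above, and whose second byte may fall in the ASCII range 0x40 to 0x7E. The `describe` function is a hypothetical helper written for this example only.

```python
def describe(text: str, codec: str) -> None:
    """Hypothetical helper: print the encoded bytes and two properties."""
    data = text.encode(codec)
    has_escape = 0x1B in data          # any ISO 2022 escape sequence?
    eight_bit = any(b >= 0x80 for b in data)
    print(f"{codec:>10}: {data!r} escape={has_escape} 8bit={eight_bit}")


for codec in ("iso2022_jp", "shift_jis"):
    describe("日本", codec)

# The katakana "ソ" encodes to 0x83 0x5C in Shift JIS. 0x5C is the ASCII
# backslash, so naive byte-oriented code that treats every 0x5C byte as a
# Windows path separator will split this character in a filename.
sjis = "ソ".encode("shift_jis")
print(sjis, sjis[1:] == b"\\")
```

The trail-byte overlap with ASCII is one reason a Shift JIS filename cannot be handled safely by code that only understands ISO 2022 or plain ASCII byte semantics.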


