[Python-3000] Unicode and OS strings

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Tue Sep 18 01:06:54 CEST 2007


On Sun, 16-09-2007 at 16:13 +0900, Stephen J. Turnbull wrote:

> When a codec encounters something it can't handle, whether it's a valid character in a legacy encoding, a private use character in a UTF, or an invalid sequence of code units, it throws an exception specifying the character or code unit and the current coded character set,

Does this mean that this:

    $ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")

would no longer print e650 in a UTF-8 locale, assuming a shell which understands the escape sequence in printf, and the script would have to make special arrangements to make the character available? U+E650 is a private use character.

If so, I'm violently against this.

> This definitely requires that the Unicode codecs be modified to do the right thing if they encounter private use characters in the input stream or output stream.

The right thing is to encode and decode private use characters according to the regular codec rules, as every other implementation of these encodings in every other language does.
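
To make "regular codec rules" concrete, here is a minimal sketch of the behaviour I would expect, written against Python 3 string semantics. The variable names are only illustrative and nothing here is taken from an actual codec patch: U+E650 is simply treated as an ordinary BMP code point that round-trips through UTF-8.

```python
import unicodedata

# U+E650 lies in the BMP Private Use Area (U+E000..U+F8FF).
ch = "\ue650"

# General category "Co" marks a private use code point.
category = unicodedata.category(ch)

# Regular UTF-8 rules: a three-byte sequence, no exception raised.
encoded = ch.encode("utf-8")

# Decoding gives back the same code point.
decoded = encoded.decode("utf-8")

print(category)              # Co
print(encoded.hex())         # ee9990
print("%x" % ord(decoded))   # e650
```

The last line prints the same value the command-line example is expected to print, so a script receiving the character in sys.argv would need no special arrangements.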

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/


