[Python-3000] Unicode and OS strings
Stephen J. Turnbull stephen at xemacs.org
Tue Sep 18 06:56:37 CEST 2007
- Previous message: [Python-3000] Unicode and OS strings
- Next message: [Python-3000] Unicode and OS strings
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> writes:
> > When a codec encounters something it can't handle, whether it's a valid character in a legacy encoding, a private use character in a UTF, or an invalid sequence of code units, it throws an exception specifying the character or code unit and the current coded character set,

> Does this mean that this:
> $ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
> would no longer print e650 in a UTF-8 locale?
What do you mean "no longer"? Look:
chibi:MacPorts steve$ export LC_ALL=en_US.UTF-8
chibi:MacPorts steve$ python -c 'import sys; print("%s" % sys.argv[1])' $(printf "\ue650")
\ue650
chibi:MacPorts steve$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
Traceback (most recent call last):
  File "<string>", line 1, in ?
TypeError: ord() expected a character, but string of length 6 found
chibi:MacPorts steve$
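The "length 6" in that traceback matches the output of the first command: this printf passed the escape through unexpanded, so the argument was the six ASCII characters \ue650 rather than the single character U+E650. A minimal sketch of the difference, written in Python 3 syntax; the variable names are illustrative and not from the thread:

```python
# The argument as the shell delivered it: backslash, 'u', 'e', '6', '5', '0'.
literal_arg = "\\ue650"
# The argument the question assumed: the single private-use character U+E650.
intended_arg = "\ue650"

print(len(literal_arg))           # six separate characters
print(len(intended_arg))          # one character
print("%x" % ord(intended_arg))   # hex code point of that one character

# ord() accepts only a length-1 string, so the literal form fails.
try:
    ord(literal_arg)
    ord_failed = False
except TypeError as exc:
    ord_failed = True
    print("TypeError:", exc)
```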
Note that some people are currently arguing that sys.argv should be an array of bytes objects, and Guido has not yet said "no". In that case, all of the current proposals should have exactly this result.
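To illustrate the bytes case, here is a rough sketch that assumes the argument arrives as the UTF-8 encoding of U+E650. The name raw_arg is a hypothetical stand-in for what sys.argv[1] would hold under that proposal; it is not an existing API:

```python
# Hypothetical stand-in for a bytes-valued sys.argv[1] in a UTF-8 locale.
raw_arg = "\ue650".encode("utf-8")

print(raw_arg.hex())   # the three code units that encode U+E650
print(len(raw_arg))    # more than one unit, so ord() cannot apply

# ord() on a multi-byte bytes object raises TypeError, as in the session above.
try:
    ord(raw_arg)
    ord_failed = False
except TypeError as exc:
    ord_failed = True
    print("TypeError:", exc)

# Decoding explicitly with the locale's encoding recovers the single character.
decoded = raw_arg.decode("utf-8")
print("%x" % ord(decoded))
```

Under that assumption the caller has to choose the encoding explicitly before asking for a code point.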
My position is that if you do something that depends on the internal representation of implementation-dependent objects, you deserve whatever results you get.