[Python-Dev] unicode hell/mixing str and unicode as dictionary keys
"Martin v. Löwis" martin at v.loewis.de
Mon Aug 7 17:41:39 CEST 2006
David Hopwood wrote:
> I disagree. Unicode strings should always be considered distinct from non-ASCII byte strings. Implicitly encoding or decoding in order to perform a comparison is a bad idea; it is expensive and will often do the wrong thing.
That's a pretty irrelevant position at this point; Python has had the notion of a system encoding since Unicode was introduced, and we are not going to remove that just before a release candidate of Python 2.5.
The question at hand is not whether certain objects should compare unequal, but whether comparing them should raise an exception.
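The two outcomes under discussion can be modelled in Python 3 terms, with `bytes` standing in for Python 2's `str`, `str` for `unicode`, and ASCII assumed as the system encoding. `mixed_equal` and its `strict` flag are hypothetical names for this sketch, not a Python API: with `strict=True` a failed implicit decode raises, with `strict=False` the objects simply compare unequal.

```python
def mixed_equal(b: bytes, u: str, encoding: str = "ascii", strict: bool = True) -> bool:
    """Hypothetical model of a Python 2 style str/unicode comparison.

    The byte string is decoded with ``encoding`` (standing in for the
    system encoding) and the result is compared with ``u``. If decoding
    fails, strict=True propagates the UnicodeDecodeError, while
    strict=False treats the two objects as unequal.
    """
    try:
        return b.decode(encoding) == u
    except UnicodeDecodeError:
        if strict:
            raise
        return False


# ASCII-only bytes decode cleanly, so the comparison is well defined.
print(mixed_equal(b"abc", "abc"))                  # True

# Non-ASCII bytes cannot be decoded as ASCII: compare unequal...
print(mixed_equal(b"\xf6", "\xf6", strict=False))  # False

# ...or raise.
try:
    mixed_equal(b"\xf6", "\xf6")
except UnicodeDecodeError as exc:
    print("raised:", exc.reason)                   # ordinal not in range(128)

# Python 3 itself never decodes implicitly here: bytes never equal str.
print(b"abc" == "abc")                             # False
```

The last line reflects how Python 3 eventually sidestepped the question: `bytes` and `str` are distinct types that always compare unequal.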
> > Which of the two conversions is selected is arbitrary; [...]
>
> It would not be arbitrary. In the common case where the byte encoding uses "precomposed" characters, using "U.encode(systemencoding) == B" will tend to succeed in more cases than "B.decode(systemencoding) == U", because alternative representations of the same abstract character in Unicode will be mapped to the same precomposed character.
No, they won't (although they should, perhaps):
py> u'o\u0308'.encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0308' in position 1: ordinal not in range(256)
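The session above can be reproduced in Python 3. The sketch below spells out Hopwood's two comparison directions with Latin-1 as the assumed system encoding, then adds the explicit step that would make his claim hold: NFC normalization via the standard `unicodedata` module before encoding. The helper names `eq_via_encode`, `eq_via_decode` and `eq_via_nfc_encode` are hypothetical; Python's codecs never normalize implicitly.

```python
import unicodedata

ENCODING = "latin-1"  # assumption: stands in for the system encoding

precomposed = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"  # 'o' + COMBINING DIAERESIS: same abstract character
latin1_bytes = b"\xf6"


def eq_via_encode(u: str, b: bytes) -> bool:
    """First direction: U.encode(enc) == B."""
    return u.encode(ENCODING) == b


def eq_via_decode(u: str, b: bytes) -> bool:
    """Second direction: B.decode(enc) == U."""
    return b.decode(ENCODING) == u


def eq_via_nfc_encode(u: str, b: bytes) -> bool:
    """Hypothetical variant: compose characters (NFC) before encoding."""
    return unicodedata.normalize("NFC", u).encode(ENCODING) == b


# For the precomposed form both directions agree.
print(eq_via_encode(precomposed, latin1_bytes))  # True
print(eq_via_decode(precomposed, latin1_bytes))  # True

# For the decomposed form, decoding just reports inequality...
print(eq_via_decode(decomposed, latin1_bytes))  # False

# ...and encoding fails outright, as in the session above.
try:
    eq_via_encode(decomposed, latin1_bytes)
except UnicodeEncodeError as exc:
    print("encode fails at position", exc.start)  # encode fails at position 1

# Only an explicit normalization step maps it to the precomposed byte.
print(eq_via_nfc_encode(decomposed, latin1_bytes))  # True
```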
In addition, there are encodings (e.g. iso-2022) in which different byte sequences decode to the same Unicode string.
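A small Python 3 illustration of this point, using the `iso2022_jp` codec as an assumed example from the iso-2022 family: a leading `ESC ( B` re-selects ASCII, which is already the initial state, so two different byte strings decode to the same text. If byte strings compared equal to their decoded text, both would equal `'abc'` without equaling each other; equality would no longer be transitive, which makes such objects unreliable as mixed dictionary keys.

```python
plain = b"abc"
with_escape = b"\x1b(Babc"  # ESC ( B designates ASCII, already the initial state

text_plain = plain.decode("iso2022_jp")
text_escape = with_escape.decode("iso2022_jp")

print(plain == with_escape)       # False: the byte sequences differ
print(text_plain == text_escape)  # True: both decode to the same text
print(text_plain)                 # abc
```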
Regards, Martin