[Python-Dev] unicode hell/mixing str and unicode as dictionary keys
"Martin v. Löwis" martin at v.loewis.de
Mon Aug 7 14:46:45 CEST 2006
M.-A. Lemburg wrote:
>> There's no disputing that an exception should be raised if the string must be interpretable as characters in order to continue. But that's not true here if you allow for the interpretation that they're simply objects of different (duck) type and therefore unequal.
>
> Hmm, given that interpretation, 1 == 1.0 would have to be False.
No, but 1 == 1.5 would have to be False (and actually is). In that analogy, int relates to float as ascii-bytes to Unicode: some values are shared between int and float (e.g. 1 and 1.0), other values are not shared (e.g. 1.5 has no equivalent in int). An int equals a float only if both values originate from the shared subset.
Now, int is a (nearly) true subset of float, so there are no ints with no float equivalent (actually, there are, but Python ignores that).
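The int/float analogy can be checked directly. The following is a small sketch, not part of the original message; the variable names are made up for illustration, and 2**53 + 1 is chosen as one example of an int that an IEEE 754 double cannot represent exactly.

```python
# Values from the shared subset compare equal across the two types.
shared = (1 == 1.0)

# 1.5 has no int equivalent, so it cannot equal any int.
unshared = (1 == 1.5)

# An int with no exact float equivalent: converting it to float rounds it.
big = 2**53 + 1
rounded = float(big)

# The round trip through float does not give back the original int.
lossy = (int(rounded) != big)

print(shared, unshared, lossy)
```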
> Note that you do have to interpret the string as characters if you compare it to Unicode and there's nothing wrong with that.
Consider this:

py> int(3+4j)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't convert complex to int; use int(abs(z))
py> 3 == 3+4j
False
So even though the conversion raises an exception, the values are determined to be not equal. Again, because int is a nearly true subset of complex, the conversion goes the other way; but if the comparison used the complex->int conversion, then the TypeError should be taken as a guarantee that the objects don't compare equal.
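As a sketch of "a failed conversion means not equal", assume a hypothetical helper equal_via_int (not an existing Python function) that compares by narrowing to int first and treats the TypeError as inequality:

```python
def equal_via_int(value, n: int) -> bool:
    """Hypothetical helper: compare by converting value to int first.

    A TypeError from the conversion is taken to mean "not equal"
    instead of being propagated to the caller.
    """
    try:
        return int(value) == n
    except TypeError:
        return False


print(equal_via_int(3 + 4j, 3))  # conversion fails, so the answer is "not equal"
print(3 == 3 + 4j)               # the built-in comparison agrees
print(3 == 3 + 0j)               # values from the shared subset still compare equal
```

Note that int() rejects every complex value, including 3+0j, so this narrowing direction would also report 3 and 3+0j as unequal. That is consistent with the built-in comparison going the other way, from int to complex.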
Expanding this view to Unicode should mean that a unicode string U equals a byte string B if U.encode(system_encoding) == B or B.decode(system_encoding) == U, and that they are unequal otherwise (e.g. if the conversion fails with a "not convertible" exception). Which of the two conversions is selected is arbitrary; we should, of course, continue to use the one we always used (for "ascii", there is no difference between the two).
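The proposed rule could be sketched as follows. This is an illustration only, written with Python 3's separate str and bytes types (which never compare equal on their own); the function names and the encoding parameter standing in for the system encoding are assumptions, not an existing API.

```python
def equal_by_decoding(u: str, b: bytes, encoding: str = "ascii") -> bool:
    """Hypothetical: U equals B if B.decode(encoding) == U.

    A failed conversion counts as "not equal" rather than an error.
    """
    try:
        return b.decode(encoding) == u
    except UnicodeDecodeError:
        return False


def equal_by_encoding(u: str, b: bytes, encoding: str = "ascii") -> bool:
    """Hypothetical: the same rule, using U.encode(encoding) == B instead."""
    try:
        return u.encode(encoding) == b
    except UnicodeEncodeError:
        return False


# Shared subset: pure ASCII text equals the same ASCII bytes.
print(equal_by_decoding("abc", b"abc"), equal_by_encoding("abc", b"abc"))

# Outside the shared subset: the conversion fails, so the values are unequal.
print(equal_by_decoding("\xe9", b"\xe9"), equal_by_encoding("\xe9", b"\xe9"))
```

Under this rule a dictionary lookup that mixes the two types would get False from the comparison instead of an exception.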
Regards, Martin