[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

Ron Adam rrr at ronadam.com
Mon Aug 7 23:37:42 CEST 2006


Michael Foord wrote:

David Hopwood wrote: [snip..]

we should, of course, continue to use the one we always used (for "ascii", there is no difference between the two).

+1 This seems the most (only?) logical solution.

No; always considering Unicode and non-ASCII byte strings to be distinct is just as logical.

Yes, that's true. (But it can't be done prior to Py3k, of course.) Consider the comparison of ...

[3] == (3,)   ->  False

These are not the same thing, even though it may be trivial to treat them as equivalent. So how smart should an equivalence comparison be? I think testing for interchangeability, or taking context into account, is going down a very difficult road. That is exactly what the string-to-Unicode comparison does: it assumes the byte string is in the default encoding, which it may not be.
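To make the default-encoding assumption concrete, here is a minimal Python 3 sketch that emulates a Python 2 style byte-string/Unicode comparison by decoding the bytes with an assumed encoding before comparing. The function name `implicit_eq` and its `encoding` parameter are hypothetical, and treating a failed decode as "not equal" is a choice made for the sketch, not a description of CPython's exact behavior.

```python
def implicit_eq(raw: bytes, text: str, encoding: str = "ascii") -> bool:
    """Hypothetical emulation of a Py2-style bytes/text comparison.

    The byte string is decoded with an *assumed* encoding; if that decode
    fails, the sketch simply reports the two values as unequal.
    """
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        return False


# ASCII bytes are unambiguous: the assumed encoding doesn't change the answer.
print(implicit_eq(b"abc", "abc"))                      # True

# The same one-character text, encoded two different ways.
latin1_bytes = "\u00e9".encode("latin-1")   # b'\xe9'
utf8_bytes = "\u00e9".encode("utf-8")       # b'\xc3\xa9'

# Whether the bytes "equal" the text depends entirely on the assumed encoding.
print(implicit_eq(latin1_bytes, "\u00e9", "latin-1"))  # True
print(implicit_eq(latin1_bytes, "\u00e9", "utf-8"))    # False (decode fails)
print(implicit_eq(utf8_bytes, "\u00e9", "utf-8"))      # True
print(implicit_eq(utf8_bytes, "\u00e9", "latin-1"))    # False (decodes to 'Ã©')
```

The same bytes compare equal or unequal to the same text depending only on which encoding the comparison guesses, which is the difficulty with context-dependent equivalence.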

Purity here would insist that comparing floats and integers always return False, but there is little ambiguity about whether two numerical values are equivalent; the rules for comparing them are well established. So numerical equivalence can be the exception when comparing values of differing types, and it's both the expected behavior and established practice in programming.
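The numeric exception is easy to observe in Python itself. Values of different numeric types compare equal when they denote the same number, and because equal numbers must hash equally, they also collapse to a single dictionary key. The sketch below uses only built-in and standard-library types and was written against Python 3, not 2.x.

```python
from decimal import Decimal
from fractions import Fraction

# Different numeric types, same mathematical value: all compare equal to 1.
values = [1, 1.0, Fraction(1, 1), Decimal(1), True]
print(all(v == 1 for v in values))                 # True

# Equal numbers must hash equally, so they behave as one dict key.
print(len({hash(v) for v in values}))              # 1

d = {}
for v in values:
    d[v] = type(v).__name__   # each assignment lands in the same slot
print(d)                                           # {1: 'bool'}

# The first key inserted is kept; only the value is replaced.
print(type(next(iter(d))).__name__)                # int

# Contrast: a list and a tuple holding the same item never compare equal.
print([3] == (3,))                                 # False
```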

Except there has been an implicit promise in Python for years now that ASCII byte strings will compare equal to their Unicode equivalents: lots of code assumes this. Breaking this is fine in principle, but for Py3K, not Py 2.x.

Also true. And I hope that a bytes-to-Unicode comparison in Py3k will always return False, just as [3] == (3,) always returns False.
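For reference, Python 3 as released does behave this way: a `bytes` object never compares equal to a `str`, even when the bytes are pure ASCII, so the two are always distinct dictionary keys. (Running the interpreter with the `-b` option additionally emits a `BytesWarning` for such comparisons.) A quick check:

```python
# Python 3: bytes and str are never equal, even for pure-ASCII content.
print(b"abc" == "abc")                  # False
print(b"abc".decode("ascii") == "abc")  # True: an explicit decode is required

# Consequently both can coexist as separate dictionary keys.
d = {b"abc": "bytes", "abc": "text"}
print(len(d))                           # 2
```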

That means Martin's solution is the best for the current problem. (IMHO of course...)

I think (IMHO) that in this particular case maintaining backwards compatibility should take precedence (until Py3k), and that the documentation should state it as the reason for the continued behavior. So Unicode-to-string comparisons should be the second exception (after numeric types) to the rule of not converting data when comparing two objects, at least before Py3k.
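In Python 2 that backwards-compatible behavior extends to dictionaries: an ASCII `str` key and the equal `unicode` key hash alike and find the same entry. Code moved to Python 3, where bytes and text are distinct keys, would have to normalize keys explicitly to keep it. A minimal sketch, assuming a hypothetical helper named `py2_style_key` that decodes only ASCII-clean byte strings and leaves all other bytes untouched:

```python
def py2_style_key(key):
    """Hypothetical key normalizer emulating the Py2 ASCII promise.

    ASCII-only bytes are decoded to str so they share a dict slot with the
    equal text key; non-ASCII bytes are left alone, because guessing their
    encoding is exactly the ambiguity under discussion.
    """
    if isinstance(key, bytes):
        try:
            return key.decode("ascii")
        except UnicodeDecodeError:
            return key
    return key


d = {}
d[py2_style_key(b"spam")] = 1
d[py2_style_key("spam")] = 2          # same slot as the ASCII bytes key
d[py2_style_key(b"caf\xe9")] = 3      # non-ASCII bytes stay a separate key
d[py2_style_key("caf\u00e9")] = 4
print(d)   # {'spam': 2, b'caf\xe9': 3, 'café': 4}
```

Decoding only ASCII keeps the sketch on the one case where every common encoding agrees, rather than picking a default encoding for arbitrary bytes.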

Are there other cases where different types of objects compare equal? (Not including those where the user writes or overrides a method to get that functionality of course.)
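As a few data points for that question, checked against Python 3 rather than 2.x: besides the numeric types, the built-in `set` and `frozenset` compare equal when they hold the same elements, `bytes` compares equal to a `bytearray` with the same contents, and a dictionary's keys view compares equal to a set with the same members.

```python
# Built-in types (beyond numbers) that compare equal across types, Python 3.
s = {1, 2, 3}
fs = frozenset(s)
print(type(s) is type(fs), s == fs)       # False True

# bytes and bytearray compare by content, not by type.
print(b"abc" == bytearray(b"abc"))        # True

# A dict's keys view compares equal to a set with the same members.
d = {1: "a", 2: "b", 3: "c"}
print(type(d.keys()).__name__)            # dict_keys
print(d.keys() == s)                      # True
```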

Cheers, Ron


