[Python-Dev] Normalizing unicode? (was: Re: test_unicode_file failing on Mac OS X)
Guido van Rossum guido at python.org
Wed Dec 10 12:39:16 EST 2003
- Previous message: [Python-Dev] Normalizing unicode? (was: Re: test_unicode_file failing on Mac OS X)
- Next message: [Python-Dev] Normalizing unicode? (was: Re: test_unicode_file failing on Mac OS X)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> Before we start considering how it's possible to make unicode.equal act encoding-insensitively[1], I think we need to consider whether that's really the behavior we want. In some ways, this seems like case-insensitive equality to me: it's certainly a useful operation, but I don't think it should be the object's builtin notion of equality:
> - I think people will be confused if s1==s2 but s1[0]!=s2[0].
> - Sometimes you might want to distinguish different encodings of the "same" string; a "normalized" equality test makes that very difficult.
Right. Couldn't have said it better myself.
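To make the first objection concrete, here is a minimal sketch in today's Python 3 (where `unicode` became `str`), assuming NFC as the normalization form. The precomposed and decomposed spellings of "é" compare unequal with the builtin `==`, a normalizing comparison treats them as equal, and yet their first characters still differ. `nfc_equal` is a hypothetical helper, not a standard-library function.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (two code points)

print(composed == decomposed)           # False: builtin equality compares code points
print(nfc_equal(composed, decomposed))  # True: both normalize to U+00E9
print(composed[0] == decomposed[0])     # False: U+00E9 versus plain 'e'
print(len(composed), len(decomposed))   # 1 2
```

Keeping the normalizing comparison as a separate function leaves the builtin `==` free to distinguish the two spellings.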
> And if you do want unicode objects to act normalized, then I think that the right way to do it is to normalize them at creation time. Then all the right hash/eq/cmp stuff just falls out.
Exactly.
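A sketch of "normalize at creation time" using plain strings, assuming NFC and a hypothetical `canonical` helper applied wherever text enters the program. Because the stored values are already normalized, the builtin hashing and equality used by `set` collapse the two spellings without any change to `str` itself.

```python
import unicodedata


def canonical(s: str) -> str:
    # Hypothetical helper: normalize once, where the string enters the program.
    return unicodedata.normalize("NFC", s)


raw = ["caf\u00e9", "cafe\u0301"]  # same text, two different code-point sequences

print(len(set(raw)))                        # 2: raw strings hash and compare differently
print(len({canonical(s) for s in raw}))     # 1: normalized values share hash and equality
```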
> But since some people may want to distinguish different encodings of the same string, I think that the most sensible alternative is to add a new subclass of unicode -- something like "normalizedunicode." It would normalize itself at construction time; and when combined with other unicode strings (e.g. by +), the result would be normalized (so unicode+normalizedunicode -> normalizedunicode). It's possible that the normalized unicode class would be more useful to people (and therefore more widely used?), but the non-normalized version would still be available for people who want it.
Works for me. I recommend that someone try this approach as a user subclass first -- this should be easy enough, right?
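A minimal sketch of the suggested user subclass, written against Python 3's `str`. The name `NormalizedStr` and the choice of NFC are assumptions for illustration. It normalizes in `__new__`, so the inherited `__eq__` and `__hash__` already compare normalized code points, and it defines `__add__`/`__radd__` so that concatenation with a plain string in either order yields a `NormalizedStr`.

```python
import unicodedata


class NormalizedStr(str):
    """Hypothetical str subclass that always stores its value in one normal form."""

    FORM = "NFC"  # assumption: NFC chosen for illustration

    def __new__(cls, value=""):
        # Normalize once at construction; str's own __eq__/__hash__ then
        # operate on the normalized code points.
        return super().__new__(cls, unicodedata.normalize(cls.FORM, str(value)))

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Re-normalize: joining two normalized strings can yield an
        # unnormalized one (e.g. "e" + COMBINING ACUTE ACCENT).
        return type(self)(str(self) + str(other))

    def __radd__(self, other):
        # Called for plain_str + NormalizedStr, so the result stays normalized.
        if not isinstance(other, str):
            return NotImplemented
        return type(self)(str(other) + str(self))

    def __repr__(self):
        return f"{type(self).__name__}({str.__repr__(self)})"


a = NormalizedStr("cafe\u0301")      # decomposed input
print(len(a))                        # 4: stored as NFC "caf\u00e9"
print(a == "caf\u00e9")              # True
print(hash(a) == hash("caf\u00e9"))  # True

b = NormalizedStr("e") + "\u0301"
print(type(b).__name__, b == "\u00e9")  # NormalizedStr True
c = "e" + NormalizedStr("\u0301")
print(type(c).__name__, c == "\u00e9")  # NormalizedStr True
```

The result of `+` is re-normalized rather than trusted, because the concatenation of two normalized strings need not itself be normalized. The sketch also shows the limits of the approach: a plain `str` operand is compared as-is, so `NormalizedStr("\u00e9") == "e\u0301"` is still `False`, and inherited methods such as slicing or `upper()` return plain `str`.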
> (or we could just leave things as they are now, and force people to do any normalization themselves. :) )
Do we even have normalization code in core Python?
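For reference, current Python 3 ships normalization in the standard-library `unicodedata` module: `unicodedata.normalize(form, unistr)` accepts the forms "NFC", "NFD", "NFKC" and "NFKD", and `unicodedata.is_normalized` (Python 3.8+) reports whether a string is already in a given form. The sample string below is an arbitrary choice that separates canonical from compatibility normalization.

```python
import unicodedata

# 'fi' ligature (U+FB01), then "ance", then COMBINING ACUTE ACCENT on the final 'e'.
s = "\ufb01ance\u0301"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    out = unicodedata.normalize(form, s)
    # NFC/NFD keep the ligature (it has only a compatibility decomposition);
    # NFKC/NFKD replace it with plain "fi". The C forms compose 'e' + U+0301.
    print(form, len(out), ascii(out))

# Python 3.8+: check whether a string is already in a given form.
print(unicodedata.is_normalized("NFD", s))  # True
print(unicodedata.is_normalized("NFC", s))  # False: 'e' + U+0301 composes to U+00E9
```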
--Guido van Rossum (home page: http://www.python.org/~guido/)