[Python-Dev] Normalizing unicode?

Michael Hudson mwh at python.net
Thu Dec 11 05:34:37 EST 2003


Edward Loper <edloper at gradient.cis.upenn.edu> writes:

> Scott David Daniels wrote:
>
>> I naïvely wrote:
>> >Could we perhaps use a comparison that, in effect, did:
>> >    def uniequal(first, second):
>> >        if first == second:
>> >            return True
>> >        return first.normalize() == second.normalize()
>> >That is, take advantage of the fact that normalization is often
>> >unnecessary for "trivial" reasons.
>
> [...]
>
> Before we start considering how it's possible to make unicode.equal
> act encoding-insensitively[1], I think we need to consider whether
> that's really the behavior we want. In some ways, this seems like
> case-insensitive equality to me: it's certainly a useful operation,
> but I don't think it should be the object's builtin notion of
> equality.
>
> - I think people will be confused if s1==s2 but s1[0]!=s2[0].
> - Sometimes you might want to distinguish different encodings of the
>   "same" string; a "normalized" equality test makes that very
>   difficult.

In general it seems to me that == should, given a choice, err on the side of being an overly tight equivalence relation -- i.e. return True less often.
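
To make the distinction concrete, here is a minimal sketch of the two notions of equality discussed above. It assumes Python 3 and the standard library's unicodedata.normalize; the helper name uniequal is borrowed from the pseudocode quoted earlier, and the choice of NFC as the normalization form is an assumption made for illustration, not something proposed in this thread.

```python
import unicodedata


def uniequal(first, second, form="NFC"):
    """Explicit, normalization-insensitive comparison (not the builtin ==).

    The `form` default of NFC is an illustrative assumption.
    """
    # Cheap path: strings that are already code-point-identical.
    if first == second:
        return True
    # Slow path: compare the normalized forms.
    return (unicodedata.normalize(form, first)
            == unicodedata.normalize(form, second))


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)          # False: == compares code points
print(uniequal(composed, decomposed))  # True: equal after normalization
print(composed[0] == decomposed[0])    # False: first elements differ
print(len(composed), len(decomposed))  # 1 2
```

Keeping the looser test in a separately named function leaves == as the tight relation, so two strings that compare equal with == also agree element by element and in length.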

Cheers, mwh

-- 
  81. In computing, turning the obvious into the useful is a living
      definition of the word "frustration".
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


