[I18n-sig] Re: [Python-Dev] Unicode debate

Paul Prescod paul@prescod.net
Tue, 02 May 2000 11:25:33 -0500


Guido van Rossum wrote:

> Aha, then we'll see u == v even though type(u) is type(v) and len(u) != len(v). /F's world will collapse. :-)

There are many levels of equality that are interesting. I don't think we would move to grapheme equivalence until "the rest of the world" (XML, Java, W3C, SQL) did.
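To make the "levels of equality" concrete, here is a minimal sketch in present-day Python 3. It assumes the standard `unicodedata` module and uses canonical equivalence (NFC) as a stand-in for the grapheme equivalence discussed above; the helper names are hypothetical and not part of any proposal in this thread.

```python
import unicodedata

# Two spellings of the same visible character "é".
u = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
v = "\u00e9"    # precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE


def codepoint_equal(a, b):
    """Strictest level: the two strings hold identical code points."""
    return a == b


def canonical_equal(a, b):
    """Looser level: equal after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def compatibility_equal(a, b):
    """Looser still: equal after compatibility composition (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


print(len(u), len(v))                      # lengths differ
print(codepoint_equal(u, v))               # plain == compares code points
print(canonical_equal(u, v))               # NFC treats both spellings alike
print(compatibility_equal("\ufb01", "fi"))  # ligature U+FB01 versus "fi"
```

If `==` itself were defined as `canonical_equal`, this is exactly the situation Guido describes: `u == v` would hold even though `len(u) != len(v)`.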

If we were going to move to grapheme equivalence (some day), the right way would be to normalize characters in the construction of the Unicode string. This is known as "Early normalization":

http://www.w3.org/TR/charmod/#NormalizationApplication
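As a rough illustration of early normalization, the sketch below normalizes once, when the string object is built, so that ordinary code point comparison afterwards behaves like canonical equivalence. It assumes present-day Python 3 and the standard `unicodedata` module; the class name `NormalizedStr` and the choice of NFC are assumptions made for the example, not something specified in this message.

```python
import unicodedata


class NormalizedStr(str):
    """Hypothetical string type that applies NFC at construction time."""

    def __new__(cls, value=""):
        # Normalize before the immutable str object is created, so every
        # instance is guaranteed to be in NFC from the start.
        return super().__new__(cls, unicodedata.normalize("NFC", value))


decomposed = NormalizedStr("e\u0301")   # 'e' + COMBINING ACUTE ACCENT
precomposed = NormalizedStr("\u00e9")   # precomposed "é"

# After early normalization, plain == and len() agree with each other.
print(decomposed == precomposed)
print(len(decomposed), len(precomposed))
```

With this approach the comparison operator stays a simple code point comparison, and strings that compare equal also have equal length, because the normalization cost is paid once on the way in.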

--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
    - http://www.cs.yale.edu/~perlis-alan/quotes.html