[I18n-sig] Re: [Python-Dev] Unicode debate

Just van Rossum just@letterror.com
Tue, 2 May 2000 14:44:30 +0100


At 8:30 AM -0400 02-05-2000, Guido van Rossum wrote:

I think /F's point was that the Unicode standard prescribes different behavior here: for UTF-8, a missing or lone continuation byte is an error; for Unicode, accents are separate characters that may be inserted and deleted in a string but whose display is undefined under certain conditions.

(I just noticed that this doesn't work in Tkinter but it does work in wish. Strange.)
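To make the distinction in the quoted paragraph concrete, here is a minimal sketch in present-day Python 3 (an assumption; the thread predates it). It only illustrates the two cases named above: malformed UTF-8 is rejected at decode time, while a combining accent is an ordinary code point that can be inserted into or deleted from a string. The variable names are purely illustrative.

```python
# A lone continuation byte (0x80-0xBF with no lead byte) is malformed UTF-8.
lone_continuation = b"\x80"
try:
    lone_continuation.decode("utf-8")
    lone_failed = False
except UnicodeDecodeError:
    lone_failed = True

# A lead byte whose continuation byte is missing is malformed as well.
truncated = "\u00e9".encode("utf-8")[:1]  # keep only the lead byte
try:
    truncated.decode("utf-8")
    truncated_failed = False
except UnicodeDecodeError:
    truncated_failed = True

# By contrast, a combining accent is just another code point in a string:
# it can be removed, or stand alone, without any error being raised.
base_plus_accent = "e\u0301"          # 'e' followed by COMBINING ACUTE ACCENT
accent_removed = base_plus_accent.replace("\u0301", "")
lone_accent = "\u0301"                # legal string; how it displays is another matter
```

Whether a lone accent renders sensibly is left to whatever displays the string, which is the point about undefined display made above.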

FYI: Normalization is needed to make comparing Unicode strings robust, e.g. u"é" should compare equal to u"e\u0301".

Aha, then we'll see u == v even though type(u) is type(v) and len(u) != len(v). /F's world will collapse. :-)

Does the Unicode spec really specify that u should compare equal to v? This behavior would be the responsibility of a layout engine, a role which is way beyond the scope of Unicode support in Python, as it is language- and script-dependent.
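For reference, a small sketch of the comparison under discussion, assuming present-day Python 3 and its standard `unicodedata` module (which did not exist when this thread was written). The Unicode standard treats the precomposed and decomposed spellings as canonically equivalent; in this sketch plain `==` compares code points and does not apply that equivalence, so the strings only compare equal after explicit normalization. The names `u` and `v` follow the quoted text; everything else is illustrative.

```python
import unicodedata

u = "\u00e9"    # precomposed: LATIN SMALL LETTER E WITH ACUTE
v = "e\u0301"   # decomposed: 'e' + COMBINING ACUTE ACCENT

# Plain comparison works code point by code point, so these differ.
plain_equal = (u == v)

# Normalizing both sides to the same form makes the comparison succeed.
nfc_equal = unicodedata.normalize("NFC", u) == unicodedata.normalize("NFC", v)
nfd_equal = unicodedata.normalize("NFD", u) == unicodedata.normalize("NFD", v)

# The lengths differ before normalization, as the quoted remark anticipates.
lengths = (len(u), len(v))
```

Normalization here is a string-level operation on code points and needs no layout engine; whether `==` itself should normalize is the design question the thread is debating.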

Just