[Python-Dev] Re: [I18n-sig] Re: Unicode debate

Just van Rossum just@letterror.com
Tue, 2 May 2000 16:42:24 +0100


[Just]

You're going to have a hard time explaining that "\377" != u"\377".

[GvR]

I agree. You are an example of how hard it is to explain: you still don't understand that for a person using CJK encodings this is in fact the truth.

That depends on the definition of truth: if you document that 8-bit strings are Latin-1, the above is the truth. Conceptually classifying all other 8-bit encodings as binary goop makes the semantics crystal clear.
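
To make the two readings concrete, here is a minimal sketch in present-day Python 3 terms. That is an assumption on my part: the bytes literal b"\377" stands in for the 8-bit string "\377" discussed here, and cp1251 is picked purely as an illustrative non-Latin-1 encoding, not one named in this thread.

```python
# Sketch only: b"\377" plays the role of the 8-bit string "\377".
raw = b"\377"  # one byte with value 0xFF

# Reading 1: 8-bit strings are documented to be Latin-1.
# The byte 0xFF then denotes the character U+00FF.
as_latin1 = raw.decode("latin-1")
print(as_latin1 == "\377")

# Reading 2: the same byte under a different 8-bit encoding
# (cp1251, chosen only for illustration) is another character.
as_cp1251 = raw.decode("cp1251")
print(as_cp1251 == "\377")
```

Under the Latin-1 rule the comparison holds by definition; under any other encoding the result depends on which character that encoding assigns to the byte.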

[Just]

Again, if you define that "all strings are unicode" and that 8-bit strings contain Unicode characters up to 255, you're all set. Clear semantics, few surprises, simple implementation, etc. etc.

[GvR]

But not all 8-bit strings occurring in programs are Unicode. Ask Moshe.

I know. They can be anything, even binary goop. But that's only an artifact of the fact that 8-bit strings need to double as buffer objects.
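
A rough sketch of why the "characters up to 255" rule stays well defined even for binary goop, again in Python 3 terms (an assumption; the struct-packed value is a made-up example of buffer data): Latin-1 maps byte value n to code point n, so any byte sequence converts and round-trips without loss, even though the resulting characters carry no textual meaning.

```python
import struct

# Latin-1 maps every byte value n (0..255) to the code point n.
identity = all(bytes([n]).decode("latin-1") == chr(n) for n in range(256))

# Hypothetical "binary goop": a packed unsigned int and float, not text.
goop = struct.pack("<If", 0xDEADBEEF, 1.5)

# Converting it under the Latin-1 rule never fails and loses nothing,
# but the characters it yields are meaningless as text.
text = goop.decode("latin-1")
round_trip = text.encode("latin-1")

print(identity, round_trip == goop)
```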

Just