[Python-Dev] Unicode debate

Paul Prescod paul@prescod.net
Tue, 02 May 2000 11:05:20 -0500


Neil, I sincerely appreciate your informed input. I want to emphasize one ideological difference though. :)

Neil Hodgson wrote:

... The two options being that literal is either assumed to be encoded in Latin-1 or UTF-8.

I reject that characterization.

I claim that both strings contain Unicode characters but one can contain Unicode characters with higher digits. UTF-8 versus Latin-1 does not enter into it. Python strings should not be documented in terms of encodings any more than Python ints are documented in terms of their two's complement representation. Then we could describe the default conversion from integers to floats in terms of their bit-representation. Ugh!

I accept that the effect is similar to calling Latin-1 the "default", but that's a side effect of the simple logical model that we are proposing.
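
To make the "side effect" concrete, here is a minimal sketch of the logical model, written with present-day Python 3 `bytes` and `str` as stand-ins for the two string types under discussion (an assumption; the types of the time looked different). The helper name `widen` is hypothetical and is not part of any proposal in this thread. It never mentions an encoding: it only maps each 8-bit value to the Unicode character with the same ordinal.

```python
def widen(narrow: bytes) -> str:
    """Hypothetical helper: turn each 8-bit value into the Unicode
    character that has the same ordinal. No encoding is named."""
    return "".join(chr(value) for value in narrow)


# Three sample values, one of them above the ASCII range.
raw = bytes([0x48, 0xE9, 0xFF])
widened = widen(raw)

# The ordinals are unchanged by widening.
assert [ord(ch) for ch in widened] == list(raw)

# The result happens to coincide with a Latin-1 decode, because
# Latin-1 assigns byte N to the character with ordinal N.
assert widened == raw.decode("latin-1")
```

The agreement with Latin-1 in the last assertion falls out of the ordinal-preserving rule; it is not something the rule had to be told about.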

--
 Paul Prescod - ISOGEN Consulting Engineer speaking for himself
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
	- http://www.cs.yale.edu/~perlis-alan/quotes.html