
[Python-Dev] bytes / unicode

Terry Reedy tjreedy at udel.edu
Tue Jun 22 22:19:45 CEST 2010


On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote:

The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in Asia would present characters differently if they were originally encoded in a "Japanese" encoding versus a "Chinese" encoding, even though they were really "the same" characters.

As I tried to say in another post, that to me is similar to wanting to present English text in different fonts depending on whether it was spoken by an American or a Brit, or by a modern person versus a Renaissance person.

I do know that Han Unification is a giant political mess (<http://en.wikipedia.org/wiki/Han_unification> makes for some

Thanks, I will take a look.

interesting reading), but my understanding is that it has handled enough of the cases by now that one can write software to display Asian languages and it will basically work with a modern version of Unicode. (And of course, there's always the private use area, as Stephen Turnbull pointed out.)

Regardless, this is another example where keeping around a string isn't really enough. If you need to display a Japanese character in a distinct way because you are operating in the Japanese script, you need a tag surrounding your data that is a hint to its presentation. The fact that these presentation hints were sometimes determined by their encoding is an unfortunate historical accident.

Yes. The Asian languages I know anything about seem to natively have almost none of the symbols English has, many borrowed from math, that have been pressed into service for text markup.
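
To make the tagging point quoted above concrete, here is a minimal sketch in Python 3. It assumes the unified character U+76F4 as the example, and the TaggedText class and its lang field are hypothetical names invented for illustration, not anything from the stdlib or from this thread. It shows that once bytes are decoded the source encoding is gone, so a presentation hint has to travel next to the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical wrapper: a string plus a language hint for rendering."""
    text: str
    lang: str  # assumed to hold a language tag such as "ja" or "zh-Hans"


# U+76F4 is a unified Han character present in both legacy character sets.
CHAR = "\u76f4"

# The same character has different byte sequences in the two legacy encodings.
jp_bytes = CHAR.encode("shift_jis")
cn_bytes = CHAR.encode("gb2312")

# After decoding, both become the identical str; the encoding is not retained.
jp_text = jp_bytes.decode("shift_jis")
cn_text = cn_bytes.decode("gb2312")

# The hint must therefore be attached explicitly, alongside the data.
jp = TaggedText(jp_text, "ja")
cn = TaggedText(cn_text, "zh-Hans")
```

Here jp.text == cn.text holds while jp != cn, so a renderer could pick a glyph style from the lang field rather than from the long-gone encoding.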

-- Terry Jan Reedy


