[Python-Dev] UCS2/UCS4 default

Jeroen Ruigrok van der Werven asmodai at in-nomine.org
Thu Jul 3 16:46:48 CEST 2008

-On [20080703 15:58], Guido van Rossum (guido at python.org) wrote:

> You seem to be suggesting that len(u"\U00012345") should return 1 on a system that internally uses UTF-16 and hence represents this string as a surrogate pair.

From a Unicode and UTF-16 point of view that makes the most sense. So yes, I am suggesting that.
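To make the two possible answers concrete, here is a minimal sketch. It assumes Python 3.3 or later, where len() counts code points on every build, and the helper name utf16_code_units is made up for illustration. It contrasts the code point count with the number of UTF-16 code units, which is what a narrow (UTF-16) build of the time reported:

```python
def utf16_code_units(s):
    """Count the 16-bit code units needed to store s as UTF-16.

    Hypothetical helper, not part of any library.
    """
    # "utf-16-le" emits no byte order mark; every code unit is 2 bytes.
    return len(s.encode("utf-16-le")) // 2


s = "\U00012345"                  # one supplementary code point
code_points = len(s)              # counts code points on Python 3.3+
code_units = utf16_code_units(s)  # counts UTF-16 code units

print(code_points, code_units)
```

The gap between the two numbers only appears for code points above U+FFFF; for text confined to the Basic Multilingual Plane both counts agree.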

> This is not going to happen. You may as well complain to the authors of the Java standard about the corresponding problem there.

Why would I need to complain to them? They already fixed it in 1.5.0.

Java 1.5.0's release notes (http://java.sun.com/developer/technicalArticles/releases/j2se15/):

Supplementary Character Support

32-bit supplementary character support has been carefully added to the platform as part of the transition to Unicode 4.0 support. Supplementary characters are encoded as a special pair of UTF16 values to generate a different character, or codepoint. A surrogate pair is a combination of a high UTF16 value and a following low UTF16 value. The high and low values are from a special range of UTF16 values.

In general, when using a String or sequence of characters, the core API libraries will transparently handle the new supplementary characters for you.

The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
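The surrogate pair arithmetic described in the release notes can be sketched in Python as follows. This is only an illustration of the standard UTF-16 mapping, not Java's implementation; the function names to_surrogate_pair and from_surrogate_pair are hypothetical, and str.isalpha() is used here as a rough Python counterpart to Character.isLetter(int), assuming Python 3.3 or later.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000              # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x12345)
print(hex(high), hex(low))

# Code-point-based classification sees one letter, not two surrogates.
is_letter = chr(0x2F81A).isalpha()
print(is_letter)
```

An API that takes the full code point as an integer never has to see the two surrogate halves, which is the point the release notes make about the int-accepting methods.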

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーンラウフロックヴァンデルウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Life can only be understood backwards, but it must be lived forwards...
