[Python-Dev] UCS2/UCS4 default

Jeroen Ruigrok van der Werven asmodai at in-nomine.org
Thu Jul 3 19:35:45 CEST 2008

-On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote:

>On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> Please remember that lone surrogate pair code points are perfectly valid Unicode code points, nevertheless. Just as a lone combining code point is valid on its own.
>
>That is a big part of these problems. For all practical purposes, a surrogate is like a UTF-8 code unit, and must be handled the same way, so why the heck do they confuse everybody by saying "oh, it's a code point too!"?

Because surrogate code points are not Unicode scalar values, isolated UTF-16 code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode 5.0/5.1, section 3.9)

So, no, it is not a code point too.
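
To make the rule quoted from section 3.9 concrete, here is a minimal sketch, assuming a Python 3 interpreter, where str holds code points and the built-in codecs are strict by default. The helper names (is_surrogate, is_scalar_value, encodes_as_utf16) are made up for this illustration and are not part of any library.

```python
# Illustration only: helper names are hypothetical, not a standard API.
# Assumes Python 3, whose "utf-16-le" codec rejects lone surrogates.

def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value (in range and not a surrogate)."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def encodes_as_utf16(text: str) -> bool:
    """True if the strict UTF-16 codec accepts text."""
    try:
        text.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


lone = "\ud800"          # an isolated high surrogate
paired = "\U00010000"    # a supplementary character, stored as D800 DC00

print(is_scalar_value(ord(lone)))     # False
print(encodes_as_utf16(lone))         # False
print(encodes_as_utf16(paired))       # True
print(paired.encode("utf-16-le"))     # b'\x00\xd8\x00\xdc'
```

The same two code units that are rejected in isolation are fine once they appear as a pair, which is the distinction the quoted definition draws.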

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーンラウフロックヴァンデルウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
If one keeps believing, the heaviest stone cannot sink...
