[Python-Dev] UCS2/UCS4 default
M.-A. Lemburg mal at egenix.com
Thu Jul 3 21:16:03 CEST 2008
On 2008-07-03 19:35, Jeroen Ruigrok van der Werven wrote:
> -On [20080703 19:21], Adam Olsen (rhamph at gmail.com) wrote:
>> On Thu, Jul 3, 2008 at 7:57 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> Please remember that lone surrogate pair code points are perfectly valid Unicode code points, nevertheless. Just as a lone combining code point is valid on its own.
>> That is a big part of these problems. For all practical purposes, a surrogate is like a UTF-8 code unit, and must be handled the same way, so why the heck do they confuse everybody by saying "oh, it's a code point too!"?
> Because surrogate code points are not Unicode scalar values, isolated UTF-16 code units in the range 0xd800-0xdfff are ill-formed. (D91 from Unicode 5.0/5.1, section 3.9)
True. They are not valid UTF-16 code units, but a code unit is just a storage byte representation of a Unicode transformation...
""" Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.) """
That's not the same thing as a code point which is an assignment of a slot in the Unicode character set...
""" Code Point. Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4, Characters and Encoding.) """
Reference: http://www.unicode.org/glossary/
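To make the code unit / code point distinction concrete, here is a minimal sketch. It assumes a current Python 3 interpreter (3.3 or later, where len() of a str counts code points), not the narrow/wide builds under discussion in this thread; the helper name code_unit_counts is made up for illustration.

```python
def code_unit_counts(text):
    """Count code points and code units of text in each encoding form.

    Hypothetical helper for illustration; assumes Python 3.3+,
    where len(str) counts code points.
    """
    return {
        "code_points": len(text),
        # UTF-8 uses 8-bit code units, so one byte is one code unit.
        "utf8_units": len(text.encode("utf-8")),
        # UTF-16 uses 16-bit code units; the "-le" codec omits the BOM.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-32 uses 32-bit code units.
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP: it is a single
# code point, but needs a surrogate pair (two code units) in UTF-16.
clef_counts = code_unit_counts("\U0001D11E")
```

The same code point takes four code units in UTF-8, two in UTF-16 and one in UTF-32, which is why the number of code units says nothing about the number of code points.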
Also see Chapter 3.4 (http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf#G2212):
""" Surrogate code points and noncharacters are considered assigned code points, but not assigned characters. """
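As a rough illustration of both positions in this thread, the sketch below assumes a current Python 3 interpreter (behaviour differs from the 2008-era Python 2 builds). The helpers is_surrogate, is_scalar_value and encodes_as_utf16 are hypothetical names, not part of any library. It shows that a lone surrogate is a code point with a general category of its own, yet is not a Unicode scalar value and is refused by the UTF-16 codec.

```python
import unicodedata


def is_surrogate(cp):
    """True if the integer cp is in the surrogate range D800..DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp):
    """True if cp is a Unicode scalar value: any code point except surrogates."""
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def encodes_as_utf16(text):
    """True if text can be encoded as well-formed UTF-16 (illustrative helper)."""
    try:
        text.encode("utf-16-le")
    except UnicodeEncodeError:
        return False
    return True


# A str can hold a lone surrogate, because it is a valid code point.
lone = chr(0xD800)

# The character database knows it as a surrogate (general category Cs).
lone_category = unicodedata.category(lone)

# It is not a scalar value, so the strict UTF-16 codec refuses it.
lone_is_scalar = is_scalar_value(ord(lone))
lone_encodes = encodes_as_utf16(lone)
```

Here lone_category is "Cs", while lone_is_scalar and lone_encodes are both False: the code point exists in the codespace, but no well-formed UTF-16 sequence represents it on its own.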
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Source (#1, Jul 03 2008)
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611