[Python-Dev] UCS2/UCS4 default
Guido van Rossum guido at python.org
Wed Jul 2 20:27:42 CEST 2008
On Wed, Jul 2, 2008 at 11:22 AM, Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> wrote:
> -On [20080702 19:42], Guido van Rossum (guido at python.org) wrote:
>> Yes. At least in the sense that \Uxxxxxxxx gets translated to a
>> surrogate pair, and that the UTF-8 codec supports surrogate pairs in
>> both directions. It's been like this for a long time. What else would
>> you expect from UTF-16 support?
>
> Well, unless I misunderstand things, a Python 3 compiled with the default
> Unicode option gives this:
>
> >>> len("\N{MUSICAL SYMBOL G CLEF}")
> 2
>
> Whereas a Python 3 with --with-wide-unicode gives:
>
> >>> len("\N{MUSICAL SYMBOL G CLEF}")
> 1
>
> This, of course, causes problems with splitting, finding, and so on.
Understood.
> So that means that a Python 3 with only 2 byte Unicode support is not to be
> used/recommended for Unicode outside of the BMP.
I disagree. Instead, I would say that such code needs to be aware of surrogates.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
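
To make "aware of surrogates" concrete, here is a minimal sketch, not taken from the thread. It assumes a current Python in which str is indexed by code point, so the narrow-build view is simulated by encoding to UTF-16 and working on the 16-bit code units. The helper names (utf16_code_units, code_points) are hypothetical, and the input is assumed to contain no lone surrogates.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    Assumption: text contains no lone surrogates, otherwise the
    encode step raises UnicodeEncodeError.
    """
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def code_points(units):
    """Yield code points from UTF-16 code units, joining surrogate pairs.

    Unpaired surrogates are passed through unchanged.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Combine the pair into one code point above the BMP.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


clef = "\N{MUSICAL SYMBOL G CLEF}"          # U+1D11E, outside the BMP
units = utf16_code_units(clef)              # what a 2-byte build stores
narrow_len = len(units)                     # length counted in code units
wide_len = len(list(code_points(units)))    # length counted in code points
```

Here narrow_len corresponds to the length counted in 16-bit units and wide_len to the length counted in whole characters. Splitting or searching on code units stays safe as long as the code never cuts between a high surrogate and the low surrogate that follows it.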