[Python-Dev] UCS2/UCS4 default

Guido van Rossum guido at python.org
Wed Jul 2 19:42:13 CEST 2008


On Wed, Jul 2, 2008 at 10:19 AM, Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> wrote:

-On [20080702 19:08], Guido van Rossum (guido at python.org) wrote:

I think we should continue to leave this up to the distribution. AFAIK many Linux distros already use UCS4 for everything anyway. FreeBSD's ports makes it a configure option.

For that reason I think it's also better that the configure script continues to default to UTF-16 -- this will give the UTF-16 support code the necessary exercise. (It is mostly a superset of the UCS-4 support code, so I'm less worried about the latter getting enough exercise.)

I was under the impression that it was still UCS2 and thus limiting things to the BMP only. So you are saying it's UTF-16 nowadays? For both 2.6 and 3.0?

Yes. At least in the sense that \Uxxxxxxxx gets translated to a surrogate pair, and that the UTF-8 codec supports surrogate pairs in both directions. It's been like this for a long time. What else would you expect from UTF-16 support?
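For illustration, the translation of a \Uxxxxxxxx escape to a surrogate pair can be sketched in a few lines. This is only a sketch of the standard UTF-16 arithmetic, not the interpreter's actual code; the helper names to_surrogate_pair and from_surrogate_pair are hypothetical, and it is written for current Python 3, where the narrow/wide build distinction no longer exists.

```python
def to_surrogate_pair(code_point):
    """Split a code point above the BMP into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
pair = to_surrogate_pair(0x1D11E)
round_trip = from_surrogate_pair(*pair)

# The UTF-16 codec produces the same two 16-bit code units.
encoded = "\U0001D11E".encode("utf-16-be")
```

Here pair is (0xD834, 0xDD1E), which matches the two big-endian code units in encoded, and from_surrogate_pair reverses the split.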

-- --Guido van Rossum (home page: http://www.python.org/~guido/)
