Message 81061 - Python tracker

lemburg> This is not possible for unichr() in Python 2.x, since applications
lemburg> always expect len(unichr(x)) == 1

Oh, ok.

lemburg> Changing ord() would be possible in Python 2.x is easier, since
lemburg> this would only extend the range of returned values for UCS2
lemburg> builds.

ord() of Python3 (narrow build) rejects surrogate characters:

>>> chr(0x10000)
'\U00010000'
>>> len(chr(0x10000))
2
>>> ord(0x10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
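
A side note on the traceback above: the message says "but int found", so this particular TypeError is raised because ord() was handed the integer 0x10000 rather than a string, independent of surrogates. A minimal sketch of the distinction, assuming a current CPython 3 (3.3 or later, where chr(0x10000) has length 1); the helper name ord_or_error is hypothetical, and results on a 3.1 narrow build may differ:

```python
def ord_or_error(value):
    """Return ord(value), or the TypeError text if ord() rejects the argument."""
    try:
        return ord(value)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Passing an int: ord() rejects it regardless of its value.
print(ord_or_error(0x10000))

# Passing the string built from the code point: this is the case that
# actually exercises non-BMP handling.
s = chr(0x10000)
print(len(s), ord_or_error(s))
```

On a narrow build the string has length 2, so the second call is the one worth testing there.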

It looks like narrow builds with surrogates have some more problems...

Test with U+10000: "LINEAR B SYLLABLE B008 A", category: Letter, Other.

Correct result (Python 2.5, wide build):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> unichr(0x10000)
u'\U00010000'
>>> unichr(0x10000).isalpha()
True
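
The same check can be reproduced with the unicodedata module. A small sketch, assuming a Python 3 where chr(0x10000) is a single character (a wide build, or 3.3 and later):

```python
import unicodedata

ch = chr(0x10000)

print(unicodedata.name(ch))      # LINEAR B SYLLABLE B008 A
print(unicodedata.category(ch))  # Lo  (Letter, Other)
print(ch.isalpha())              # True
```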

Error in Python3 (narrow build):

marge$ ./python
Python 3.1a0 (py3k:69105M, Feb 3 2009, 15:04:35)
>>> chr(0x10000).isalpha()
False
>>> list(chr(0x10000))
['\ud800', '\udc00']
>>> chr(0xd800).isalpha()
False
>>> chr(0xdc00).isalpha()
False
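
The two items in that list are the UTF-16 surrogate pair for U+10000. As a sketch of the standard UTF-16 arithmetic (the function names are hypothetical, not part of Python), a character-type lookup on a narrow build would need to recombine the pair into one code point before consulting the Unicode database:

```python
def to_surrogates(cp):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside U+10000..U+10FFFF")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print(hex(high), hex(low))              # 0xd800 0xdc00
print(hex(from_surrogates(high, low)))  # 0x10000
```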

Unicode ranges, all in the category "Other, Surrogate":

U+D800..U+DB7F: Non Private Use High Surrogate
U+DB80..U+DBFF: Private Use High Surrogate
U+DC00..U+DFFF: Low Surrogate
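
A rough sketch of these three ranges as a lookup. The table and the helper surrogate_kind are hypothetical names for illustration; the category check relies on unicodedata.category() reporting 'Cs' for surrogate code points:

```python
import unicodedata

# (first, last, label) for each surrogate block listed above.
SURROGATE_RANGES = [
    (0xD800, 0xDB7F, "Non Private Use High Surrogate"),
    (0xDB80, 0xDBFF, "Private Use High Surrogate"),
    (0xDC00, 0xDFFF, "Low Surrogate"),
]


def surrogate_kind(cp):
    """Return the surrogate block label for a code point, or None."""
    for first, last, label in SURROGATE_RANGES:
        if first <= cp <= last:
            return label
    return None


for cp in (0xD800, 0xDB80, 0xDC00, 0x10000):
    # Print only numbers and labels; a lone surrogate cannot be encoded
    # for output on most terminals.
    print("U+%04X" % cp, surrogate_kind(cp), unicodedata.category(chr(cp)))
```

Since both halves of a pair are in category 'Cs', isalpha() on either half alone returns False, which is consistent with the narrow-build session above.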