Message 81052 - Python tracker

amaury> Since r56395, ord() and chr() accept and return surrogate pairs
amaury> even in narrow builds.

Note: My examples are made with Python 2.x.

The goal is to remove most differences between narrow and wide Unicode builds (except for string lengths, indices and slices).

It would be nice to get the same behaviour in Python 2.x and 3.x to help migration from Python 2 to Python 3 ;-)

The unichr() documentation (in Python 2.x) is correct. But I would appreciate support for surrogates in unichr(), which also means changing ord() behaviour.

To address this problem, I suggest changing all functions in unicodectype.c so that they accept Py_UCS4 characters (instead of Py_UNICODE).
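To illustrate why the character-type functions would need the full code point, here is a rough sketch. It is not the unicodectype.c code: it runs on Python 3 (not 2.x), uses the unicodedata module as a stand-in for the C functions, and utf16_units() is a hypothetical helper that emulates what a narrow build stores.

```python
import struct
import unicodedata


def utf16_units(text):
    """Return the 16-bit code units a narrow build would store for text.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


ch = chr(0x1D7CE)  # MATHEMATICAL BOLD DIGIT ZERO, outside the BMP
units = utf16_units(ch)

# A function that only receives one 16-bit unit at a time sees
# two surrogates and cannot tell that the character is a digit.
per_unit = [unicodedata.category(chr(u)) for u in units]

# A function that receives the full 32-bit code point can classify it.
per_code_point = unicodedata.category(ch)

print([hex(u) for u in units], per_unit, per_code_point)
```

Each half of the pair is reported as a surrogate (category Cs), while the combined code point is reported as a decimal digit (category Nd).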

Why? Using surrogate pairs, you can use 16-bit Py_UNICODE units to store non-BMP characters (code > 0xffff).
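As a minimal sketch of the arithmetic (standard UTF-16 surrogate encoding, written for Python 3; to_surrogates() and from_surrogates() are hypothetical names, not existing functions), a non-BMP code point maps to two 16-bit units and back like this:

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point in U+10000..U+10FFFF")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low one")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1D11E)     # MUSICAL SYMBOL G CLEF
print(hex(high), hex(low))             # 0xd834 0xdd1e
print(hex(from_surrogates(high, low))) # 0x1d11e
```

On a narrow build, unichr() would return a two-unit string built from such a pair, and ord() would accept that two-unit string and return the original code point.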

I can open a new issue if you agree that we can change unichr() / ord() behaviour on narrow builds. Maybe we should ask on the mailing list?