[Python-Dev] len(chr(i)) = 2?

Victor Stinner victor.stinner at haypocalc.com
Fri Nov 19 21:23:14 CET 2010


Hi,

On Friday 19 November 2010 17:53:58 Alexander Belopolsky wrote:

> I was recently surprised to learn that chr(i) can produce a string of length 2 in python 3.x.

Yes, but only on a narrow build. For example, Debian and Ubuntu compile Python 3.1 in wide mode (sys.maxunicode == 1114111), where chr(i) always has length 1.
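To see which kind of build you have, something like the following should work. It is only a sketch: the helper names is_narrow_build() and expected_chr_len() are made up for illustration, and it assumes the only two possible values of sys.maxunicode are 0xFFFF (narrow) and 0x10FFFF (wide).

```python
import sys


def is_narrow_build():
    # Hypothetical helper: a narrow build reports sys.maxunicode == 0xFFFF,
    # a wide build reports 0x10FFFF (1114111).
    return sys.maxunicode == 0xFFFF


def expected_chr_len(i):
    # Hypothetical helper: on a narrow build a non-BMP code point
    # (above 0xFFFF) is stored as two UTF-16 code units, so len() is 2.
    if is_narrow_build() and i > 0xFFFF:
        return 2
    return 1


print("narrow build:", is_narrow_build())
print("len(chr(0x10000)) =", len(chr(0x10000)))
```

On a wide build the second line prints 1; on a narrow build it prints 2.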

> I suspect that I am not alone finding this behavior non-obvious given that a mistake in Python manual stating the contrary survived several releases. [1]

It was a documentation bug and you fixed it. Non-BMP characters are rare, so few people (maybe only you?) noticed the documentation bug. I consider the behaviour an improvement in Python 3's non-BMP support.

Python is unclear about non-BMP characters: the narrow build was called "ucs2" for a long time, even though it is really UTF-16 (each character is encoded as one or two 16-bit UTF-16 words). Python 2 accepts non-BMP characters with the \U syntax, but not with unichr(). This is inconsistent and I see it as a bug. But I don't want to touch Python 2 regarding non-BMP characters, and the "bug" is already fixed in Python 3!
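To make the "one or two UTF-16 words" point concrete, here is a minimal sketch of the standard UTF-16 surrogate computation. The function name utf16_code_units() is made up for this example, and it does not reject lone surrogate code points (0xD800-0xDFFF); it is not how CPython implements chr() internally.

```python
def utf16_code_units(code_point):
    # Hypothetical helper: return the UTF-16 code units for one code point.
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if code_point <= 0xFFFF:
        # BMP character: a single 16-bit word.
        return [code_point]
    # Non-BMP character: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


for cp in (0x41, 0xFFFF, 0x10000, 0x10FFFF):
    units = utf16_code_units(cp)
    print("U+%04X -> %s" % (cp, " ".join("%04X" % u for u in units)))
```

On a narrow build, len(chr(i)) matches the number of code units returned here, which is where the length 2 comes from.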

> I do believe, however that a change like this [2] and its consequences should be better publicized.

The change was made before the release of Python 3.0. Do you want to patch the "What's new in Python 3.0?" document?

> I have not found any discussion of this change in PEPs or "What's new" documents. The closest find was a mentioning of a related issue #3280 in the 3.0 NEWS file. [3] Since this feature will be first documented in the Library Reference in 3.2, I wonder if it will be appropriate to mention it in "What's new in 3.2"?

In my opinion, the question is rather why it was not fixed in Python 2. I suppose that the answer is something ugly like "backward compatibility" or "historical reasons" :-)

Victor


