Issue 10546: UTF-16-LE and UTF-16-BE support non-BMP characters

Created on 2010-11-26 21:08 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
utf_16_bmp.patch	vstinner, 2010-11-26 21:08

Messages (5)
msg122479	Author: STINNER Victor (vstinner) *	Date: 2010-11-26 21:08
The Python 3 documentation says that UTF-16-LE and UTF-16-BE only support BMP characters. I think that is wrong. It may have been true in Python 2 on a narrow build (where unichr() only supports BMP characters), but it is no longer true in Python 3.
msg123650	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-12-08 20:41
Marc or Alexander, can you confirm that the patch is correct?
msg123651	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-08 21:48
If Victor says so ... Someone needs to check that it works on a UCS4 build, but on a narrow build I don't think UTF-16-XX encodings need to do anything special - they just encode the surrogates as ordinary code units.

>>> '\U00010000'.encode('UTF-16-BE').decode('UTF-16-BE') == '\U00010000'
True
>>> '\U00010000'.encode('UTF-16-LE').decode('UTF-16-LE') == '\U00010000'
True
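
To make the "surrogates as ordinary code units" point concrete, here is a minimal sketch, assuming Python 3. The helper `utf16_code_units` is hypothetical (it is not part of the patch or of the codecs module); it computes the surrogate pair by hand and compares it with the bytes the two codecs produce.

```python
import struct

def utf16_code_units(ch):
    """Return the list of UTF-16 code units for one character (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP character: a single 16-bit code unit.
        return [cp]
    # Non-BMP character: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

ch = "\U00010000"
units = utf16_code_units(ch)  # high surrogate 0xD800, low surrogate 0xDC00

# The codecs emit the same two code units, differing only in byte order
# (and without a BOM, unlike the plain 'utf-16' codec).
assert ch.encode("utf-16-be") == struct.pack(">2H", *units)
assert ch.encode("utf-16-le") == struct.pack("<2H", *units)
```

For U+10000 this gives the bytes D8 00 DC 00 in big-endian order and 00 D8 00 DC in little-endian order.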
msg123654	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-08 22:04
I have verified that the UTF-16-XX encodings work on a wide build. The doc change LGTM. Bonus points for checking that we have unit tests for these encodings that include non-BMP characters.
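
A unit test of the kind suggested above might look like the following sketch. It assumes Python 3.4 or later (for `subTest`); the class name and sample strings are illustrative and are not taken from the CPython test suite.

```python
import unittest

class Utf16NonBmpTest(unittest.TestCase):
    # Illustrative samples: lowest and highest non-BMP code points,
    # an emoji, and a non-BMP character surrounded by ASCII.
    SAMPLES = ["\U00010000", "\U0001F600", "\U0010FFFF", "a\U00010000b"]

    def test_round_trip(self):
        # Encoding then decoding must give back the original string.
        for codec in ("utf-16-le", "utf-16-be"):
            for text in self.SAMPLES:
                with self.subTest(codec=codec, text=ascii(text)):
                    self.assertEqual(text.encode(codec).decode(codec), text)

    def test_surrogate_pair_length(self):
        # One non-BMP character is two 16-bit code units: 4 bytes, no BOM.
        for codec in ("utf-16-le", "utf-16-be"):
            with self.subTest(codec=codec):
                self.assertEqual(len("\U00010000".encode(codec)), 4)
```

Saved to a file, this can be run with `python -m unittest`.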
msg123657	Author: STINNER Victor (vstinner) *	Date: 2010-12-08 22:26
Fixed by r87135.

History
Date	User	Action	Args
2022-04-11 14:57:09	admin	set	github: 54755
2010-12-08 22:26:33	vstinner	set	status: open -> closed; resolution: fixed; messages: +
2010-12-08 22:05:50	belopolsky	set	nosy: lemburg, terry.reedy, cgw, belopolsky, vstinner, docs@python; components: + Unicode
2010-12-08 22:04:12	belopolsky	set	messages: +
2010-12-08 21:48:09	belopolsky	set	messages: +
2010-12-08 20:57:30	terry.reedy	set	assignee: docs@python
2010-12-08 20:57:07	terry.reedy	set	assignee: cgw -> (no value)
2010-12-08 20:41:04	terry.reedy	set	nosy: + terry.reedy, belopolsky, cgw, lemburg; messages: + ; assignee: docs@python -> cgw; stage: commit review
2010-11-26 21:08:30	vstinner	create