Issue 10546: UTF-16-LE and UTF-16-BE support non-BMP characters

Created on 2010-11-26 21:08 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
utf_16_bmp.patch vstinner, 2010-11-26 21:08
Messages (5)
msg122479 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-26 21:08
The Python 3 documentation says that UTF-16-LE and UTF-16-BE only support BMP characters. I think that is wrong. It may have been true for Python 2 on a narrow build (where unichr() only supports BMP characters), but it is no longer true in Python 3.
msg123650 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-12-08 20:41
Marc or Alexander, can you confirm that the patch is correct?
msg123651 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-08 21:48
If Victor says so ... Someone needs to check that it works on a UCS4 build, but on a narrow build I don't think UTF-16-XX encodings need to do anything special - they just encode the surrogates as ordinary code units.

>>> '\U00010000'.encode('UTF-16-BE').decode('UTF-16-BE') == '\U00010000'
True
>>> '\U00010000'.encode('UTF-16-LE').decode('UTF-16-LE') == '\U00010000'
True
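
To illustrate what "encode the surrogates as ordinary code units" means in practice, here is a minimal sketch for a current Python 3 interpreter. The helper name surrogate_pair is made up for this example and is not part of the codecs; it just applies the standard UTF-16 surrogate arithmetic so the result can be compared with what the two endian-specific codecs produce.

```python
import struct


def surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a non-BMP code point.

    Hypothetical helper for illustration only.
    """
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


ch = '\U00010000'  # first code point outside the BMP
high, low = surrogate_pair(ord(ch))

# The endian-specific codecs write no BOM, only the two 16-bit code units.
be = ch.encode('utf-16-be')
le = ch.encode('utf-16-le')

# Big-endian and little-endian differ only in the byte order of each unit.
assert be == struct.pack('>HH', high, low)
assert le == struct.pack('<HH', high, low)

# Round trip, as in the interpreter session above.
assert be.decode('utf-16-be') == ch
assert le.decode('utf-16-le') == ch
```

Each non-BMP character therefore occupies four bytes in either encoding, and decoding reassembles the pair into a single code point.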
msg123654 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-08 22:04
I have verified that the UTF-16-XX encodings work on a wide build. The doc change LGTM. Bonus points for checking that we have unit tests for these encodings that include non-BMP characters.
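
A unit test of the kind suggested here might look like the following sketch. It is not the test that exists in the CPython test suite; the class name and the sample strings are assumptions chosen for illustration.

```python
import unittest


class NonBMPRoundTripTest(unittest.TestCase):
    """Hypothetical test case: UTF-16-LE/BE round trips for non-BMP text."""

    CODECS = ('utf-16-le', 'utf-16-be')
    # Sample strings are illustrative; each contains one non-BMP character.
    SAMPLES = ('\U00010000', '\U0001F600', '\U0010FFFF', 'a\U00010000b')

    def test_round_trip(self):
        for codec in self.CODECS:
            for text in self.SAMPLES:
                with self.subTest(codec=codec, text=ascii(text)):
                    self.assertEqual(text.encode(codec).decode(codec), text)

    def test_encoded_length(self):
        # A non-BMP character becomes a surrogate pair: two 16-bit units.
        for codec in self.CODECS:
            with self.subTest(codec=codec):
                self.assertEqual(len('\U00010000'.encode(codec)), 4)
                # BMP characters take one unit each, so 2 + 4 + 2 bytes.
                self.assertEqual(len('a\U00010000b'.encode(codec)), 8)
```

Saved in a module, this can be run with python -m unittest.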
msg123657 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-08 22:26
Fixed by r87135.
History
Date User Action Args
2022-04-11 14:57:09 admin set github: 54755
2010-12-08 22:26:33 vstinner set status: open -> closed; resolution: fixed; messages: +
2010-12-08 22:05:50 belopolsky set nosy: lemburg, terry.reedy, cgw, belopolsky, vstinner, docs@python; components: + Unicode
2010-12-08 22:04:12 belopolsky set messages: +
2010-12-08 21:48:09 belopolsky set messages: +
2010-12-08 20:57:30 terry.reedy set assignee: docs@python
2010-12-08 20:57:07 terry.reedy set assignee: cgw -> (no value)
2010-12-08 20:41:04 terry.reedy set nosy: + terry.reedy, belopolsky, cgw, lemburg; messages: +; assignee: docs@python -> cgw; stage: commit review
2010-11-26 21:08:30 vstinner create