The Python 3 documentation says that UTF-16-LE and UTF-16-BE only support BMP characters. What? I think that is wrong. It may have been true with Python 2 on a narrow build (unichr() only supports BMP characters), but it is no longer true in Python 3.
If Victor says so ... Someone needs to check that it works on a UCS4 build, but on a narrow build I don't think UTF-16-XX encodings need to do anything special - they just encode the surrogates as ordinary code units.

>>> '\U00010000'.encode('UTF-16-BE').decode('UTF-16-BE') == '\U00010000'
True
>>> '\U00010000'.encode('UTF-16-LE').decode('UTF-16-LE') == '\U00010000'
True
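To make the "surrogates as ordinary code units" point concrete, here is a minimal sketch that builds the UTF-16 bytes by hand and compares them with the codec output. The helpers utf16_code_units() and encode_utf16_manual() are hypothetical names made up for this illustration; they are not part of the codecs machinery.

```python
import struct

def utf16_code_units(ch):
    """Return the UTF-16 code units for one character (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP character (or an already-split surrogate): one code unit.
        return [cp]
    # Non-BMP character: split into a high and a low surrogate.
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

def encode_utf16_manual(text, byteorder):
    """Encode text as UTF-16 without a BOM; byteorder is 'big' or 'little'."""
    fmt = '>H' if byteorder == 'big' else '<H'
    return b''.join(struct.pack(fmt, unit)
                    for ch in text
                    for unit in utf16_code_units(ch))

# U+10000 becomes the surrogate pair D800 DC00 in either byte order.
print(encode_utf16_manual('\U00010000', 'big') == '\U00010000'.encode('UTF-16-BE'))
print(encode_utf16_manual('\U00010000', 'little') == '\U00010000'.encode('UTF-16-LE'))
```

If both lines print True, the codecs are emitting surrogate pairs for non-BMP input rather than rejecting it.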
I have verified that the UTF-16-XX encodings work on a wide build. The doc change LGTM. Bonus points for checking that we have unit tests for these encodings that include non-BMP characters.
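For the unit test question, something along these lines might do. This is only a sketch, not the existing test suite: the class name NonBMPUTF16Test and the NON_BMP_SAMPLES list are made up for illustration, and subTest() assumes Python 3.4 or later.

```python
import unittest

# Hypothetical sample strings: lowest and highest non-BMP code points,
# an emoji, and a non-BMP character surrounded by ASCII.
NON_BMP_SAMPLES = ['\U00010000', '\U0001F600', '\U0010FFFF', 'a\U00010000b']

class NonBMPUTF16Test(unittest.TestCase):
    def test_round_trip(self):
        # Encoding then decoding must give back the original string.
        for enc in ('UTF-16-LE', 'UTF-16-BE'):
            for s in NON_BMP_SAMPLES:
                with self.subTest(encoding=enc, text=ascii(s)):
                    self.assertEqual(s.encode(enc).decode(enc), s)

    def test_surrogate_pair_bytes(self):
        # U+10000 is encoded as the surrogate pair D800 DC00.
        self.assertEqual('\U00010000'.encode('UTF-16-BE'), b'\xd8\x00\xdc\x00')
        self.assertEqual('\U00010000'.encode('UTF-16-LE'), b'\x00\xd8\x00\xdc')
```

Saved in a module, it can be run with python -m unittest.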