[Python-Dev] Support of UTF-16 and UTF-32 source encodings

eryksun eryksun at gmail.com
Sat Nov 14 21:57:51 EST 2015


On Sat, Nov 14, 2015 at 7:06 PM, Steve Dower <steve.dower at python.org> wrote:

> The native encoding on Windows has been UTF-16 since Windows NT. Obviously we've survived without Python tokenization support for a long time, but every API uses it.

Windows 2000 was the first version to have broad support for UTF-16. Windows NT (1993) was released before UTF-16 existed, so its Unicode support was limited to UCS-2.

(Note that console windows still restrict each character cell to a single WCHAR. So a non-BMP character, which UTF-16 encodes as a surrogate pair, always appears as two box glyphs. Of course you can copy and paste from the console to a UTF-16-aware window just fine.)
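
To make the surrogate-pair point concrete, here is a small Python 3 sketch. The helper name utf16_code_units is made up for illustration; it only counts 16-bit code units and does not touch the console at all.

```python
import struct

def utf16_code_units(ch):
    """Return the 16-bit code units that UTF-16 uses to encode ch.

    Hypothetical helper for illustration only.
    """
    data = ch.encode('utf-16-le')          # little-endian, no BOM
    count = len(data) // 2                 # two bytes per code unit
    return list(struct.unpack('<%dH' % count, data))

bmp = utf16_code_units('\u00e9')           # U+00E9, inside the BMP
astral = utf16_code_units('\U0001F40D')    # U+1F40D, outside the BMP

print([hex(u) for u in bmp])               # one code unit
print([hex(u) for u in astral])            # two code units (a surrogate pair)
```

The second character needs two WCHAR values, which is why a console limited to one WCHAR per cell draws it as two separate glyphs.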

> I've hit a few cases where it would have been handy for Python to be able to detect it, though nothing I couldn't work around.

Can you elaborate on some example cases? I can see using UTF-16 for the REPL in the Windows console, but a hypothetical WinConIO class could simply transcode to and from UTF-8. Drekin's win-unicode-console package works like this.
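
As a rough sketch of the transcoding step only (this is not how win-unicode-console is implemented, and the function names are hypothetical), assume the wide-character console APIs such as ReadConsoleW and WriteConsoleW hand over and accept UTF-16-LE buffers. The conversion itself is then just a decode followed by an encode:

```python
def console_to_utf8(wchar_bytes):
    """Transcode UTF-16-LE bytes, as a wide-char console read is
    assumed to return them, into UTF-8 for the tokenizer.

    Hypothetical helper; real code would get the buffer via ctypes.
    """
    return wchar_bytes.decode('utf-16-le').encode('utf-8')

def utf8_to_console(utf8_bytes):
    """Transcode UTF-8 bytes back to UTF-16-LE for a wide-char console write."""
    return utf8_bytes.decode('utf-8').encode('utf-16-le')

# Simulate a line typed at the console, including a non-BMP character.
text = 'print("\U0001F40D")\r\n'
raw = text.encode('utf-16-le')

as_utf8 = console_to_utf8(raw)
round_trip = utf8_to_console(as_utf8)
```

With this approach the tokenizer only ever sees UTF-8, so it would not need to detect UTF-16 for the console case.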


