[Python-Dev] PEP 528: Change Windows console encoding to UTF-8
Steve Dower steve.dower at python.org
Mon Sep 5 15:54:06 EDT 2016
On 05Sep2016 1234, eryk sun wrote:
> Also, the console is UCS-2, which can't be transcoded between UTF-16 and UTF-8. Supporting UCS-2 in the console would integrate nicely with the filesystem PEP. It makes it always possible to print os.listdir('.'), copy and paste, and read it back without data loss.
Supporting UTF-8 actually works better for this. We already use surrogatepass explicitly (on the filesystem side, with PEP 529) and implicitly (on the console side, using the Windows conversion API).
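To illustrate the surrogatepass point, here is a minimal sketch of a lone surrogate surviving a round trip through both UTF-8 and UTF-16. The file name is made up for the example, and this only exercises Python's codecs, not the console or filesystem code paths discussed in the PEPs.

```python
# Hypothetical file name containing a lone surrogate (ill-formed UTF-16),
# which Windows permits in file names.
name = "file_\udc80.txt"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With surrogatepass the lone surrogate is encoded as a three-byte
# sequence and decodes back to the same string.
utf8_bytes = name.encode("utf-8", "surrogatepass")
from_utf8 = utf8_bytes.decode("utf-8", "surrogatepass")

# The same handler works for UTF-16, so the string can move between the
# two encodings without loss.
utf16_bytes = from_utf8.encode("utf-16-le", "surrogatepass")
from_utf16 = utf16_bytes.decode("utf-16-le", "surrogatepass")

print(strict_ok, from_utf8 == name, from_utf16 == name)
```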
> It would probably be simpler to use UTF-16 in the main pipeline and implement Martin's suggestion to mix in a UTF-8 buffer. The UTF-16 buffer could be renamed as "wbuffer", for expert use. However, if you're fully committed to transcoding in the raw layer, I'm certain that these problems can be addressed with small buffers and using Python's codec machinery for a flexible mix of "surrogatepass" and "replace" error handling.
I don't think it actually makes things simpler. Having two buffers is generally a bad idea unless they are perfectly synced, which would be impossible here without data corruption (if you read half of a UTF-8 character sequence and then read the wide buffer, do you get that character or not?).
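The half-read problem can be seen with Python's incremental decoder. This is only a sketch of the state a byte-oriented reader is left in after consuming part of a character; it does not model the proposed console buffers themselves.

```python
import codecs

# "€" is three bytes in UTF-8: E2 82 AC.
euro = "€".encode("utf-8")

decoder = codecs.getincrementaldecoder("utf-8")()

# Read only the first two bytes: no character can be produced yet.
first = decoder.decode(euro[:2])

# The decoder is now holding the partial sequence. A second, wide-character
# view of the same input would have to decide whether "€" has already been
# consumed or not.
pending = decoder.getstate()[0]

# Supplying the final byte completes the character.
second = decoder.decode(euro[2:])

print(repr(first), pending, repr(second))
```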
Writing a partial character is easily avoidable by the user. We can either fail with an error or print garbage, and currently printing garbage is the most compatible behaviour. (Also occurs on Linux - I have a VM running this week for testing this stuff.)
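As a rough illustration of the two outcomes, the sketch below splits an encoded string in the middle of a character and decodes each chunk on its own. It assumes a consumer that decodes every write independently, which is a simplification and not a description of the actual console implementation.

```python
text = "naïve"
data = text.encode("utf-8")  # "ï" occupies two bytes: C3 AF

# Split in the middle of the two-byte sequence for "ï".
first, second = data[:3], data[3:]

# Option 1: fail with an error.
try:
    first.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# Option 2: print garbage (each broken half becomes U+FFFD).
garbage = first.decode("utf-8", "replace") + second.decode("utf-8", "replace")

# Avoiding the problem: write whole encoded strings, never a slice of one.
whole = data.decode("utf-8")

print(failed, repr(garbage), whole)
```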
Cheers,
Steve