[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Paul Moore p.f.moore at gmail.com
Mon Sep 5 05:10:01 EDT 2016


On 5 September 2016 at 06:54, Steve Dower <steve.dower at python.org> wrote:

> +Using the raw object with small buffers
> +---------------------------------------
> +
> +Code that uses the raw IO object and attempts to read less than four characters
> +will now receive an error. Because it's possible that any single character may
> +require up to four bytes when represented in utf-8, requests must fail.

I'm very concerned about this statement. It's clearly not true that the request must fail, as reading 1 byte from a UTF-8 enabled Linux console stream currently works (at least I believe it does). And there is code in the wild that works by doing a test that "there's input available" (using kbhit on Windows and select on Unix) and then doing read(1) to ensure a non-blocking read (the pyinvoke code I referenced earlier). If we're going to break this behaviour, I'd argue that we need to provide a working alternative.
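
For concreteness, the "check for input, then read a single byte" pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual invoke code: the helper names input_ready and read_one_if_ready are hypothetical, and it reads from a file descriptor with os.read rather than calling read(1) on the raw IO object.

```python
import os
import sys


def input_ready(fd):
    """Best-effort check that a read on fd would not block."""
    if os.name == "nt":
        # select() only works on sockets on Windows, so poll the console.
        # Note that kbhit() looks at the console and ignores fd.
        import msvcrt
        return msvcrt.kbhit()
    import select
    # A timeout of 0 makes select() return immediately.
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)


def read_one_if_ready(fd=None):
    """Return one byte if input is waiting, otherwise None (hypothetical helper)."""
    if fd is None:
        fd = sys.stdin.fileno()
    if not input_ready(fd):
        return None
    # Asking for a single byte is what keeps this from blocking.
    return os.read(fd, 1)
```

It is the final one-byte read that the quoted PEP text would turn into an error on Windows.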

At a minimum, can the PEP include a recommended cross-platform means of implementing a non-blocking read from standard input, to replace the current approach? (If the recommendation is to read a larger 4-byte buffer and manage the process of retaining unused bytes yourself, then that's quite a major change to at least the code I'm thinking of in invoke, and I'm not sure read(4) guarantees that it won't block if only 1 byte is available without blocking...)
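
To illustrate what "retaining unused bytes yourself" would involve, here is a small sketch using the standard library's incremental UTF-8 decoder, which holds on to incomplete sequences between calls. The function name decode_chunks is hypothetical, and this only covers the decoding side; it says nothing about whether a larger read would block.

```python
import codecs


def decode_chunks(chunks):
    """Decode an iterable of byte chunks, buffering partial UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        # Returns "" while a multi-byte character is still incomplete.
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if a sequence was left unfinished.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# One character can need up to four bytes in UTF-8.
four_byte_char = "\U0001F600".encode("utf-8")
# Feeding the bytes one at a time still yields the character once it is complete.
pieces = decode_chunks(bytes([b]) for b in four_byte_char)
```

Code that currently assumes each read(1) gives it something usable would need a layer like this in between, which is the sort of change I'm worried about.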

Paul


