[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Martin Panter vadmium+py at gmail.com
Mon Sep 5 05:37:36 EDT 2016


On 5 September 2016 at 09:10, Paul Moore <p.f.moore at gmail.com> wrote:

On 5 September 2016 at 06:54, Steve Dower <steve.dower at python.org> wrote:

+Using the raw object with small buffers
+---------------------------------------
+
+Code that uses the raw IO object and attempts to read less than four characters
+will now receive an error. Because it's possible that any single character may
+require up to four bytes when represented in utf-8, requests must fail.

I'm very concerned about this statement. It's clearly not true that the request must fail, as reading 1 byte from a UTF-8 enabled Linux console stream currently works (at least I believe it does). And there is code in the wild that works by doing a test that "there's input available" (using kbhit on Windows and select on Unix) and then doing read(1) to ensure a non-blocking read (the pyinvoke code I referenced earlier). If we're going to break this behaviour, I'd argue that we need to provide a working alternative.

At a minimum, can the PEP include a recommended cross-platform means of implementing a non-blocking read from standard input, to replace the current approach? (If the recommendation is to read a larger 4-byte buffer and manage the process of retaining unused bytes yourself, then that's quite a major change to at least the code I'm thinking of in invoke, and I'm not sure read(4) guarantees that it won't block if only 1 byte is available without blocking...)
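To make the UTF-8 point concrete: one character takes between one and four bytes in UTF-8, so a raw read of fewer than four bytes can stop in the middle of a character. The "retain unused bytes yourself" approach mentioned above could look roughly like this minimal sketch, which relies on the standard library's incremental UTF-8 decoder to hold back an incomplete sequence until the next read supplies the rest. The helpers utf8_len and decode_chunks are hypothetical, for illustration only; this is not something the PEP proposes.

```python
import codecs


def utf8_len(ch):
    """Number of bytes a single character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def decode_chunks(chunks):
    """Hypothetical helper: decode raw byte chunks (e.g. from small raw reads).

    The incremental decoder keeps any trailing bytes that do not yet form a
    complete character and combines them with the next chunk, so a
    character split across several reads is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder; it raises UnicodeDecodeError if the
    # input ended in the middle of a multi-byte sequence.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, e-acute, euro sign, emoji
print([utf8_len(ch) for ch in samples])  # [1, 2, 3, 4]

# Simulate reading "a" + euro sign + "b" one byte at a time; the euro sign
# arrives as three separate 1-byte reads.
one_byte_reads = [bytes([b]) for b in "a\u20acb".encode("utf-8")]
print(ascii(decode_chunks(one_byte_reads)))  # 'a\u20acb'
```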

FWIW, on Linux and Unix in general, if select() or a similar call indicates that data is available to read, calling the raw read() with any buffer size should return at least one byte (whatever is available) without blocking. If the user has typed only one byte, read(4) would return that one byte immediately.
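A minimal sketch of that behaviour, Unix only (on Windows, select() supports sockets only). It assumes an os.pipe() is a fair stand-in for the console, and uses io.FileIO as the unbuffered raw stream, the same kind of object as sys.stdin.buffer.raw. The helper read_if_ready is hypothetical and only illustrates the select-then-read pattern.

```python
import io
import os
import select


def read_if_ready(raw, size=4, timeout=0.0):
    """Hypothetical helper: poll with select(), then do one raw read.

    Returns None if no data is ready within `timeout` seconds. Otherwise
    returns what a single read() call yields: 1 to `size` bytes, or b""
    at end of file.
    """
    ready, _, _ = select.select([raw], [], [], timeout)
    if not ready:
        return None
    # An unbuffered (raw) read makes a single read() system call, so it
    # returns the bytes already available instead of waiting for `size`.
    return raw.read(size)


# A pipe stands in for the terminal.
r_fd, w_fd = os.pipe()
raw = io.FileIO(r_fd, "r")

print(read_if_ready(raw))   # None: nothing has been "typed" yet
os.write(w_fd, b"x")        # the user types a single byte
print(read_if_ready(raw))   # b'x', returned at once even though size=4

os.close(w_fd)
raw.close()
```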

But if you are using a BufferedReader (stdin.buffer rather than stdin.buffer.raw), then this guarantee is off and read(4) will block until it gets 4 bytes, or until EOF.
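A small demonstration of the difference, again assuming a pipe is an acceptable stand-in for the console: a BufferedReader wraps the raw FileIO (mirroring sys.stdin.buffer), a background thread calls read(4) after only one byte has been written, and the call stays blocked until the write end is closed. The thread and the 0.2 second sleep are only there to observe the blocked call; they are not a suggested pattern.

```python
import io
import os
import threading
import time

# A pipe stands in for stdin; wrapping the raw FileIO in a BufferedReader
# mirrors sys.stdin.buffer.
r_fd, w_fd = os.pipe()
buffered = io.BufferedReader(io.FileIO(r_fd, "r"))

result = []
reader = threading.Thread(target=lambda: result.append(buffered.read(4)))
reader.start()

os.write(w_fd, b"x")           # only one byte is available
time.sleep(0.2)                # arbitrary delay, just to observe the reader
still_waiting = reader.is_alive()
print(still_waiting)           # True: read(4) keeps waiting for 3 more bytes

os.close(w_fd)                 # EOF lets read(4) return the short result
reader.join()
print(result)                  # [b'x']
buffered.close()
```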
