[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Steve Dower steve.dower at python.org
Mon Sep 5 13:38:01 EDT 2016

Previous message (by thread): [Python-Dev] PEP 528: Change Windows console encoding to UTF-8
Next message (by thread): [Python-Dev] PEP 528: Change Windows console encoding to UTF-8
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 05Sep2016 0941, Paul Moore wrote:

> On 5 September 2016 at 14:36, Steve Dower <steve.dower at python.org> wrote:
>> The best fix is to use a buffered reader, which will read all the available bytes and then let you .read(1), even if it happens to be an incomplete character.
>
> But this is sys.stdin.buffer.raw we're talking about. People can't really layer anything on top of that. The problem here is precisely that they are trying to bypass the existing layering, which doesn't work the way they need it to because it blocks.

This layer also blocks, and always has. You need to go to platform specific functions anyway to get non-blocking functionality (which is also wrapped up in getc I believe, but that isn't used by FileIO or the new WinConsoleIO classes).

>> We could theoretically add buffering to the raw reader to handle one character, which would allow very small reads from raw, but that severely complicates things and the advice to use a buffered reader is good advice anyway.
>
> Can you provide an example of how I'd rewrite the code that I quoted previously to follow this advice? Note: this is not theoretical. I expect to have to provide a PR to fix exactly this code should this change go in. At the moment I can't find a way that doesn't impact the Unix version of the code (which currently works and is not expected to need any change). Most likely I'll have to add buffering of 4-byte reads (which, as you say, is complex).

The easiest way to follow it is to use "sys.stdin.buffer.read(1)" rather than "sys.stdin.buffer.raw.read(1)".
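
As a rough illustration of the difference, here is a minimal sketch. ChunkedRaw is a hypothetical stand-in, not the real console class: it assumes a raw stream that refuses any read too small to hold a complete encoded character, which is roughly the behaviour under discussion. A BufferedReader layered on top asks the raw stream for a large block and can then hand out single bytes.

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream: each readinto() delivers one whole chunk
    and fails if the caller's buffer cannot hold it."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def readable(self):
        return True

    def readinto(self, b):
        if not self._chunks:
            return 0  # end of stream
        chunk = self._chunks[0]
        if len(b) < len(chunk):
            # Assumed failure mode for a too-small raw read.
            raise OSError("buffer too small for a complete character")
        self._chunks.pop(0)
        b[:len(chunk)] = chunk
        return len(chunk)


def read_single_bytes(stream, count):
    """Call stream.read(1) count times and collect the results."""
    return [stream.read(1) for _ in range(count)]


def demo():
    euro = "\u20ac".encode("utf-8")  # three bytes in UTF-8

    # Reading one byte straight from the raw layer fails in this model.
    raw = ChunkedRaw([euro])
    try:
        raw.read(1)
        raw_ok = True
    except OSError:
        raw_ok = False

    # The buffered layer requests a large block, then serves single bytes.
    buffered = io.BufferedReader(ChunkedRaw([b"a", euro]))
    return raw_ok, read_single_bytes(buffered, 4)
```

With a real console, the equivalent change is the one-line swap described above; the buffered layer still blocks until input is available.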

> PS I'm not 100% sure that, under POSIX, read() will return partial UTF-8 byte sequences. I think it must, because otherwise a lot of code I've seen would be broken, but if a POSIX expert can confirm or deny my assumption, that would be great.

I just tested, and yes it returns partial characters. That's a good reason to do the single character buffering ourselves. Shouldn't be too hard to deal with.
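
Since single-byte reads can return part of a character, code that wants whole characters has to reassemble them. The sketch below shows one caller-side way to do that with the standard incremental decoder. It is not the buffering proposed for the raw reader itself, and read_char is a hypothetical helper name.

```python
import codecs
import io


def read_char(stream, encoding="utf-8"):
    """Read one byte at a time from a binary stream until the bytes
    decode to a full character. Returns "" at end of stream."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        b = stream.read(1)
        if not b:
            # End of stream: flush the decoder. This raises
            # UnicodeDecodeError if a character was cut short.
            return decoder.decode(b"", final=True)
        ch = decoder.decode(b)
        if ch:
            return ch


def read_all_chars(stream, encoding="utf-8"):
    """Collect characters from the stream until it is exhausted."""
    chars = []
    while True:
        ch = read_char(stream, encoding)
        if not ch:
            return chars
        chars.append(ch)
```

For example, read_all_chars(io.BytesIO("a\u20ac".encode("utf-8"))) yields the two characters even though the second one arrives as three separate one-byte reads.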

Cheers, Steve
