[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Paul Moore p.f.moore at gmail.com
Mon Sep 5 16:19:46 EDT 2016

On 5 September 2016 at 20:34, eryk sun <eryksun at gmail.com> wrote:

> Paul, do you have example code that uses the 'raw' stream? Using the buffer should behave as it always has -- at least in this regard. sys.stdin.buffer requests a large block, such as 8 KB. But since the console defaults to a cooked mode (i.e. processed input and line input -- control keys, command-line editing, input history, and aliases), ReadConsole returns when enter is pressed or when interrupted. It returns at least '\r\n', unless interrupted by Ctrl+C, Ctrl+Break or a custom CtrlWakeup key. However, if line-input mode is disabled, ReadConsole returns as soon as one or more characters are available in the input buffer.

The code I'm looking at doesn't use the raw stream (I think). The problem I had (and the reason I was concerned) is that the code does some rather messy things, and without tracing back through the full code path, I'm not 100% sure what level of stream it's using. However, now that I know that the buffered layer won't ever error because 1 byte isn't enough to return a full character, if I need to change the code I can do so by switching to the buffered layer and fixing the issue that way (although with Steve's new proposal even that won't be necessary).
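To illustrate the point about the buffered layer, here is a minimal sketch. It does not use the real console stream: FussyRaw is a hypothetical stand-in for a raw stream that refuses reads too small to hold a whole UTF-8 character, and the 4-byte threshold and the explicit 8 KB buffer size are assumptions made for the example. Because io.BufferedReader fills its own large buffer from the raw layer, a read(1) on the buffered layer hands back a single byte without the raw layer ever seeing a 1-byte request.

```python
import io


class FussyRaw(io.RawIOBase):
    """Hypothetical raw stream that rejects buffers too small for one character."""

    def __init__(self, text):
        super().__init__()
        self._data = text.encode("utf-8")
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Assumption for this sketch: a UTF-8 character needs up to 4 bytes,
        # so anything smaller is treated as an error.
        if len(b) < 4:
            raise ValueError("buffer too small for a full character")
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


# Reading one byte straight from the raw layer fails...
try:
    FussyRaw("é\r\n").read(1)
except ValueError as exc:
    raw_error = exc

# ...but the buffered layer asks the raw layer for a whole block and then
# serves single bytes out of its own buffer.
buffered = io.BufferedReader(FussyRaw("é\r\n"), buffer_size=8192)
first = buffered.read(1)   # one byte of the two-byte encoding of "é"
second = buffered.read(1)  # the remaining byte of "é"
```

Note that the caller of the buffered layer still has to reassemble multi-byte characters itself; the buffered layer only guarantees that the small read does not raise.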

> As to kbhit() returning true, this does not mean that read(1) from console input won't block (not unless line-input mode is disabled). It does mean that getwch() won't block (note the "w" in there; this one reads Unicode characters). The CRT's conio functions (e.g. kbhit, getwch) put the console input buffer in a raw mode (e.g. ^C is read as '\x03' instead of generating a CTRL_C_EVENT) and call the lower-level functions PeekConsoleInputW (kbhit) and ReadConsoleInputW (getwch), to peek at and read input event records.

I understand. The code I'm working on was originally written for pure POSIX, with all the termios calls to set the console into unbuffered mode. In addition, it was until recently using the Python 2 text model, and so there are a lot of places in the code where it's still confused about whether it's processing bytes or characters (we've got rid of a lot of "let's decode and see if that helps" calls...). At the moment, kbhit(), while not correct, is "good enough". When I get the time, and we get to a point where it's enough of a priority, I may well look at refactoring this stuff to use proper Windows calls via ctypes to do "read what's available". But that's a way off yet.
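For reference, the pairing described above (kbhit() true means getwch() won't block) can be sketched as a small "read what's available" helper. This is only a sketch under stated assumptions, not the code under discussion and not the ctypes approach: read_pending_keys and its injectable kbhit/getwch parameters are hypothetical names added so the loop can be exercised without a console; msvcrt.kbhit and msvcrt.getwch are the standard-library wrappers and exist only on Windows.

```python
import sys


def read_pending_keys(kbhit=None, getwch=None):
    """Return the characters currently waiting in the console input buffer.

    kbhit/getwch default to the msvcrt functions; passing substitutes is a
    convenience of this sketch, not part of any real API.
    """
    if kbhit is None or getwch is None:
        if sys.platform != "win32":
            raise OSError("msvcrt is only available on Windows")
        import msvcrt
        kbhit, getwch = msvcrt.kbhit, msvcrt.getwch

    chars = []
    # Only call getwch() while kbhit() reports pending input, so the loop
    # never waits for a keypress.
    while kbhit():
        chars.append(getwch())
    return "".join(chars)


# Exercising the loop with a fake input queue instead of a real console.
pending = list("hi\x03")  # '\x03' is how ^C arrives in raw mode
drained = read_pending_keys(kbhit=lambda: bool(pending),
                            getwch=lambda: pending.pop(0))
```

The helper deliberately pairs kbhit() with getwch() rather than with read(1) on sys.stdin, since only the former is guaranteed not to block while line-input mode is enabled.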

Thanks for the information, though. I'll keep it in mind when we do get to a point where we're looking at this.

Paul
