Issue 30410: Documentation for sys.stdout encoding does not reflect the new Windows behavior in Python 3.6+

Created on 2017-05-20 08:58 by paul.moore, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL        Status   Linked
PR 10264   merged   lys.nikolaou, 2018-10-31 20:01
PR 11860   merged   miss-islington, 2019-02-14 23:35
Messages (12)
msg294020 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2017-05-20 08:58
The documentation for the encoding of sys.stdin/out/err (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the change in Python 3.6 on Windows to use the console Unicode APIs, and hence UTF-8 for the encoding.
msg294046 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-20 18:37
How about this?

The character encoding is platform-dependent. Non-Windows platforms use the locale encoding (see locale.getpreferredencoding()).

On Windows, UTF-8 is used for console character devices (i.e. CON, CONIN$, and CONOUT$). However, this can be overridden to use the console as a generic character device by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Character devices such as NUL (i.e. isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.

Under all platforms, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
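To see which of the cases described above applies in a given session, one can inspect the standard streams directly. The following is only a minimal sketch: `describe_stream` is a hypothetical helper name, not something from the proposed documentation, and the values it reports depend on the platform and on how the streams are redirected.

```python
import locale
import sys


def describe_stream(stream):
    """Return (name, encoding, errors, isatty) for a text stream.

    Hypothetical helper for illustration only.
    """
    return (
        getattr(stream, "name", None),
        getattr(stream, "encoding", None),
        getattr(stream, "errors", None),
        # isatty() raises ValueError on a closed stream, so guard against that.
        False if stream.closed else stream.isatty(),
    )


if __name__ == "__main__":
    print("locale preferred encoding:", locale.getpreferredencoding(False))
    for std in (sys.stdin, sys.stdout, sys.stderr):
        # A standard stream can be None, e.g. under pythonw.exe on Windows.
        if std is not None:
            print(describe_stream(std))
```

Running the script once at an interactive console and once with output redirected to a file or to NUL should show the difference between the console, non-character device, and character device cases.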
msg294061 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-05-20 23:59
Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth? Can you create a PR? (And having links to the environment variable docs would be great.)
msg294063 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2017-05-21 00:53
I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

    C:\>chcp 1252
    Active code page: 1252

    C:\>python -c "print('\u20ac')" > nul

    C:\>chcp 437
    Active code page: 437

    C:\>python -c "print('\u20ac')" > nul
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0: character maps to <undefined>

Unix has a similar problem:

    $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0: ordinal not in range(128)

Except /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding in Unix systems to be something other than UTF-8.

It would be useful if we special-cased NUL like we do for the Windows console, but just to make it use the backslashreplace error handler. Unfortunately I don't know how to do that without calling NtQueryObject, for which ObjectNameInformation (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle also can't be used because it requires file-system devices.

As a crude workaround, we could lump together all non-console character devices (i.e. isatty() but not a console). That will affect serial devices, too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.

[1]: https://msdn.microsoft.com/en-us/library/ff550964
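The difference between the default strict error handler and backslashreplace can be reproduced without a console or the NUL device. This is a sketch only: `write_encoded` is a hypothetical helper that wraps an in-memory buffer in an `io.TextIOWrapper`, as a stand-in for a redirected stdout with a given codepage. It does not touch the real standard streams.

```python
import io


def write_encoded(text, encoding, errors="strict"):
    """Write text through a TextIOWrapper and return the encoded bytes.

    Hypothetical helper: the BytesIO buffer stands in for a redirected
    stdout whose encoding is the given codepage.
    """
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable.
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    euro = "\u20ac"
    # cp1252 has a euro sign, so this succeeds.
    print(write_encoded(euro, "cp1252"))
    # cp437 has no euro sign, so the strict handler raises.
    try:
        write_encoded(euro, "cp437")
    except UnicodeEncodeError as exc:
        print("strict failed:", exc)
    # backslashreplace emits an escape sequence instead of failing.
    print(write_encoded(euro, "cp437", errors="backslashreplace"))
```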
msg328766 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-10-28 22:24
Shall I create a PR for this?
msg328798 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-29 10:34
Please do!
msg330764 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-11-30 09:38
Ping.
msg330765 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2018-11-30 09:58
The proposed wording seems a bit over-complex to me. Maybe the following re-wording would be easier to understand?

The character encoding is platform-dependent. Non-Windows platforms use the locale encoding (see locale.getpreferredencoding()).

On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.

The special behaviour of the console can be overridden by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. In that case, the console codepages are used as for any other character device.

Under all platforms, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
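The PYTHONIOENCODING override in the last paragraph can be checked from a parent process. The sketch below is an illustration under stated assumptions: `child_stdout_encoding` is a hypothetical helper, and because the child's stdout is a pipe rather than a console, it exercises only the non-console case, where the override applies on all platforms.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Start a child interpreter and return its normalized stdout encoding.

    Hypothetical helper. The child's stdout is a pipe, so the Windows
    console special case does not apply here.
    """
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not interfere.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    name = result.stdout.decode("ascii").strip()
    # Normalize aliases (e.g. "latin-1" vs "iso8859-1") to the codec name.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("default for a pipe:", child_stdout_encoding())
    print("with override:", child_stdout_encoding("latin-1"))
```

The encoding name is normalized through `codecs.lookup()` because the spelling reported by `sys.stdout.encoding` may differ between Python versions.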
msg330901 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-12-02 22:06
I updated the PR with the new wording by Paul, since I found it easier to understand as well.
msg335573 - (view) Author: miss-islington (miss-islington) Date: 2019-02-14 23:35
New changeset 5723263a3a39a05b6a2f567e0e7771792e6e2f5b by Miss Islington (bot) (Lysandros Nikolaou) in branch 'master': bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264) https://github.com/python/cpython/commit/5723263a3a39a05b6a2f567e0e7771792e6e2f5b
msg335574 - (view) Author: Mariatta (Mariatta) * (Python committer) Date: 2019-02-14 23:36
Fixed in 3.8 and 3.7. Thanks!
msg335575 - (view) Author: miss-islington (miss-islington) Date: 2019-02-14 23:45
New changeset b8bcec35e01cac018f6ccfc8323d35886340efe0 by Miss Islington (bot) in branch '3.7': bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264) https://github.com/python/cpython/commit/b8bcec35e01cac018f6ccfc8323d35886340efe0
History
Date User Action Args
2022-04-11 14:58:46 admin set github: 74595
2019-02-14 23:45:23 miss-islington set messages: +
2019-02-14 23:36:54 Mariatta set status: open -> closed; nosy: + Mariatta; messages: +; resolution: fixed; stage: patch review -> resolved
2019-02-14 23:35:48 miss-islington set pull_requests: + pull_request11893
2019-02-14 23:35:28 miss-islington set nosy: + miss-islington; messages: +
2018-12-02 22:06:32 lys.nikolaou set messages: +
2018-11-30 09:58:49 paul.moore set messages: +
2018-11-30 09:38:07 lys.nikolaou set messages: +
2018-10-31 20:01:49 lys.nikolaou set keywords: + patch; stage: patch review; pull_requests: + pull_request9575
2018-10-29 10:34:17 steve.dower set messages: + versions: + Python 3.8
2018-10-28 22:24:59 lys.nikolaou set nosy: + lys.nikolaou; messages: +
2017-05-21 00:53:24 eryksun set messages: +
2017-05-20 23:59:14 steve.dower set messages: +
2017-05-20 18:37:01 eryksun set nosy: + eryksun; messages: +
2017-05-20 08:58:49 paul.moore create