msg294020
Author: Paul Moore (paul.moore)
Date: 2017-05-20 08:58
The documentation for the encoding of sys.stdin/out/err (see https://docs.python.org/3.6/library/sys.html#sys.stdout) does not reflect the change in Python 3.6 on Windows to use the console Unicode APIs, and hence UTF-8 for the encoding.
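To check which encoding a particular process actually ended up with, the streams can be inspected at runtime. The sketch below is illustrative only: describe_stream is a hypothetical helper, and it reads only standard attributes (encoding, errors, isatty(), and the underlying buffer.raw object).

```python
import locale
import sys


def describe_stream(name):
    """Collect encoding-related facts about sys.stdin, sys.stdout or sys.stderr."""
    stream = getattr(sys, name)  # may be None, e.g. under pythonw.exe
    info = {
        "name": name,
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
    }
    try:
        info["isatty"] = stream.isatty()
    except (AttributeError, ValueError):
        # AttributeError: stream is None; ValueError: stream is closed.
        info["isatty"] = None
    # The raw layer underneath the text wrapper. With Python 3.6+ on a Windows
    # console (PYTHONLEGACYWINDOWSSTDIO unset) this is expected to be
    # _WindowsConsoleIO; for disk files and pipes it is normally FileIO.
    raw = getattr(getattr(stream, "buffer", None), "raw", None)
    info["raw_type"] = type(raw).__name__ if raw is not None else None
    return info


if __name__ == "__main__":
    print("locale.getpreferredencoding():", locale.getpreferredencoding(False))
    for stream_name in ("stdin", "stdout", "stderr"):
        print(describe_stream(stream_name))
```

Comparing the output of a console run with a run whose stdout is redirected to a file or to NUL should show the different cases the documentation needs to cover.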
|
|
msg294046
Author: Eryk Sun (eryksun)
Date: 2017-05-20 18:37
How about this?

The character encoding is platform-dependent. Non-Windows platforms use the locale encoding (see locale.getpreferredencoding()). On Windows, UTF-8 is used for console character devices (i.e. CON, CONIN$, and CONOUT$). However, this can be overridden to use the console as a generic character device by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, for stdin and stdout/stderr respectively. This defaults to the system locale encoding if the process is not initially attached to a console. On all platforms, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
|
|
msg294061
Author: Steve Dower (steve.dower)
Date: 2017-05-20 23:59
Looks great, though I wonder whether the rest of the paragraph after "Character devices such as NUL" would be more confusing than it's worth? Can you create a PR? (And having links to the environment variable docs would be great.)
|
|
msg294063
Author: Eryk Sun (eryksun)
Date: 2017-05-21 00:53
I discussed character devices mostly because of the NUL device. It could be surprising that Python dies on an encoding error when output is redirected to NUL:

    C:\>chcp 1252
    Active code page: 1252

    C:\>python -c "print('\u20ac')" > nul

    C:\>chcp 437
    Active code page: 437

    C:\>python -c "print('\u20ac')" > nul
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Program Files\Python36\lib\encodings\cp437.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0: character maps to <undefined>

Unix has a similar problem:

    $ LANG=C python3 -c 'print("\u20ac")' > /dev/null
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0: ordinal not in range(128)

Except that /dev/null isn't a TTY. Also, it's rare nowadays for the locale encoding on Unix systems to be something other than UTF-8.

It would be useful if we special-cased NUL like we do for the Windows console, but only to make it use the backslashreplace error handler. Unfortunately, I don't know how to detect NUL without calling NtQueryObject, and its ObjectNameInformation class (1) can't be used because it's undocumented [1]. GetFinalPathNameByHandle can't be used either, because it requires file-system devices. As a crude workaround, we could lump together all non-console character devices (i.e. isatty() returns True but the handle isn't a console). That would affect serial devices too, but I can't think of a good reason someone would redirect stdout or stderr to a COM port.

[1]: https://msdn.microsoft.com/en-us/library/ff550964
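The NUL failure is purely a codec question: cp1252 has a mapping for U+20AC (the euro sign) and cp437 does not. The sketch below reproduces both outcomes with str.encode() and shows the backslashreplace error handler mentioned above. encode_or_error is a hypothetical helper, and an io.BytesIO stands in for the real device.

```python
import io

EURO = "\u20ac"


def encode_or_error(text, encoding, errors="strict"):
    """Return the encoded bytes, or the UnicodeEncodeError message on failure."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return f"UnicodeEncodeError: {exc}"


# cp1252 maps U+20AC to 0x80, so the first transcript succeeds.
print(encode_or_error(EURO, "cp1252"))                     # b'\x80'
# cp437 has no euro sign, so strict encoding fails as in the traceback.
print(encode_or_error(EURO, "cp437"))
# backslashreplace writes an escape sequence instead of raising.
print(encode_or_error(EURO, "cp437", "backslashreplace"))  # b'\\u20ac'

# Simulate a text stream on a cp437 device using that error handler.
# newline="\n" disables os.linesep translation so the bytes are predictable.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="cp437", errors="backslashreplace", newline="\n")
print(EURO, file=out)
out.flush()
print(raw.getvalue())                                      # b'\\u20ac\n'
```

On Python 3.7 and later, an already-open stream can be switched to this handler with sys.stdout.reconfigure(errors="backslashreplace"); that method does not exist in 3.6.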
|
|
msg328766
Author: Lysandros Nikolaou (lys.nikolaou)
Date: 2018-10-28 22:24
Shall I create a PR for this?
|
|
msg328798
Author: Steve Dower (steve.dower)
Date: 2018-10-29 10:34
Please do!
|
|
msg330764
Author: Lysandros Nikolaou (lys.nikolaou)
Date: 2018-11-30 09:38
Ping.
|
|
msg330765
Author: Paul Moore (paul.moore)
Date: 2018-11-30 09:58
The proposed wording seems a bit over-complex to me. Maybe the following rewording would be easier to understand?

The character encoding is platform-dependent. Non-Windows platforms use the locale encoding (see locale.getpreferredencoding()). On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, for stdin and stdout/stderr respectively. This defaults to the system locale encoding if the process is not initially attached to a console.

The special behaviour of the console can be overridden by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. In that case, the console codepages are used as for any other character device.

On all platforms, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.
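The decision procedure in the proposed wording can be written down as a small pure function, which makes it easier to check that every case is covered. This is a model of the documentation text, not of CPython's actual startup code: stdio_encoding and its parameters are hypothetical, and stream kinds are simplified to three strings.

```python
from typing import Mapping, Optional


def stdio_encoding(
    stream: str,
    kind: str,
    *,
    windows: bool,
    env: Mapping[str, str],
    locale_encoding: str,
    ansi_codepage: str,
    console_input_cp: Optional[str] = None,
    console_output_cp: Optional[str] = None,
) -> str:
    """Return the encoding the proposed docs predict for one standard stream.

    stream is "stdin", "stdout" or "stderr"; kind is "console", "char_device"
    or "file_or_pipe". The console codepages are None when the process was
    not attached to a console at startup.
    """
    legacy = bool(env.get("PYTHONLEGACYWINDOWSSTDIO"))  # non-empty means set

    # The UTF-8 console path ignores PYTHONIOENCODING entirely.
    if windows and kind == "console" and not legacy:
        return "utf-8"

    # PYTHONIOENCODING is "encoding[:errors]"; only the encoding part matters here.
    override = env.get("PYTHONIOENCODING", "").partition(":")[0]
    if override:
        return override

    if not windows:
        return locale_encoding
    if kind == "file_or_pipe":
        return ansi_codepage
    # Other character devices, or the console under PYTHONLEGACYWINDOWSSTDIO:
    # use the console codepage captured at startup, else the ANSI codepage.
    console_cp = console_input_cp if stream == "stdin" else console_output_cp
    return console_cp or ansi_codepage
```

Passing the environment and codepages in as arguments, instead of reading os.environ or calling Windows APIs, keeps the model runnable on any platform.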
|
|
msg330901
Author: Lysandros Nikolaou (lys.nikolaou)
Date: 2018-12-02 22:06
I updated the PR with the new wording by Paul, since I found it easier to understand as well.
|
|
msg335573
Author: miss-islington (miss-islington)
Date: 2019-02-14 23:35
New changeset 5723263a3a39a05b6a2f567e0e7771792e6e2f5b by Miss Islington (bot) (Lysandros Nikolaou) in branch 'master': bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264) https://github.com/python/cpython/commit/5723263a3a39a05b6a2f567e0e7771792e6e2f5b
|
|
msg335574
Author: Mariatta (Mariatta)
Date: 2019-02-14 23:36
Fixed in 3.8 and 3.7. Thanks!
|
|
msg335575
Author: miss-islington (miss-islington)
Date: 2019-02-14 23:45
New changeset b8bcec35e01cac018f6ccfc8323d35886340efe0 by Miss Islington (bot) in branch '3.7': bpo-30410: Documentation of sys.stdin/out/err update to reflect change in 3.6 (GH-10264) https://github.com/python/cpython/commit/b8bcec35e01cac018f6ccfc8323d35886340efe0
|
|