Issue 28333: input() with Unicode prompt produces mojibake on Windows

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: steve.dower Nosy List: Drekin, eryksun, ezio.melotti, ned.deily, paul.moore, python-dev, steve.dower, terry.reedy, tim.golden, vstinner, zach.ware
Priority: normal Keywords: 3.5regression, patch

Created on 2016-10-01 15:20 by Drekin, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: issue_28333_01.patch (uploaded by eryksun, 2016-10-08 01:01)
Pull Requests
PR 552, status: closed (linked by dstufft, 2017-03-31 16:36)
Messages (13)
msg277821 - (view) Author: Adam Bartoš (Drekin) * Date: 2016-10-01 15:20
In my setting (Python 3.6b1 on Windows), passing a prompt containing a non-ASCII character to input() results in mojibake. This is related to the recent fix of #1602 and so is Windows-specific.

>>> input("α")
╬▒

The result corresponds to print("α".encode("utf-8").decode("cp852")). cp852 is the default terminal encoding in my locale.
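The mojibake described above can be reproduced on any platform without a Windows console. This is only a minimal sketch of the encoding mismatch (the variable names are illustrative, not taken from CPython): the prompt is encoded as UTF-8, and the resulting bytes are then interpreted with a legacy OEM code page.

```python
# The prompt is held as UTF-8 bytes, but the bytes are interpreted
# with a legacy OEM code page (cp852 in the report, cp437 in the follow-up).
prompt_bytes = "α".encode("utf-8")

as_cp852 = prompt_bytes.decode("cp852")
as_cp437 = prompt_bytes.decode("cp437")

print(as_cp852)
print(as_cp437)
```

Both code pages map the two UTF-8 bytes of "α" to the same pair of box-drawing characters, which is consistent with the identical output reported for cp852 and cp437.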
msg278274 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-10-07 21:51
Same output with cp437.
msg278275 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-10-07 21:52
This is a regression from 3.5.2, where input("α") displays "α".
msg278277 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-10-07 22:08
This may force into 3.6 - we really ought to be getting and using sys.stdin and sys.stderr in PyOS_StdioReadline() rather than going directly to the raw streams. The problem here is that we're still using fprintf to output the prompt, even though we know (assume) the input is utf-8. I haven't looked closely at how safely we can use Python objects from this code, except to see that it's not obviously safe, but we should really figure out how to deal in Python str rather than C char* for the default readline implementation (and then only fall back on the GNU protocol when someone asks for it). The faster fix here would be to decode the prompt from utf-8 to utf-16-le in PyOS_StdioReadline and then write it using a wide-char output function.
msg278281 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-07 22:32
When I pointed this issue out in code reviews, I assumed you would add the relatively simple fix to decode the prompt and call WriteConsoleW. The long-term fix in issue 17620 has to be worked out with cross-platform support, and ISTM that it can wait for 3.7.

Off topic: I just noticed that you're not calling PyOS_InputHook in the new PyOS_StdioReadline code. Tkinter registers this function pointer to call its EventHook. Do you want a separate issue for this, or is there a reason it was omitted?
msg278284 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-08 01:01
I'm sure Steve already has this covered, but FWIW here's a patch to call WriteConsoleW. Here's the result with the patch applied:

>>> sys.ps1 = '»»» '
»»» input("αβψδ: ")
αβψδ: spam
'spam'

and with interactive stdin and stdout/stderr redirected to a file:

>set PYTHONIOENCODING=utf-8
>amd64\python_d.exe >out.txt 2>&1
input("αβψδ: ")
spam
^Z
>chcp 65001
Active code page: 65001
>type out.txt
Python 3.6.0b1+ (default, Oct 7 2016, 23:47:58) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> αβψδ: 'spam'
>>>

If it can't write the prompt for some reason (e.g. out of memory, decoding fails, WriteConsole fails), it doesn't fall back on fprintf to write the prompt. Should it?

This should also get a test that calls ReadConsoleOutputCharacter to verify that the correct prompt is written.
msg278317 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-08 19:19
New changeset faf5493e6f61 by Steve Dower in branch '3.6':
Issue #28333: Enables Unicode for ps1/ps2 and input() prompts. (Patch by Eryk Sun)
https://hg.python.org/cpython/rev/faf5493e6f61

New changeset cb62e921bd06 by Steve Dower in branch 'default':
Issue #28333: Enables Unicode for ps1/ps2 and input() prompts. (Patch by Eryk Sun)
https://hg.python.org/cpython/rev/cb62e921bd06
msg278318 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-08 19:21
New changeset 63ceadf8410f by Steve Dower in branch '3.6':
Issue #28333: Remove unnecessary increment.
https://hg.python.org/cpython/rev/63ceadf8410f

New changeset d76c8f9ea787 by Steve Dower in branch 'default':
Issue #28333: Remove unnecessary increment.
https://hg.python.org/cpython/rev/d76c8f9ea787
msg278319 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-10-08 19:23
I made some minor tweaks to the patch (no need for strlen() - passing -1 works equivalently), but otherwise it's exactly what I would have done so I committed it. We currently have no tests to check which characters are written to a console output buffer. Issue28217 was tracking those, but considering how little code we have on top of output I don't think it's worth blocking anything on automating those tests.
msg278624 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-13 23:04
MultiByteToWideChar includes the trailing NUL when it gets the string length, so the WriteConsoleW call needs to use (wlen - 1).
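The off-by-one can be illustrated without calling the Windows API. The sketch below only simulates the counting behaviour: when MultiByteToWideChar is given a source length of -1, the length it returns includes the terminating NUL. The helper name simulated_wide_len is hypothetical and is not part of CPython or the patch.

```python
def simulated_wide_len(prompt_utf8: bytes, include_nul: bool) -> int:
    """Count UTF-16 code units for a UTF-8 prompt.

    With include_nul=True this mimics the count reported when the
    source length is passed as -1 (NUL-terminated input), which
    includes the terminating NUL. This is a simulation, not an API call.
    """
    text = prompt_utf8.decode("utf-8")
    units = len(text.encode("utf-16-le")) // 2
    return units + 1 if include_nul else units


prompt = "> ".encode("utf-8")

# Length as reported for a NUL-terminated conversion: 2 characters + NUL.
wlen = simulated_wide_len(prompt, include_nul=True)

# Number of characters that should actually be written to the console.
chars_to_write = wlen - 1

print(wlen, chars_to_write)
```

Writing wlen characters instead of wlen - 1 would send the NUL to the console along with the prompt text.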
msg279427 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-10-25 17:50
Not sure how I missed it originally, but that extra 1 char is actually very important:

Python 3.6.0b2 (v3.6.0b2:b9fadc7d1c3f, Oct 10 2016, 20:36:51) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.ps1='> '
> sys

The extra space is because of that. Really ought to fix this before the next beta.
msg279435 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-10-25 18:08
I forgot to include the link to the python-list thread where this came up: https://mail.python.org/pipermail/python-list/2016-October/715428.html
msg279445 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-25 18:52
New changeset 6b46c3deea2c by Steve Dower in branch '3.6':
Issue #28333: Fixes off-by-one error that was adding an extra space.
https://hg.python.org/cpython/rev/6b46c3deea2c

New changeset 44d15ba67d2e by Steve Dower in branch 'default':
Issue #28333: Fixes off-by-one error that was adding an extra space.
https://hg.python.org/cpython/rev/44d15ba67d2e
History
Date User Action Args
2022-04-11 14:58:37 admin set github: 72520
2017-03-31 16:36:19 dstufft set pull_requests: + pull_request928
2016-10-25 18:52:52 steve.dower set status: open -> closed; resolution: fixed; stage: needs patch -> resolved
2016-10-25 18:52:32 python-dev set messages: +
2016-10-25 18:08:15 eryksun set messages: +
2016-10-25 17:50:08 steve.dower set assignee: steve.dower; messages: +; nosy: + ned.deily
2016-10-13 23:04:07 eryksun set messages: +
2016-10-08 19:23:23 steve.dower set messages: +
2016-10-08 19:21:35 python-dev set messages: +
2016-10-08 19:19:01 python-dev set nosy: + python-dev; messages: +
2016-10-08 01:01:17 eryksun set files: + issue_28333_01.patch; keywords: + patch; messages: +
2016-10-07 22:32:54 eryksun set nosy: + eryksun; messages: +
2016-10-07 22:08:01 steve.dower set keywords: + 3.5regression; stage: needs patch; messages: +; versions: + Python 3.7
2016-10-07 21:52:48 terry.reedy set messages: +
2016-10-07 21:51:43 terry.reedy set nosy: + terry.reedy; messages: +
2016-10-01 15:20:25 Drekin create