Issue 1688: Incorrectly displayed non ascii characters in prompt using "input()" - Python 3.0a2


Created on 2007-12-22 21:09 by vbr, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
inputprompt.patch, uploaded by amaury.forgeotdarc on 2008-09-20 20:20
Messages (16)
msg58965 - Author: Vlastimil Brom (vbr) Date: 2007-12-22 21:09
While testing the 3.0a2 build (on Win XPh SP2, Czech), I found a possible bug in the input() function: if the prompt text contains non-ASCII characters (even ones present in the default charset of the system locale, Czech in this case), the prompt is displayed incorrectly, although the value typed in is handled as expected. The print() function displays these characters correctly. The bug occurs only in the system console (cmd.exe); in IDLE everything works fine.

============ a minimal snapshot of the session follows ==========
Python 3.0a2 (r30a2:59397:59399, Dec 6 2007, 22:34:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input("ěšč: ")
─Ť┼í─Ź: 7
'7'
>>> print("ěšč: ")
ěšč:
>>>
==================================
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input(u"ěšč: ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> print u"ěšč: "
ěšč:
>>>
msg58969 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2007-12-23 10:11
Would you like to work on a patch?
msg59039 - Author: Vlastimil Brom (vbr) Date: 2007-12-29 19:53
First, sorry for the delayed response. More importantly, I fear that preparing a patch would be far beyond my programming competence; sorry about that.
msg59125 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-01-03 06:14
I think I understand what's going on. The trail leads from the last "if (tty) {" block in builtin_input() to PyOS_Readline(), which in turn ends up calling PyOS_StdioReadline() (because that's the most likely initialization of PyOS_ReadlineFunctionPointer). And that function, finally, uses fprintf() to stderr to print the prompt. That apparently doesn't use the same encoding, or perhaps by then the string has been encoded as UTF-8. This is clearly a problem. But what to do about it...
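The "encoded as UTF-8" hypothesis can be checked from Python itself. A minimal sketch, assuming the Czech console uses code page 852 (the usual OEM code page for Czech Windows; the report does not state it): encoding the prompt as UTF-8 and reinterpreting those bytes as cp852 reproduces exactly the garbled prompt shown in the report. The helper name `console_rendering` is hypothetical.

```python
def console_rendering(prompt: str, sent_as: str, console: str) -> str:
    """Return the text a console decoding with `console` shows when it is
    handed the bytes of `prompt` encoded with `sent_as`."""
    return prompt.encode(sent_as).decode(console, errors="replace")


prompt = "ěšč: "

# UTF-8 bytes shown by a cp852 console (assumed): the report's prompt.
garbled = console_rendering(prompt, "utf-8", "cp852")   # "─Ť┼í─Ź: "

# Bytes in the console's own code page round-trip cleanly.
correct = console_rendering(prompt, "cp852", "cp852")   # "ěšč: "
```

Each two-byte UTF-8 sequence (for example C4 9B for "ě") becomes two cp852 characters ("─" and "Ť"), which is why the garbled prompt is twice as long as the intended one.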
msg59140 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2008-01-03 18:19
Windows needs its own PyOS_StdioReadline() function in order to support wide chars. We could either use the low-level functions _putwch() and _getwche(), or probably the higher-level functions _cwprintf_s() (secure console wide char print format, oh I love MS' naming scheme) and _cgetws_s().
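The first option can be sketched at the Python level, since Python's `msvcrt` module exposes `putwch()` and `getwche()` on Windows. This is a hypothetical illustration, not the C function discussed in the thread: `read_line_wide` takes the two console functions as parameters so the logic can run off Windows, and it deliberately ignores backspace and other line editing.

```python
from typing import Callable, List


def read_line_wide(prompt: str,
                   putwch: Callable[[str], None],
                   getwche: Callable[[], str]) -> str:
    """Hypothetical wide-char readline: the prompt goes out one Unicode
    character at a time, so no byte encoding of the prompt is involved."""
    for ch in prompt:
        putwch(ch)
    chars: List[str] = []
    while True:
        ch = getwche()            # read one wide char (echoed by the console)
        if ch in ("\r", "\n"):    # Enter arrives as "\r" from the console
            putwch("\n")          # finish the input line
            break
        chars.append(ch)
    return "".join(chars)


# On Windows one might call (untested assumption):
# import msvcrt
# line = read_line_wide("ěšč: ", msvcrt.putwch, msvcrt.getwche)
```

Passing the I/O functions in keeps the sketch testable; a real replacement would live in C and would also need line editing, which is one reason the thread weighs the higher-level _cgetws_s() instead.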
msg59141 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-01-03 18:40
Cool. I suspect Unix will also require a customized version to be used in case GNU readline isn't present. And I wouldn't be surprised if GNU readline itself doesn't handle UTF-8 properly either!
msg59142 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2008-01-03 18:51
Guido van Rossum wrote:
> I suspect Unix will also require a customized version to be used in case
> GNU readline isn't present.
>
> And I wouldn't be surprised if GNU readline itself doesn't handle UTF-8
> properly either!

GNU readline can handle UTF-8 chars fine on my system:

äßé: ä ä

My locales are set to de_DE.UTF-8

Christian
msg59144 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-01-03 19:18
If possible, I would like to see the C library phased out of Python on Windows for file I/O. Here, that would mean using ReadConsoleW directly for character input. Notice that _cgetws does not take a file handle as a parameter, but implicitly uses _coninpfh. As a consequence, PyOS_StdioReadline should probably change its parameter from FILE* to a "file handle", and consequently be renamed to, say, PyOS_Readline.
msg59685 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-01-11 00:19
Isn't it enough to encode the prompt with the console encoding, instead of relying on the default utf-8 conversion? This patch corrects the issue on Windows:

Index: ../Python/bltinmodule.c
===================================================================
--- ../Python/bltinmodule.c (revision 59843)
+++ ../Python/bltinmodule.c (working copy)
@@ -1358,12 +1358,19 @@
         else
             Py_DECREF(tmp);
         if (promptarg != NULL) {
-            po = PyObject_Str(promptarg);
+            PyObject *stringpo = PyObject_Str(promptarg);
+            if (stringpo == NULL) {
+                Py_DECREF(stdin_encoding);
+                return NULL;
+            }
+            po = PyUnicode_AsEncodedString(stringpo,
+                PyUnicode_AsString(stdin_encoding), NULL);
+            Py_DECREF(stringpo);
             if (po == NULL) {
                 Py_DECREF(stdin_encoding);
                 return NULL;
             }
-            prompt = PyUnicode_AsString(po);
+            prompt = PyString_AsString(po);
             if (prompt == NULL) {
                 Py_DECREF(stdin_encoding);
                 Py_DECREF(po);
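In Python terms, the patch replaces the implicit UTF-8 conversion of the prompt with an explicit encode using the console's encoding before the bytes reach the C-level readline. A minimal sketch; `prompt_bytes` is a hypothetical helper, and cp852 is assumed as the Czech console encoding:

```python
def prompt_bytes(prompt: object, encoding: str) -> bytes:
    """Hypothetical helper mirroring the patch: PyObject_Str() on the
    argument, then PyUnicode_AsEncodedString(..., encoding, NULL), where
    NULL errors means the default "strict" handling."""
    return str(prompt).encode(encoding)


# Unpatched behaviour: the prompt reaches fprintf() as UTF-8 bytes.
before = prompt_bytes("ěšč: ", "utf-8")

# Patched behaviour: the prompt is encoded with the console encoding
# (cp852 assumed here for a Czech cmd.exe).
after = prompt_bytes("ěšč: ", "cp852")

assert after.decode("cp852") == "ěšč: "    # displays correctly
assert before.decode("cp852") != "ěšč: "   # mojibake on the same console
```

Because the errors argument is NULL (strict), a prompt containing characters the console code page cannot represent raises UnicodeEncodeError rather than printing garbage.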
msg59695 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-01-11 08:36
> Isn't it enough to encode the prompt with the console encoding, instead
> of relying on the default utf-8 conversion? This patch corrects the issue
> on Windows:

Sounds right. Technically, you should be using the stdout encoding, but I don't think it should ever differ from the stdin_encoding.
msg73458 - Author: Vlastimil Brom (vbr) Date: 2008-09-20 07:38
I am not sure about the status of this somewhat older issue, but I wanted to mention that the behaviour is the same in Python 3.0rc1 (XPh SP3, Czech):

Python 3.0rc1 (r30rc1:66507, Sep 18 2008, 14:47:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> input("ěšč: ")
─Ť┼í─Ź: řžý
'řžý'
>>> print("ěšč: ")
ěšč:
>>>

Was the patch above supposed to be committed, or are there further difficulties? (Not that it is a huge problem for me, as applications dealing with non-ASCII text would probably use a GUI rather than rely on the console, but it is somewhat surprising.)
msg73462 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-09-20 10:46
Amaury, what further review of the patch do you want? I had already commented that I consider the patch correct, except that it might use stdout_encoding instead. Also, I wouldn't consider this a release blocker. It is somewhat annoying that input produces mojibake in certain cases (namely, non-ASCII characters in the prompt together with a non-UTF-8 terminal), but if the patch doesn't make it into 3.0, we can still fix it in 3.0.1.
msg73464 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-09-20 15:04
Given MvL's review, assuming it fixes the Czech problem, I'm all for applying it.
msg73471 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-09-20 20:20
Here is a new version of the patch: the PyString* functions were renamed to PyBytes*, and it now uses stdout_encoding. About the "release blocker" status: I agree it is not so important; I just wanted to express my "it's been around for a long time, it's almost ready, it would be a pity not to have it in the final 3.0" feeling.
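Encoding with the stdout encoding is also what happens naturally when a prompt is written through a text stream, since the stream encodes with its own `encoding`. A rough pure-Python analogue, assuming injected `stdin`/`stdout` text streams; `simple_input` is a hypothetical name, and the sketch skips the tty/readline path that the C patch actually changes:

```python
import io
from typing import TextIO


def simple_input(prompt: str, stdin: TextIO, stdout: TextIO) -> str:
    """Hypothetical input() analogue: the text layer encodes the prompt
    with stdout's encoding, the choice the revised patch makes explicit."""
    stdout.write(prompt)
    stdout.flush()
    line = stdin.readline()
    if not line:
        raise EOFError("EOF when reading a line")
    # Drop exactly one trailing newline, as input() does.
    return line[:-1] if line.endswith("\n") else line


# Simulate a cp852 console (assumed) by wrapping an in-memory byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="cp852")
answer = simple_input("ěšč: ", io.StringIO("řžý\n"), out)
# raw.getvalue() now holds the prompt encoded as cp852, not UTF-8.
```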
msg73527 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2008-09-21 20:32
I'm ok with this patch.
msg73536 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-09-21 22:11
Committed r66545.
History
Date  User  Action  Args
2022-04-11 14:56:29  admin  set  github: 46029
2008-09-21 22:11:31  amaury.forgeotdarc  set  status: open -> closed; resolution: fixed; messages: +
2008-09-21 20:32:00  benjamin.peterson  set  keywords: - needs review; nosy: + benjamin.peterson; messages: +
2008-09-20 20:20:22  amaury.forgeotdarc  set  files: + inputprompt.patch; keywords: + patch; messages: +
2008-09-20 15:04:33  gvanrossum  set  messages: +
2008-09-20 10:46:40  loewis  set  messages: +
2008-09-20 08:59:17  amaury.forgeotdarc  set  priority: normal -> release blocker; keywords: + needs review
2008-09-20 07:38:10  vbr  set  messages: +
2008-01-11 08:36:32  loewis  set  messages: +
2008-01-11 00:19:28  amaury.forgeotdarc  set  messages: +
2008-01-06 22:29:44  admin  set  keywords: - py3k; versions: Python 3.0
2008-01-03 19:18:31  loewis  set  messages: +
2008-01-03 18:51:15  christian.heimes  set  messages: +
2008-01-03 18:40:34  gvanrossum  set  messages: +
2008-01-03 18:19:44  christian.heimes  set  messages: +
2008-01-03 06:15:00  gvanrossum  set  priority: normal; nosy: + gvanrossum, christian.heimes; messages: +; keywords: + py3k
2007-12-30 22:24:01  amaury.forgeotdarc  set  nosy: + amaury.forgeotdarc
2007-12-29 19:53:16  vbr  set  messages: +
2007-12-23 10:11:16  loewis  set  nosy: + loewis; messages: +
2007-12-22 21:09:32  vbr  create