Issue 17620: Python interactive console doesn't use sys.stdin for input

Created on 2013-04-02 16:59 by Drekin, last changed 2022-04-11 14:57 by admin.

Messages (29)

msg185848 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2013-04-02 16:59

The Python interactive console doesn't actually use sys.stdin but the standard C stdin for input. Is there any reason for this? Why does it then use its encoding attribute? (Assigning to sys.stdin something that doesn't have an encoding attribute freezes the interpreter.) If anything, wouldn't it make more sense if it used sys.__stdin__.encoding instead of sys.stdin.encoding? sys.stdin is intended to be set by the user (it affects input() and code.interact(), which tries to mimic the standard interactive console).

msg186121 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2013-04-06 09:41

Sorry for the typos.

• The interactive console doesn't use sys.stdin for input. Why?

• It uses sys.stdin.encoding; shouldn't it rather use sys.__stdin__.encoding, if anything?

• input(), and hence code.interact(), uses sys.stdin.
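The last point can be observed from Python itself. The following is a minimal sketch using only the standard library; the helper name `read_with_replaced_stdin` is made up for illustration. It shows that input() reads from whatever object sys.stdin currently refers to, in contrast to the interactive loop described in this report.

```python
import io
import sys


def read_with_replaced_stdin(text):
    """Call input() while sys.stdin is temporarily an in-memory stream."""
    saved_stdin, saved_stdout = sys.stdin, sys.stdout
    sys.stdin = io.StringIO(text)
    sys.stdout = io.StringIO()  # swallow the prompt that input() writes
    try:
        return input("prompt> ")
    finally:
        sys.stdin, sys.stdout = saved_stdin, saved_stdout


if __name__ == "__main__":
    print(read_with_replaced_stdin("hello from a fake stdin\n"))
```

An in-memory stream has no file descriptor, so input() falls back to calling the stream's readline() method.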

msg186553 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2013-04-11 09:56

• interactive console doesn't use sys.stdin for input, why?

Modules/main.c calls PyRun_AnyFileFlags(stdin, "<stdin>", ...). At this point, sys.stdin is the same as C stdin by construction, so I'm not sure how you came to encounter the issue.

However, it's also true that if you later redirect sys.stdin, it will be ignored and the original C stdin (as passed to PyRun_InteractiveLoopFlags) will continue to be used. On the other hand, the input() implementation has dedicated logic to find out whether sys.stdin is the same as C stdin.

(by the way, the issue should also apply to 2.7)

• it uses sys.stdin.encoding, shouldn't it rather use sys.__stdin__.encoding if anything?

Assuming the previous bug gets fixed, then no :-)

msg186576 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2013-04-11 18:40

I encountered it when I changed sys.stdin at runtime (I thought it was a supported feature) to affect the interactive console, see http://bugs.python.org/issue1602 .

msg186580 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2013-04-11 18:56

Ok, I guess it would need a new API (PyRun_Stdio()?) to run the interactive loop from sys.stdin, rather than from a fixed FILE*.

msg193815 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2013-07-28 10:40

Is there any chance the API will be added and used by python.exe?

msg221176 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2014-06-21 12:29

Steve, another one to look at in the context of improving the Unicode handling situation at the Windows command prompt.

msg221179 - (view)

Author: Steve Dower (steve.dower) * (Python committer)

Date: 2014-06-21 15:15

Thanks Nick, but this has a pretty clear scope that may help the Unicode situation in cmd but doesn't directly relate to it.

msg223414 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-07-18 15:09

There is still the serious inconsistency that sys.stdin is not used for input by the interactive loop, but its encoding is. So if I replace sys.stdin with a custom object with its own encoding attribute, the standard interactive loop tries to use this encoding, which may result in an exception on any input.

msg224312 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2014-07-30 15:16

Is this at all related to the use of GNU readline?

msg224313 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-07-30 15:29

Yes, it is. GNU readline will use a FILE*. Apparently, one can customize this behaviour, see http://cnswww.cns.cwru.edu/php/chet/readline/readline.html#SEC25

"""Variable: rl_getc_func_t * rl_getc_function If non-zero, Readline will call indirectly through this pointer to get a character from the input stream. By default, it is set to rl_getc, the default Readline character input function (see section 2.4.8 Character Input). In general, an application that sets rl_getc_function should consider setting rl_input_available_hook as well. """

It is not obvious how that interacts with special keys, e.g. arrows.

msg224330 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2014-07-30 17:49

I propose not to mess with GNU readline. But that doesn't mean we can't try to fix this issue by detecting that sys.stdin has changed and use it if it isn't referring to the original process stdin. It will be tricky however to make sure nothing breaks.

(The passage quoted from the GNU readline docs seems to imply that it's in non-blocking mode, and that the FD is a raw tty device, probably with echo off. It will give escape sequences for e.g. arrow keys.)

msg224334 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-07-30 18:04

My naive picture of ideal situation looks like this: When the interactive loop wants input, it just calls sys.stdin.readline, which delegates to sys.stdin.buffer.raw.readinto or .read, these can use GNU readline if available to get the data. May I ask, what's wrong with my picture?

msg224338 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2014-07-30 18:31

sys.stdin.readline() never delegates to GNU readline. The REPL calls GNU readline directly. There's clearly some condition that determines whether to call GNU readline or sys.stdin.readline, but it may not correspond to what you want (e.g. it may just test whether FD 0 is a tty). Can you find in the CPython source code where this determination is made?

msg224396 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-07-31 11:38

I looked into the source code and found the following.

First, the code path by which the interactive loop gets its input is as follows:

• Python/pythonrun.c:PyRun_InteractiveLoopFlags

• Python/pythonrun.c:PyRun_InteractiveOneObject

• Python/pythonrun.c:PyParser_ASTFromFileObject

• Parser/parsetok.c:PyParser_ParseFileObject

• Parser/parsetok.c:parsetok

• Parser/tokenizer.c:PyTokenizer_Get

• Parser/tokenizer.c:tok_get

• Parser/tokenizer.c:tok_nextc

• Parser/myreadline.c:PyOS_Readline OR Parser/tokenizer.c:decoding_fgets

PyRun_InteractiveOneObject tries to get the input encoding via sys.stdin.encoding. The encoding is then passed along and finally stored in a tokenizer object. It is the tok_nextc function that gets the input. If the prompt is not NULL, it gets the data via PyOS_Readline and uses the encoding to recode it to UTF-8. This is unfortunate, since the encoding, which originates in sys.stdin.encoding, may have nothing to do with the data returned by PyOS_Readline. Also note that there is a hardcoded stdin argument to PyOS_Readline, but tok->fp == stdin probably holds, so it doesn't matter.
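The recoding step can be modeled in a few lines of Python. This is a sketch of the idea only, not the tokenizer's C code; the function name `recode_to_utf8` and the cp1252 example are assumptions chosen for illustration.

```python
def recode_to_utf8(raw_line, declared_encoding):
    """Decode bytes with the declared encoding, then re-encode as UTF-8."""
    return raw_line.decode(declared_encoding).encode("utf-8")


if __name__ == "__main__":
    typed = "print('é')\n"

    # The declared encoding matches the bytes: the text survives.
    ok = recode_to_utf8(typed.encode("cp1252"), "cp1252")
    print(ok == typed.encode("utf-8"))

    # The bytes are cp1252, but the declared encoding says UTF-8.
    try:
        recode_to_utf8(typed.encode("cp1252"), "utf-8")
    except UnicodeDecodeError as exc:
        print("decode failed:", exc.reason)
```

When the declared encoding comes from a replaced sys.stdin while the bytes still come from the C-level stdin, the second case is the one that applies.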

If the prompt in tok_nextc is NULL, the data is obtained by the decoding_fgets function, which uses either fp_readl > tok->decoding_readline or Objects/fileobject.c:Py_UniversalNewlineFgets, depending on the tokenizer state. The tok->decoding_readline handler may be set to io.open("isisOOO", fileno(tok->fp), …) (I have no idea what "isisOOO" might be).

The PyOS_Readline function calls either PyOS_StdioReadline or the function pointed to by PyOS_ReadlineFunctionPointer, which by default is again PyOS_StdioReadline but is usually set to support GNU readline by the code in Modules/readline.c. The PyOS_StdioReadline function uses my_fgets, which calls fgets.
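The dispatch described above can be pictured with a small Python model. It is only an analogy for the C function pointer: the class `ReadlineModel` and its attribute names are hypothetical and do not exist in CPython.

```python
import io


class ReadlineModel:
    """Toy model of PyOS_Readline choosing between a hook and the stdio default."""

    def __init__(self):
        # Stand-in for PyOS_ReadlineFunctionPointer; None means "not set".
        self.function_pointer = None

    def stdio_readline(self, stdin, stdout, prompt):
        """Default path: write the prompt, then read one line."""
        stdout.write(prompt)
        stdout.flush()
        return stdin.readline()

    def readline(self, stdin, stdout, prompt):
        """Use the installed hook if there is one, else the stdio default."""
        hook = self.function_pointer or self.stdio_readline
        return hook(stdin, stdout, prompt)


if __name__ == "__main__":
    model = ReadlineModel()
    out = io.StringIO()
    print(repr(model.readline(io.StringIO("x = 1\n"), out, ">>> ")))

    # A module such as readline would install its own function here.
    model.function_pointer = lambda stdin, stdout, prompt: "from hook\n"
    print(repr(model.readline(io.StringIO("ignored\n"), out, ">>> ")))
```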

Now, what the input() function does. input is implemented as Python/bltinmodule.c:builtin_input. It tests whether we are on a tty by comparing sys.stdin.fileno() to fileno(stdin) and testing isatty. Note that this may not be enough: if I install a custom sys.stdin but let it have the standard fileno, the test may still succeed. If we are on a tty, PyOS_Readline is used (again together with sys.std*.encoding); if we aren't, Objects/fileobject.c:PyFile_WriteObject > sys.stdout.write (for the prompt) and Objects/fileobject.c:PyFile_GetLine > sys.stdin.readline are used.
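A rough Python rendering of that test might look as follows. This is a sketch, not a transcription of builtin_input: the function name `would_use_os_readline` is made up, the default descriptors 0 and 1 are assumed to be what fileno(stdin) and fileno(stdout) return, and the stdout check is included on the assumption that CPython applies the same test to sys.stdout.

```python
import io
import os


def would_use_os_readline(stdin_obj, stdout_obj, c_stdin_fd=0, c_stdout_fd=1):
    """Approximate the tty test that decides between PyOS_Readline and sys.std*."""
    for stream, c_fd in ((stdin_obj, c_stdin_fd), (stdout_obj, c_stdout_fd)):
        try:
            fd = stream.fileno()
        except (AttributeError, OSError, ValueError):
            return False  # no usable descriptor: use the Python-level stream
        if fd != c_fd or not os.isatty(fd):
            return False
    return True


class ObjectReusingFd0:
    """A custom stdin that reports descriptor 0 although it reads elsewhere."""

    def fileno(self):
        return 0

    def readline(self):
        return "this line never came from descriptor 0\n"


if __name__ == "__main__":
    print(would_use_os_readline(io.StringIO(), io.StringIO()))
    # The result below depends on whether the process is attached to a terminal.
    print(would_use_os_readline(ObjectReusingFd0(), io.StringIO()))
```

The `ObjectReusingFd0` class illustrates the caveat from the message: an object that merely reports the standard descriptor passes the stdin half of the check.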

As we can see, the API is rather FILE*-based. The only places where sys.std* objects are used are one branch of builtin_input and the lookup of the encoding used in the tokenizer. Would it be possible to configure the tokenizer so that it uses sys.stdin.readline for input, and also to rewrite builtin_input to always use sys.std*? Then it would be the responsibility of the sys.stdin.buffer.raw.read* methods to decide whether to use GNU readline, whatever PyOS_Readline uses, or something else (e.g. ReadConsoleW on a Windows tty), and also to check for Ctrl-C afterwards.

msg226021 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-08-28 12:45

I have found another example where the current interaction between readline and the Python core leads to confusion. It started with the following report on my package: https://github.com/Drekin/win-unicode-console/issues/2 .

Basically, the IPython interactive console on Windows uses the pyreadline package, which provides GNU readline functionality. To get input from the user, it just calls input(prompt). input() calls readline both for writing the prompt and for reading the input. It interprets ANSI control sequences, so a colored prompt is displayed rather than garbage. And when the user types, things like auto-completion work. sys.stdin is not used at all and points to the standard object.

One easily gets the impression that since sys.stdin is bypassed, changing it doesn't matter, but it actually does. With a changed sys.stdin, input() now uses it rather than readline, and ANSI control sequences result in a mess. See https://github.com/ipython/ipython/issues/17#issuecomment-53696541 .

I just think it would be better if input() always delegated to sys.stdin and print() to sys.stdout, and this was the standard way to interact with the terminal. It would then be the responsibility of the sys.std* objects to do the right thing: to read from a file, to delegate to readline, to interact directly with the console in some way, and to interpret the ANSI control sequences or not.

Solving issues like #1602 or #18597 or adding readline support to Windows would then be just matter of providing the right sys.std* implementation.

msg226098 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-08-29 22:57

I realized that the behavior I want can be achieved by setting PyOS_ReadlineFunctionPointer to a function calling sys.stdin.readline(). However, I found another problem: the Python REPL just doesn't work when sys.stdin.encoding is UTF-16-LE. The tokenizer (Parser/tokenizer.c:tok_nextc) reads a line using PyOS_Readline and then tries to recode it to UTF-8. The problem is that PyOS_Readline returns just a plain char *, and strlen() is used to determine its length when decoding, which makes no sense for a UTF-16-LE encoded line, since it is full of null bytes.
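The strlen() problem is easy to reproduce in pure Python. The helper `c_strlen` below is an illustrative stand-in for the C function, written on the assumption that the buffer is treated as a NUL-terminated string.

```python
def c_strlen(buffer):
    """Length a C strlen() would report: the number of bytes before the first NUL."""
    end = buffer.find(b"\x00")
    return len(buffer) if end == -1 else end


if __name__ == "__main__":
    line = "x = 1\n"
    for encoding in ("utf-8", "utf-16-le"):
        data = line.encode(encoding)
        print(encoding, "real length:", len(data), "strlen:", c_strlen(data))
```

For ASCII text, every second byte of the UTF-16-LE form is zero, so a NUL-terminated view of the line ends after the first byte.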

Why does PyOS_Readline return char * rather than a Python string object? In the situation where PyOS_ReadlineFunctionPointer points to something producing a Unicode string (e.g. my new approach to solving #1602, or the pyreadline package), the string must be encoded and cast to char * to be returned from PyOS_Readline; then it is decoded by the tokenizer and encoded again to UTF-8.

msg226099 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-08-29 23:00

Why does PyOS_Readline return *char, rather than Python string object?

For historical reasons and now for compatibility: we can't change the hook's signature without breaking obvious applications, obviously. If necessary, we could add a new hook that would take precedence over the old one if defined. Feel free to post a patch for that.

msg226100 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-08-29 23:01

without breaking obvious applications

without breaking existing applications ;-)

msg226126 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2014-08-30 07:59

The Python parser works well with UTF8. If you know the encoding, decode from your encoding and encode to UTF8. You should pass the UTF8 flag to the parser.

msg226140 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-08-30 15:30

Antoine Pitrou: I understand. It would be nice to have that new Python-string-based readline hook. Its default implementation could be to call PyOS_Readline and decode the bytes using sys.stdin.encoding (as the tokenizer currently does). The tokenizer then wouldn't need to decode if it called the new hook.

Victor Stinner: I'm going to try the approach of reencoding my stream to UTF-8. So then my UTF-16-LE encoded stream is decoded, then encoded to UTF-8, interpreted as null-terminated *char, which is returned to the tokenizer, which again decodes it and encodes to UTF-8. I wonder if the last step could be short-circuited. What is this UTF8 flag to Python parser? I couldn't find any information.

msg226933 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2014-09-15 19:04

I have found another problem. PyOS_Readline can be called from two different places – from Parser/tokenizer.c:tok_nextc (by REPL), which uses sys.stdin.encoding to encode prompt argument, and from Python/bltinmodule.c:builtin_input_impl (by input() function), which uses sys.stdout.encoding. So readline hook cannot be implemented correctly if sys.stdin and sys.stdout don't have the same encoding.

Either the tokenizer should have two encodings – one for input and one for output - or better no encoding at all and should use Python string based alternative to PyOS_Readline, which could be added.
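The ambiguity can be shown with a short sketch. The encodings, the prompt text, and the function name `hook_decoding_prompt` are assumptions picked for the example; the point is only that a bytes-based hook cannot tell which caller encoded the prompt.

```python
def hook_decoding_prompt(prompt_bytes, assumed_encoding):
    """A readline hook only sees bytes and has to guess how to decode them."""
    return prompt_bytes.decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    prompt = "příkaz> "
    stdin_encoding, stdout_encoding = "utf-8", "cp1250"  # hypothetical setup

    from_repl = prompt.encode(stdin_encoding)    # what the tokenizer would pass
    from_input = prompt.encode(stdout_encoding)  # what input() would pass

    print(from_repl == from_input)
    print(hook_decoding_prompt(from_repl, "utf-8") == prompt)
    print(hook_decoding_prompt(from_input, "utf-8") == prompt)
```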

msg234439 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2015-01-21 15:45

Unfortunately, I have little or no experience with Python C code, and I don't even have a C compiler installed, so I cannot experiment. I'll just put my ideas on how to solve this here.

• Add a sys.readlinehook attribute, which can be set to a function taking a prompt string and returning a line.

• Add a C function PyOS_UnicodeReadline (possibly with a better name) which has the same signature as sys.readlinehook (in contrast with the signature of PyOS_Readline). If sys.readlinehook is set, call it; otherwise encode the prompt string using the stdout encoding, delegate to PyOS_Readline, and decode the returned string using the stdin encoding.

• Change the tokenizer and the implementation of input() so that they use PyOS_UnicodeReadline rather than PyOS_Readline.

This would solve the problem that utf-16 encoded string cannot be given to the tokenizer and also would bypass the silent assumption that stdin and stdout encodings are the same. Also, readline hook could be easily set from Python code – no need for ctypes. The package pyreadline could use this. Also, the issue #1602 could be then solved just by changing sys.std* streams and providing a trivial sys.readlinehook delegating to sys.stdout.write and sys.stdin.readline.
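The proposal can be sketched in Python under stated assumptions. Neither sys.readlinehook nor PyOS_UnicodeReadline exists in CPython; here the hook is passed as an ordinary argument, `bytes_readline` stands in for PyOS_Readline, and all names are hypothetical.

```python
import io


def make_stream_hook(stdin, stdout):
    """Build the 'trivial' hook: write the prompt to stdout, read a line from stdin."""
    def hook(prompt):
        stdout.write(prompt)
        stdout.flush()
        return stdin.readline()
    return hook


def unicode_readline(prompt, readlinehook=None, bytes_readline=None,
                     stdin_encoding="utf-8", stdout_encoding="utf-8"):
    """Model of the proposed str-based readline entry point."""
    if readlinehook is not None:
        return readlinehook(prompt)
    # Fallback: encode the prompt for the bytes-based API, decode its result.
    raw = bytes_readline(prompt.encode(stdout_encoding))
    return raw.decode(stdin_encoding)


if __name__ == "__main__":
    out = io.StringIO()
    hook = make_stream_hook(io.StringIO("1 + 1\n"), out)
    print(repr(unicode_readline(">>> ", readlinehook=hook)))
    print(repr(out.getvalue()))
```

Because the prompt and the returned line are both str objects, the two encodings are applied only in the fallback branch, one for each direction.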

msg242173 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2015-04-28 10:06

Note that under the status quo, PyOS_Readline is called from two places: the tokenizer during an interactive session, and the builtin function input(). The tokenizer passes a prompt string encoded in sys.stdin.encoding, while input() passes a prompt string encoded in sys.stdout.encoding, so it is not possible to implement a readline hook correctly when the encodings differ. This might be considered a bug.

msg255549 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2015-11-28 20:06

I've formulated a proposal regarding this issue: https://mail.python.org/pipermail/python-dev/2015-November/142246.html . Does it make sense?

msg272681 - (view)

Author: Steve Dower (steve.dower) * (Python committer)

Date: 2016-08-14 16:30

I'm working on this as part of my fix for . Not yet sure how this will come out - compatibility with GNU readline seems to be the biggest issue, as if we want to keep that then we can't allow embedded '\0' in the encoded text (i.e. UTF-16 cannot be used, which implies that sys.stdin.encoding cannot always be used directly).

Adding readlinehook as an alternative may be feasible, but a decent amount of work given how we call into the current readline implementation. Unfortunately, it looks like detecting when a readline hook has been added is going to involve significant changes to the tokenizer, which I really don't want to do.

The easiest approach wrt seems to be to special case the console by reencoding from utf-16-le to utf-8 and forcing the encoding in the tokenizer to utf-8 (instead of sys.stdin.encoding) in this case. I'll start here so that at least we can parse Unicode from the interactive prompt.

msg272683 - (view)

Author: Adam Bartoš (Drekin) *

Date: 2016-08-14 16:58

Unfortunately, it looks like detecting when a readline hook has been added is going to involve significant changes to the tokenizer, which I really don't want to do.

We don't need to detect the presence of a readline hook; it may be that there is always a readline hook. Whenever we have interactive stdio, and so PyOS_Readline would be called, the newly proposed API PyIO_Readline would be called instead. This would return a Unicode str as a PyObject*, so the result can be returned directly by input() and would be encoded afterwards by the tokenizer (these are the only consumers of PyOS_Readline).

We may even leave the tokenizer alone and redefine PyOS_Readline as a wrapper of PyIO_Readline, having full control of the encoding process there. So it would be enough to set up the tokenizer with UTF-8 encoding despite the fact that sys.std*.encoding would be UTF-16.
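The wrapper idea might be pictured like this. It is a Python model only: `io_readline` stands in for the proposed PyIO_Readline, and the function name `os_readline_via_unicode` is invented for the example.

```python
def os_readline_via_unicode(io_readline, prompt_bytes):
    """Model of a bytes-based PyOS_Readline built on top of a str-based reader."""
    line = io_readline(prompt_bytes.decode("utf-8"))
    encoded = line.encode("utf-8")
    # UTF-8 output contains a NUL byte only if the text itself contains U+0000.
    if b"\x00" in encoded:
        raise ValueError("embedded NUL cannot be returned as a C string")
    return encoded


if __name__ == "__main__":
    def fake_console_reader(prompt):
        # Pretend the console delivered this text, e.g. decoded from UTF-16.
        return "π = 3\n"

    print(os_readline_via_unicode(fake_console_reader, b">>> "))
```

With the wrapper fixed to UTF-8, the tokenizer could be configured for UTF-8 regardless of what sys.std*.encoding reports.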

(I hope that if the tokenizer were designed nowadays, it would operate on strings rather than bytes, so there wouldn't be any encoding problems at all.)

Also, third parties would benefit from sys.readlinehook – at least win_unicode_console and pyreadline would just set the attribute rather than messing with ctypes.

msg275242 - (view)

Author: Steve Dower (steve.dower) * (Python committer)

Date: 2016-09-09 03:27

Unassigning this. I meant to close it with another fix, but that would be wrong as we really ought to keep this open until we solve it properly. All I've done is make it use the right APIs on Windows, but we still don't handle it properly when we change stdin.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:43 | admin | set | github: 61820 |
| 2021-03-15 21:02:22 | eryksun | link | issue24829 dependencies |
| 2021-03-15 21:01:39 | eryksun | set | versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.6 |
| 2020-03-06 20:05:16 | brett.cannon | set | nosy: - brett.cannon |
| 2018-06-09 11:03:02 | ncoghlan | unlink | issue22555 dependencies |
| 2016-09-09 16:42:53 | steve.dower | unlink | issue1602 dependencies |
| 2016-09-09 03:27:34 | steve.dower | set | assignee: steve.dower ->; messages: + |
| 2016-08-14 16:58:20 | Drekin | set | messages: + |
| 2016-08-14 16:30:34 | steve.dower | set | assignee: steve.dower; messages: +; versions: + Python 3.6, - Python 3.4 |
| 2015-11-28 20:06:01 | Drekin | set | messages: + |
| 2015-08-08 14:22:29 | eryksun | link | issue12854 superseder |
| 2015-05-15 22:42:49 | vstinner | set | nosy: - vstinner |
| 2015-05-12 05:09:39 | ncoghlan | link | issue22555 dependencies |
| 2015-05-11 08:07:25 | ncoghlan | link | issue1602 dependencies |
| 2015-05-10 14:48:39 | paul.moore | set | nosy: + paul.moore |
| 2015-04-28 10:06:35 | Drekin | set | messages: + |
| 2015-01-21 15:45:24 | Drekin | set | messages: + |
| 2014-09-15 19:04:15 | Drekin | set | messages: + |
| 2014-08-30 15:30:25 | Drekin | set | messages: + |
| 2014-08-30 07:59:55 | vstinner | set | messages: + |
| 2014-08-29 23:01:02 | pitrou | set | messages: + |
| 2014-08-29 23:00:25 | pitrou | set | messages: + |
| 2014-08-29 22:57:47 | Drekin | set | messages: + |
| 2014-08-28 12:45:18 | Drekin | set | messages: + |
| 2014-07-31 11:40:34 | Drekin | set | messages: + |
| 2014-07-31 11:38:42 | Drekin | set | messages: + |
| 2014-07-30 18:31:53 | gvanrossum | set | messages: + |
| 2014-07-30 18:04:22 | Drekin | set | messages: + |
| 2014-07-30 17:49:57 | gvanrossum | set | messages: + |
| 2014-07-30 15:29:16 | pitrou | set | messages: + |
| 2014-07-30 15:16:12 | gvanrossum | set | nosy: + gvanrossum; messages: + |
| 2014-07-18 15:09:34 | Drekin | set | messages: + |
| 2014-06-21 15:15:37 | steve.dower | set | nosy: brett.cannon, georg.brandl, ncoghlan, pitrou, vstinner, benjamin.peterson, eric.araujo, tshepang, Drekin, steve.dower; messages: + |
| 2014-06-21 12:29:27 | ncoghlan | set | nosy: + steve.dower; messages: + |
| 2013-07-28 10:40:56 | Drekin | set | messages: + |
| 2013-04-11 18:56:48 | pitrou | set | type: behavior -> enhancement; stage: needs patch; messages: +; versions: - Python 3.3 |
| 2013-04-11 18:40:26 | Drekin | set | messages: +; versions: - Python 2.7 |
| 2013-04-11 09:56:36 | pitrou | set | nosy: + brett.cannon, georg.brandl, ncoghlan, benjamin.peterson; messages: +; versions: + Python 2.7 |
| 2013-04-10 14:40:15 | ezio.melotti | set | nosy: + vstinner |
| 2013-04-06 23:18:07 | pitrou | set | assignee: pitrou -> (no value) |
| 2013-04-06 10:21:46 | georg.brandl | set | assignee: pitrou; nosy: + pitrou |
| 2013-04-06 09:41:21 | Drekin | set | messages: + |
| 2013-04-05 21:14:56 | tshepang | set | nosy: + tshepang |
| 2013-04-03 01:19:41 | eric.araujo | set | nosy: + eric.araujo |
| 2013-04-02 16:59:14 | Drekin | create | |