Issue 5323: document expected/required behavior of 3.x io subsystem with respect to buffering (original) (raw)

Created on 2009-02-19 21:36 by r.david.murray, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg82499 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-19 21:36
The python 3.x io library appears to allow mixing calls to __next__ and calls to readline. As another consequence, where previously it was necessary to use 'readline' to read single lines from a pipe without blocking waiting for a buffer fill, now one can apparently use 'for'. Reading the code in io.py, it seems as though this is intentional: readline and __next__ use the same buffer, and the method on TextIOWrapper that fills the buffer calls 'read1', which in the one place where it has a docstring says that only as much data as is available is read, in contrast to 'read', which reads exactly n bytes (to EOF). Assuming this is the intended behavior, it should be documented, and if they do not exist (I didn't see any) tests should be added to confirm the behavior. If this behavior is an implementation artifact, then this should be documented so that code is less likely to depend on it. Once I know whether or not this behavior _should_ be part of the test suite, I'll be happy to work on doc and test patches.
msg82555 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-02-20 23:05
Yes, this behavior is documented. There is one test that mixes iteration and readline, in test.test_io.TextIOWrapperTest.testSeeking, but it shows a difference between the two methods: to be faster, __next__ disables the tell() function.
msg82559 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-21 05:01
Heh, I was reading the code instead of the documentation. Silly me :). The 'buffering' argument of open, and the 'line_buffering' argument of TextIOWrapper do address my documentation concern about whether or not readline can be used to read lines from a source without blocking as long as at least one line is available. However, I do not see any documentation of the relationship between readline and next, except the negative one that no restriction on mixing them is documented. Since there used to be such a restriction, noting that there isn't one is probably worthwhile. Or perhaps better, a specification that '__next__' calls 'readline'. (And the behavior difference with respect to tell, which I don't fully understand, should also be documented.) TextIOWrapper says it wraps a BufferedIOBase object. The doc for that class talks about read possibly doing readahead for a 'non-interactive' file. What about pipes? They are not mentioned, and it would be good to have short reads on pipes for the same reason that it is good to have short reads on ttys. The IOBase only specifies an 'isatty' method. It took me a while to understand what was going on well enough to write the above paragraphs, partly because I read the code first :). I'm finding the documentation confusing, so I might as well talk about the issues I ran into. The implementation of TextIOWrapper calls the buffer's 'read1' method in _read_chunk. But BufferedIOBase does not define read1, nor does its base class, IOBase. open does pass TextIOWrapper a BufferedReader object which does define read1, which is why it works, but I don't see any documentation saying that the buffer argument to TextIOWrapper must provide a 'read1' method with certain semantics. The docs indicate BufferedIOBase is the minimum requirement. Is the definition of 'read1' missing from BufferedIOBase? I also notice that while open passes TextIOWrapper an appropriate value for line_buffering (assuming we ignore the pipe issue), TextIOWrapper ignores it for reading. I suppose that that is an implementation detail only. After staring at it for a while, I finally came to an understanding of the statement in BufferedIOBase that "A typical implementation should not inherit from a RawIOBase implementation, but wrap one like BufferedWriter and BufferedReader." I understand this to mean that BufferedIOBase is really an ABC and should be concretized as a wrapper rather than doing multiple inheritance with RawIOBase. I don't think I would have figured that out without reading the code, though. I believe that adding the word 'do' at the end of that sentence would make the meaning clear. A similarly confusing bit is the first sentence of the documentation of TextIOWrapper. Currently it says "A buffered text stream over a BufferedIOBase raw stream..." I think it would be clearer if it said something like "A buffered text stream wrapper around a BufferedIOBase derived wrapper around a raw stream..." That's a bit unwieldy, but it is nontheless (IMO) more comprehensible. I will look at the tests more carefully and make sure that all my use cases are in fact covered. --RDM
msg82565 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-21 13:14
RDM, all the classes you mentioned should indeed be able to do "short reads" on pipes, sockets and the like. That's how they are tested in test_io.py: against mock raw i/o classes which only return a few bytes at a time (e.g. only 5 bytes will be filled in a 4096-byte buffer). However, I encourage you once again to *experiment* with the 3.x i.o library and share your results with us. This is the best way for us to know whether common use cases are really covered. (Amaury, as for the `telling` flag in TextIOWrapper, we should so some performance measurements, and then suppress it if there's no difference)
msg82568 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-02-21 14:43
On Sat, 21 Feb 2009 at 13:14, Antoine Pitrou wrote: > Antoine Pitrou <pitrou@free.fr> added the comment: > > RDM, all the classes you mentioned should indeed be able to do "short > reads" on pipes, sockets and the like. That's how they are tested in > test_io.py: against mock raw i/o classes which only return a few bytes > at a time (e.g. only 5 bytes will be filled in a 4096-byte buffer). My questions in the last comment were directed at trying to clarify the documentation. I think my most important point there is whether or not 'read1' should be added to the BufferedIOBase ABI. I believe it should be, since if a class derived from BufferedIOBase does not implement it and is passed to TextIOWrapper, it will fail. > However, I encourage you once again to *experiment* with the 3.x i.o > library and share your results with us. This is the best way for us to > know whether common use cases are really covered. As I said, I plan to do so. I needed to understand the intent first, though, and reading the docs resulted in some doc questions. Should I be opening each point in a separate issue and/or providing a suggested doc patch? I'm new to trying to help out via the tracker, so best practice pointers are welcome. --RDM
msg83232 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-03-06 01:24
I have now had time to test this. My use case (reading text from a pipe and wanting the most recent complete line returned immediatly as it is avaialable) is satisified by both the io.py code and the io c code from the py3k branch. I haven't been able to figure out how to write an automated test, though :(. I also had a fun time (I mean that literally) wandering through the code and docs convincing myself that the 3.1a0 docs cover all the issues I raised in this ticket. I played around with a test case that helped convince me the implementations match the 3.1a0 docs for the issues I was concerned about. So, IMO this ticket can be closed as fixed in 3.1a0. --RDM
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49573
2009-03-06 01:34:49 benjamin.peterson set status: open -> closedresolution: fixed
2009-03-06 01:24:06 r.david.murray set messages: +
2009-02-21 14:43:16 r.david.murray set messages: +
2009-02-21 13:14:13 pitrou set nosy: + pitroumessages: +
2009-02-21 05:01:46 r.david.murray set messages: +
2009-02-20 23:05:54 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2009-02-19 21:36:08 r.david.murray create