Issue 5811: io.BufferedReader.peek(): Documentation differs from Implementation

Created on 2009-04-22 06:15 by trott, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description
peek.diff	conf, 2009-06-13 04:20
peek2.diff	conf, 2009-06-13 04:23
peek3.diff	conf, 2009-06-15 05:21
peek4.diff	conf, 2009-06-15 13:02
peek-one-byte.patch	martin.panter, 2015-01-10 04:26	Document ≥ 1 byte returned

Messages (25)
msg86274	Author: Torsten Rottmann (trott)	Date: 2009-04-22 06:15
The documented behavior of io.BufferedReader.peek([n]) states:

    peek([n])
    Return 1 (or n if specified) bytes from a buffer without advancing the position.

That is, the parameter n is the _max_ length of returned bytes. Implementation is:

    def _peek_unlocked(self, n=0):
        want = min(n, self.buffer_size)
        have = len(self._read_buf) - self._read_pos
        if have < want:
            to_read = self.buffer_size - have
            current = self.raw.read(to_read)
            if current:
                self._read_buf = self._read_buf[self._read_pos:] + current
                self._read_pos = 0
        return self._read_buf[self._read_pos:]

Where you may see that the parameter n is the _min_ length of returned bytes. So peek(1) will return _not_ just 1 byte, but the remaining bytes in the buffer.
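The reported behaviour can be reproduced with a short sketch. It assumes CPython's io.BufferedReader wrapped around an in-memory io.BytesIO; the 100 bytes of sample data and the 16-byte buffer are illustrative choices, not values from the report. The exact length returned by peek() depends on the implementation and on how much is currently buffered.

```python
import io

# Assumption: 100 bytes of sample data and a deliberately small 16-byte buffer.
raw = io.BytesIO(bytes(range(100)))
reader = io.BufferedReader(raw, buffer_size=16)

peeked = reader.peek(1)          # ask for "1 byte"
pos_after_peek = reader.tell()   # peek() does not advance the position

print(len(peeked))               # usually the whole buffered chunk, not 1
print(pos_after_peek)

first = reader.read(1)           # read() still starts at the first byte
```

Code that needs exactly one byte therefore has to slice the result rather than rely on the argument.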
msg86275	Author: Torsten Rottmann (trott)	Date: 2009-04-22 06:20
Note: this is also in Python 2.6
msg86276	Author: Torsten Rottmann (trott)	Date: 2009-04-22 06:29
Proposed patch to fix this:

set the default of n to 1 as stated by docs:

    def _peek_unlocked(self, n=1):

return n bytes:

    return self._read_buf[self._read_pos:self._read_pos+n]
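Until the implementation enforces the argument, the same upper bound can be approximated in user code. The following is a minimal sketch; peek_at_most is a hypothetical helper, not part of the io module, and it simply slices whatever peek() returns.

```python
import io


def peek_at_most(reader, n=1):
    """Hypothetical helper: return at most n bytes without advancing the position."""
    # peek() may hand back the whole buffer, so cap the result ourselves.
    return reader.peek(n)[:n]


reader = io.BufferedReader(io.BytesIO(b"abcdefgh"))
head = peek_at_most(reader, 3)
print(head)
```

The helper can still return fewer than n bytes, for example at EOF or when only a few bytes remain in the buffer, because peek() performs at most one raw read.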
msg89311	Author: Alyssa Coghlan (ncoghlan) *	Date: 2009-06-13 02:27
Assigned to Benjamin for assessment - this should be considered for rc2 since it's still broken in 3.1:

>>> f = open('setup.py', 'rb')
>>> len(f.peek(10))
4096
>>> len(f.peek(1))
4096
>>> len(f.peek(4095))
4096
>>> len(f.peek(10095))
4096

Brought up on python-dev in this thread:
http://mail.python.org/pipermail/python-dev/2009-June/089986.html

And previously here:
http://mail.python.org/pipermail/python-dev/2009-April/088229.html

The thread from April suggests the current behaviour may be intentional, in which case it is the documentation that needs to be fixed, as it is currently not just misleading but flat out wrong. Then again, Benjamin's initial response to that thread was to support the idea of changing peek() so that the argument actually was a cap.

The previous documentation that Alexandre quotes in the April thread was changed to the current description in late April without any corresponding change to the implementation:
http://svn.python.org/view/python/branches/py3k/Doc/library/io.rst?r1=62422&r2=62430

However, the old description was also wrong for the io-c implementation since it just returns the current buffered data from peek, no matter what argument you pass in.
msg89312	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-06-13 03:38
I think the argument should be used as an upper bound; I will look at this tomorrow.
msg89313	Author: Lucas Prado Melo (conf)	Date: 2009-06-13 04:20
Hey guys, I did a patch about this one. I didn't do many tests but I guess it is ok (it works like I think it should). What do you think?
msg89314	Author: Lucas Prado Melo (conf)	Date: 2009-06-13 04:23
Oops, I overlooked a minor flaw. A second version.
msg89315	Author: Lucas Prado Melo (conf)	Date: 2009-06-13 04:48
There's a problem with my patch... When the size of the data we want to peek is too big ( > buffer_len - start ) the cursor will move, thus there isn't a case where the peek function would work properly (except when we want to peek() just 1 byte). Couldn't we use a read() followed by a seek() instead?
msg89324	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-13 12:21
Lucas, it is indeed impossible for peek() to return more than the buffer size and remain compatible with non-seekable raw streams. That's why it /never/ returns more than the buffer size.

As for the fact that peek() doesn't behave as documented, I disagree. Here is what the docstring says:

    """Returns buffered bytes without advancing the position.

    The argument indicates a desired minimal number of bytes; we
    do at most one raw read to satisfy it. We never return more
    than self.buffer_size.
    """

Please note: "a desired /minimal/ number of bytes" (minimal, not maximal). Furthermore, "We never return more than self.buffer_size."
The behaviour looks ok to me.
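The buffer-size ceiling described here can be observed with a small sketch. It assumes CPython's io.BufferedReader over io.BytesIO; the sample data and the two buffer sizes (8 and 64) are illustrative assumptions. A caller who needs a longer look-ahead has to construct the reader with a larger buffer_size.

```python
import io

# Assumption: 100 bytes of sample data.
data = b"0123456789" * 10

small = io.BufferedReader(io.BytesIO(data), buffer_size=8)
large = io.BufferedReader(io.BytesIO(data), buffer_size=64)

small_peek = small.peek(50)   # cannot exceed the 8-byte buffer
large_peek = large.peek(50)   # may exceed 50, but never the 64-byte buffer

print(len(small_peek), len(large_peek))
```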
msg89326	Author: Lucas Prado Melo (conf)	Date: 2009-06-13 12:37
We could fill the buffer while moving its start point to 0. I guess this behavior would require a new function (or a new parameter to Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ). If you are ok with that I could write a patch.
msg89327	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-13 12:40
We could, however, enforce the passed argument and only return the whole remaining buffer when no argument is given. This is a bit like Frederick Reeve's proposal on python-dev, but less sophisticated and therefore less tedious to implement. In any case, I'm not sure it should be committed before the 3.1 release. The second and last release candidate is supposed to be today.
msg89328	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-13 12:47
> We could fill the buffer while moving its start point to 0. I guess this
> behavior would require a new function (or a new parameter to
> Modules/_io/bufferedio.c:_bufferedreader_fill_buffer() ).
> If you are ok with that I could write a patch.

The buffer is used for both reading and writing and you have to be careful when shifting it. Besides, the same change (or similar) should also be done in the Python implementation (in _pyio.py).

If you come up with a patch, please add some tests and check the whole regression suite passes.
msg89329	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-06-13 12:57
I'm downgrading this because it can't be changed until after 3.1 is released.
msg89340	Author: Alyssa Coghlan (ncoghlan) *	Date: 2009-06-14 01:12
It's not the docstring that is wrong for the current behaviour, it's the io.BufferedReader documentation:

"""
peek([n])
Return 1 (or n if specified) bytes from a buffer without advancing the position. Only a single read on the raw stream is done to satisfy the call. The number of bytes returned may be less than requested since at most all the buffer’s bytes from the current position to the end are returned.
"""

That gives absolutely no indication that the call might return more bytes than expected, and the indication that leaving out the argument will return only the next byte is flat out wrong.
msg89349	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2009-06-14 14:38
I updated the documentation in r73429. Is that better?
msg89366	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-14 19:32
Rather than "only a single read on the raw stream", it should be "at most a single read on the raw stream", IMHO.
msg89388	Author: Lucas Prado Melo (conf)	Date: 2009-06-15 05:21
Here is a patch that passes all the tests (I had to change some of them though, they were expecting erroneous behaviours IMHO). The biggest problem was the read1 testing, I've tried to get the maximum of bytes less than or equal to what the user wanted while executing at most 1 raw_read()'s. I have created a new test for peek()'ing a number of bytes bigger than could possibly be stored on the buffer.
msg89389	Author: Lucas Prado Melo (conf)	Date: 2009-06-15 05:22
Here, it's a patch that passes all the tests (I had to change some of them though, they were expecting erroneous behaviours IMHO). The biggest problem was the read1 testing, I've tried to get the maximum of bytes less than or equal to what the user wanted while executing at most 1 raw_read()'s. I have created a new test for peek()'ing a number of bytes bigger than could possibly be stored on the buffer.
msg89399	Author: Antoine Pitrou (pitrou) *	Date: 2009-06-15 10:37
I haven't read the patch in detail but I don't think you should have changed read1(). read1() is there for optimization purposes and its current behaviour makes sense IMO.
msg89400	Author: Alyssa Coghlan (ncoghlan) *	Date: 2009-06-15 10:39
The doc revision definitely does a better job of characterising the current underspecified behaviour :) I agree with Antoine that "at most a single read" would be better wording.
msg89401	Author: Lucas Prado Melo (conf)	Date: 2009-06-15 13:02
Ok. A new patch without read1() changes. Only one test fails, a read1() test:

======================================================================
FAIL: test_read1 (test.test_io.PyBufferedRWPairTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lucas/Codes/python-stuff/py3k/Lib/test/test_io.py", line 1139, in test_read1
    self.assertEqual(pair.read1(3), b"abc")
AssertionError: b'a' != b'abc'

Since I've changed peek_unlocked() (which is used once by read1()), I guess there's a problem with read1() expectations about it.
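The expectation encoded in that failing test can be illustrated with a short sketch. It uses the C-backed io.BufferedRWPair rather than the _pyio class named in the traceback, and a plain io.BytesIO stands in for the test suite's mock writer; both substitutions are assumptions made to keep the example self-contained.

```python
import io

# Assumption: a plain BytesIO replaces the mock raw writer used by test_io.
pair = io.BufferedRWPair(io.BytesIO(b"abcdef"), io.BytesIO())

# read1(3) may use at most one raw read, but should still return
# up to 3 bytes when the raw stream can supply them.
chunk = pair.read1(3)
print(chunk)
```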
msg189599	Author: Mark Lawrence (BreamoreBoy) *	Date: 2013-05-19 15:19
Looks like this has slipped under the radar. I'll leave working on it to the experts :)
msg233750	Author: Martin Panter (martin.panter) *	Date: 2015-01-09 12:59
Is the current documentation as accurate as it can be? “The number of bytes returned may be less or more than requested” To me this has always made this method practically useless. A valid implementation could just always return b"". I noticed the BZ2File.peek() documentation (BZ2File is apparently trying to imitate BufferedReader) is slightly more useful: “At least one byte of data will be returned (unless at EOF)” That could be used for (say) peeking for a LF following a CR. But still the “size” parameter does not seem very useful. In fact, LZMAFile.peek() says the size is ignored.
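The CR/LF use case mentioned above can be sketched as follows. It relies on the "at least one byte unless at EOF" guarantee, which is an assumption for io.BufferedReader rather than documented behaviour at this point in the thread; read_cr_or_crlf_line is a hypothetical helper written for illustration.

```python
import io


def read_cr_or_crlf_line(reader):
    """Hypothetical helper: read one line ending in CR or CRLF (terminator included)."""
    line = bytearray()
    while True:
        byte = reader.read(1)
        if not byte:
            break  # EOF
        line += byte
        if byte == b"\r":
            # Look one byte ahead without consuming it. Slice the result,
            # because peek() may return more than one byte.
            if reader.peek(1)[:1] == b"\n":
                line += reader.read(1)
            break
    return bytes(line)


reader = io.BufferedReader(io.BytesIO(b"one\r\ntwo\rthree"))
lines = []
while True:
    line = read_cr_or_crlf_line(reader)
    if not line:
        break
    lines.append(line)
print(lines)
```

The size argument plays no real role here: the slice, not the argument, is what limits the look-ahead to one byte.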
msg233801	Author: Martin Panter (martin.panter) *	Date: 2015-01-10 04:26
Here is a simple documentation patch to guarantee that at least one byte is normally returned. This would make the method much more useful, and compatible with the BZ2File and LZMAFile interfaces, allowing them to use BufferedReader, as I propose to do in Issue 15955. Even if nobody is interested in Torsten’s patch to limit the return length, I suggest my patch be considered :)
msg235024	Author: Martin Panter (martin.panter) *	Date: 2015-01-30 06:48
The non-blocking behaviour that I documented in my patch is under question in Issue 13322. I think it would be nice to change the implementation to either return None or raise BlockingIOError.

History
Date	User	Action	Args
2022-04-11 14:56:48	admin	set	github: 50061
2020-09-21 18:05:58	desbma	set	nosy: + desbma
2015-01-30 06:48:55	martin.panter	set	messages: +
2015-01-10 04:26:53	martin.panter	set	files: + peek-one-byte.patch; nosy: + docs@python; messages: +; assignee: docs@python; components: + Documentation
2015-01-09 12:59:20	martin.panter	set	nosy: + martin.panter; messages: +; versions: + Python 3.4
2014-02-03 18:29:40	BreamoreBoy	set	nosy: - BreamoreBoy
2013-05-19 15:19:52	BreamoreBoy	set	nosy: + BreamoreBoy; messages: +
2009-06-15 13:02:29	conf	set	files: + peek4.diff; messages: +
2009-06-15 10:39:12	ncoghlan	set	messages: +
2009-06-15 10:37:07	pitrou	set	messages: +; versions: + Python 2.7
2009-06-15 05:22:02	conf	set	messages: +
2009-06-15 05:21:10	conf	set	files: + peek3.diff; messages: +
2009-06-14 19:32:54	pitrou	set	messages: +
2009-06-14 14:38:07	benjamin.peterson	set	messages: +; versions: + Python 3.2, - Python 3.0
2009-06-14 01:12:27	ncoghlan	set	messages: +
2009-06-13 12:57:05	benjamin.peterson	set	priority: release blocker -> normal; assignee: benjamin.peterson -> (no value); messages: +
2009-06-13 12:47:34	pitrou	set	messages: +
2009-06-13 12:40:17	pitrou	set	messages: +
2009-06-13 12:37:37	conf	set	messages: +
2009-06-13 12:21:50	pitrou	set	nosy: + pitrou; messages: +
2009-06-13 04:48:45	conf	set	messages: +
2009-06-13 04:23:11	conf	set	files: + peek2.diff; messages: +
2009-06-13 04:20:57	conf	set	files: + peek.diff; nosy: + conf; messages: +; keywords: + patch
2009-06-13 03:38:35	benjamin.peterson	set	messages: +
2009-06-13 02:27:13	ncoghlan	set	priority: release blocker; nosy: + ncoghlan, benjamin.peterson; messages: +; assignee: benjamin.peterson
2009-04-22 06:29:14	trott	set	messages: +
2009-04-22 06:20:09	trott	set	messages: +
2009-04-22 06:15:52	trott	create