# Bug demo
TEXT_LINES = [
    b'cutecat\n',
    b'promiscuousbonobo\n',
]
TEXT = b''.join(TEXT_LINES)

import bz2

filename = '/tmp/demo.bz2'
with open(filename, 'wb') as f:
    f.write(bz2.compress(TEXT))

with bz2.BZ2File(filename) as bz2f:
    pdata = bz2f.peek(n=7)
    print(pdata)

It outputs b'cutecat\npromiscuousbonobo\n', not b'cutecat'. Here is the patch to fix the bug.
This is documented behavior.

.. method:: peek([n])

   Return buffered data without advancing the file position. At least one byte of data will be returned (unless at EOF). The exact number of bytes returned is unspecified.
Because it is unspecified in io.BufferedReader.peek() and in many classes that implement the io.BufferedReader interface.

.. method:: peek([size])

   Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.

I agree that this is weird, but this is a much larger issue than just bz2. We can't just "fix" this for bz2. This is worth a discussion on Python-Dev.
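
Since peek() may return more bytes than requested, a caller who wants at most n bytes can slice the result. Below is a minimal sketch of that workaround; `peek_at_most` is a hypothetical helper name, not part of bz2 or io, and the in-memory `io.BytesIO` stream is used here only to avoid writing a file under /tmp.

```python
import bz2
import io


def peek_at_most(stream, n):
    """Return up to n bytes from stream without advancing its position.

    Hypothetical helper, not part of the standard library. The result may
    still be shorter than n, because peek() only promises at least one
    byte (unless at EOF).
    """
    return stream.peek(n)[:n]


payload = b'cutecat\npromiscuousbonobo\n'

# Same data as the demo above, compressed into an in-memory stream.
with bz2.BZ2File(io.BytesIO(bz2.compress(payload))) as bz2f:
    head = peek_at_most(bz2f, 7)
    # peek() does not advance the position, so this still reads line one.
    first_line = bz2f.readline()

# The same helper should work for any object exposing a
# BufferedReader-style peek().
buffered = io.BufferedReader(io.BytesIO(payload))
buffered_head = peek_at_most(buffered, 7)
```

The slice caps the length but cannot extend it, so code that needs exactly n bytes would still have to check `len()` of the result or fall back to read().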
History

Date                 User              Action  Args
2022-04-11 14:57:59  admin             set     github: 65055
2014-03-06 10:31:43  serhiy.storchaka  set     messages: +
2014-03-06 10:05:12  vajrasky          set     messages: +
2014-03-06 09:13:56  serhiy.storchaka  set     status: open -> closed; nosy: + serhiy.storchaka; messages: +; resolution: not a bug; stage: resolved