Issue 5210: zlib does not indicate end of compressed stream properly

Created on 2009-02-10 19:46 by travis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                 Uploaded                     Description
zlibmodule.diff           travis, 2009-02-12 17:00
zlib_finished_test.txt    solinym, 2009-08-19 21:39    patch to test for end-of-compressed-stream indicator
zlibmodule.c.diff         solinym, 2009-08-21 16:39    diff to zlibmodule.c
test_zlib.py.diff         solinym, 2009-08-21 16:41    diff to test_zlib.py
test_zlib.py.diff         solinym, 2009-08-21 20:07    complete version of diff to test_zlib.py
Messages (12)
msg81590 - Author: Travis Hassloch (travis) Date: 2009-02-10 19:46
The underlying zlib library can determine when it has hit the end of a compressed stream without reading past the end. The Python zlib implementation requires that one read past the end before it signals the end by putting data in Decompress.unused_data. This complicates interfacing with mixed compressed/uncompressed streams.
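A minimal sketch of the behaviour described above, using only zlib.compress, zlib.decompressobj and Decompress.unused_data. The payload and trailer values are made up for illustration. Feeding exactly the compressed bytes leaves unused_data empty, so unused_data alone gives no hint that the stream has ended; it is populated only once bytes beyond the end of the stream have been supplied.

```python
import zlib

payload = b"compressed part " * 64
trailer = b"plain trailer"   # hypothetical uncompressed data following the stream
compressed = zlib.compress(payload)

# Feed exactly the compressed bytes: the stream is complete, but
# unused_data stays empty because nothing was read past the end.
dco = zlib.decompressobj()
out = dco.decompress(compressed)
exact_unused = dco.unused_data

# Feed the compressed bytes plus the trailer: only now does unused_data
# show where the compressed stream stopped.
dco2 = zlib.decompressobj()
out2 = dco2.decompress(compressed + trailer)
overread_unused = dco2.unused_data
```

A reader of a mixed stream therefore has to hand the decompressor bytes that belong to the next section before unused_data can tell it where the compressed section ended.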
msg81780 - Author: Travis Hassloch (travis) Date: 2009-02-12 17:00
Here is a patch which adds a member called is_finished to decompression objects that allows client code to know when it has reached the end of the compressed stream.
msg90523 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-14 22:41
Thanks for the patch! Can you provide tests too?
msg90817 - Author: Travis H. (solinym) Date: 2009-07-22 15:58
What kind of tests did you have in mind? Unit tests in python, or something else?
msg90820 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-07-22 20:35
Yes, I think that the right place to add the tests is Lib/test/test_zlib.py
msg91749 - Author: Travis H. (solinym) Date: 2009-08-19 21:39
Attaching unit test diff.
Output of "diff -u test_zlib.py~ test_zlib.py"
msg91757 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-20 00:09
Some comments about the patch:
- In zlibmodule.c, the is_finished member should be an int, and converted to a PyObject only when requested.
- The test should check that is_finished is False one byte before the end of the compressed part, and becomes True when the decompressor reads the last compressed byte. I don't think that dco.flush() is necessary for the test.
- Also, the last check could be more precise:
    assertEquals(y1 + y2, HAMLET_SCENE)
  and
    assertEquals(dco.unused_data, HAMLET_SCENE)
msg91832 - Author: Travis H. (solinym) Date: 2009-08-21 16:39
zlibmodule.c.diff
Implements all the suggested features, but I'm not exactly sure whether it handles reference counts properly.
msg91833 - Author: Travis H. (solinym) Date: 2009-08-21 16:41
Diff to tests
Implements all suggested changes save one: I wasn't sure how to test that is_finished is clear one byte before the end of the compressed section. Instead, I test that it is clear before I call the compression routine.
msg91840 - Author: Travis H. (solinym) Date: 2009-08-21 20:07
Figured out how to test the is_finished attribute of the zlib module properly.
msg91846 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-08-21 21:28
Hm, I tried a modified version of your first test, and I found another problem with the current zlib library. Starting with the input:
    x = x1 + x2 + HAMLET_SCENE # both compressed and uncompressed data
The following scenario is OK:
    dco.decompress(x) # returns HAMLET_SCENE
    dco.unused_data   # returns HAMLET_SCENE
But this one:
    for c in x:
        dco.decompress(c) # will return HAMLET_SCENE, in several pieces
    dco.unused_data   # only one character, the last of (c in x)!
This is a bug IMO: unused_data should accumulate all the extra uncompressed data.
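The byte-at-a-time scenario can be sketched as follows. The text value is a stand-in for HAMLET_SCENE, and the loop slices the input so that each call receives a one-byte bytes object. On interpreters that include the fix for the unused_data problem, the trailing bytes are expected to accumulate; on the versions discussed in this message only the last byte survived.

```python
import zlib

text = b"To be, or not to be, that is the question. " * 20   # stand-in for HAMLET_SCENE
x = zlib.compress(text) + text   # compressed data followed by uncompressed data

dco = zlib.decompressobj()
pieces = []
for i in range(len(x)):
    # Feed one byte per call, as in the failing scenario.
    pieces.append(dco.decompress(x[i:i + 1]))

result = b"".join(pieces)
unused = dco.unused_data   # every byte fed after the end of the stream
```

Collecting the return values matters here: the decompressed text arrives in several pieces, while everything fed after the end of the stream ends up in unused_data.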
msg174057 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-10-28 16:26
This bug (zlib not providing a way to detect end-of-stream) has already been fixed - see issue 12646. I've opened issue 16350 for the unused_data problem.
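For readers landing here: to the best of my knowledge, the fix from issue 12646 is exposed as the eof attribute on decompression objects (Python 3.3 and later), not as the is_finished member proposed in this issue. The sketch below applies the checks requested earlier in the thread to eof; the sample data is made up.

```python
import zlib

original = b"Who's there? Nay, answer me: stand, and unfold yourself. " * 40
trailer = b"uncompressed data that follows the stream"   # hypothetical
compressed = zlib.compress(original)

dco = zlib.decompressobj()

# Everything except the final byte: the trailing checksum is incomplete,
# so the stream cannot be reported as finished yet.
y1 = dco.decompress(compressed[:-1])
eof_before_last_byte = dco.eof

# The final compressed byte: end of stream is reported immediately,
# with no need to read into the trailer.
y2 = dco.decompress(compressed[-1:])
eof_after_last_byte = dco.eof
unused_at_eof = dco.unused_data

# Bytes fed after the end of the stream are handed back in unused_data.
dco.decompress(trailer)
unused_after_trailer = dco.unused_data
```

With eof, a caller can stop feeding the decompressor at the exact end of the compressed section and hand the rest of the input to whatever parses the uncompressed part.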
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49460
2012-10-28 16:26:20 nadeem.vawda set status: open -> closed; superseder: zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams; messages: +; resolution: out of date; stage: test needed -> resolved
2012-01-26 13:03:45 nadeem.vawda set nosy: + nadeem.vawda
2009-08-21 21:28:26 amaury.forgeotdarc set messages: +
2009-08-21 20:07:56 solinym set files: + test_zlib.py.diff; messages: +
2009-08-21 16:41:08 solinym set files: + test_zlib.py.diff; messages: +
2009-08-21 16:39:53 solinym set files: + zlibmodule.c.diff; messages: +
2009-08-20 00:09:17 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: +
2009-08-19 21:39:51 solinym set files: + zlib_finished_test.txt; messages: +
2009-07-22 20:35:20 ezio.melotti set messages: +
2009-07-22 15:58:31 solinym set nosy: + solinym; messages: +
2009-07-14 22:41:20 ezio.melotti set priority: normal; versions: + Python 2.7, Python 3.2, - Python 3.0; nosy: + ezio.melotti; messages: +; stage: test needed
2009-07-14 22:38:08 ezio.melotti link issue6485 superseder
2009-02-12 17:00:08 travis set files: + zlibmodule.diff; keywords: + patch; messages: +
2009-02-10 19:46:19 travis create