Issue 5210: zlib does not indicate end of compressed stream properly

Created on 2009-02-10 19:46 by travis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
zlibmodule.diff	travis,2009-02-12 17:00
zlib_finished_test.txt	solinym,2009-08-19 21:39	patch to test for end-of-compressed-stream indicator
zlibmodule.c.diff	solinym,2009-08-21 16:39	diff to zlibmodule.c
test_zlib.py.diff	solinym,2009-08-21 16:41	diff to test_zlib.py
test_zlib.py.diff	solinym,2009-08-21 20:07	complete version of diff to test_zlib.py

Messages (12)
msg81590	Author: Travis Hassloch (travis)	Date: 2009-02-10 19:46
The underlying zlib library can determine when it has hit the end of a compressed stream without reading past the end. The Python zlib implementation requires that one read past the end before it signals the end, which it does by putting data in Decompress.unused_data. This complicates interfacing with mixed compressed/uncompressed streams.
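The limitation can be illustrated with a minimal sketch. The sample payload, the trailing bytes, and the variable names are hypothetical; only zlib.compress, zlib.decompressobj, Decompress.decompress and Decompress.unused_data are real module members. The point is that unused_data is empty both in the middle of the stream and exactly at its end, so it cannot by itself tell the two situations apart.

```python
import zlib

payload = b"to be, or not to be " * 10   # hypothetical sample data
compressed = zlib.compress(payload)
trailer = b"RAW"                          # hypothetical uncompressed bytes that follow

dco = zlib.decompressobj()

# Feed everything except the last compressed byte: the stream is unfinished.
out = dco.decompress(compressed[:-1])
unused_mid_stream = dco.unused_data

# Feed the last compressed byte: the stream is complete, yet unused_data
# looks exactly the same as it did mid-stream.
out += dco.decompress(compressed[-1:])
unused_at_exact_end = dco.unused_data

# Only input that lies past the end of the stream appears in unused_data.
out += dco.decompress(trailer[:1])
unused_after_overread = dco.unused_data

print(unused_mid_stream, unused_at_exact_end, unused_after_overread)
```

A caller that must not consume bytes belonging to the next section of a mixed stream therefore needs some other end-of-stream signal, which is what this issue asks for.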
msg81780	Author: Travis Hassloch (travis)	Date: 2009-02-12 17:00
Here is a patch which adds a member called is_finished to decompression objects, allowing client code to know when it has reached the end of the compressed stream.
msg90523	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-07-14 22:41
Thanks for the patch! Can you provide tests too?
msg90817	Author: Travis H. (solinym)	Date: 2009-07-22 15:58
What kind of tests did you have in mind? Unit tests in Python, or something else?
msg90820	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-07-22 20:35
Yes, I think the right place to add the tests is Lib/test/test_zlib.py
msg91749	Author: Travis H. (solinym)	Date: 2009-08-19 21:39
Attaching unit test diff. Output of "diff -u test_zlib.py~ test_zlib.py"
msg91757	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-08-20 00:09
Some comments about the patch:
- In zlibmodule.c, the is_finished member should be an int, and converted to a PyObject only when requested.
- The test should check that is_finished is False one byte before the end of the compressed part, and becomes True when the decompressor reads the last compressed byte. I don't think that dco.flush() is necessary for the test.
- Also, the last check could be more precise: assertEquals(y1 + y2, HAMLET_SCENE) and assertEquals(dco.unused_data, HAMLET_SCENE)
msg91832	Author: Travis H. (solinym)	Date: 2009-08-21 16:39
zlibmodule.c.diff: implements all the suggested features, but I'm not exactly sure whether it handles reference counts properly.
msg91833	Author: Travis H. (solinym)	Date: 2009-08-21 16:41
Diff to tests. Implements all suggested changes save one: I wasn't sure how to test that is_finished is clear one byte before the end of the compressed section. Instead, I test that it is clear before I call the compression routine.
msg91840	Author: Travis H. (solinym)	Date: 2009-08-21 20:07
Figured out how to test the is_finished attribute of the zlib module properly.
msg91846	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2009-08-21 21:28
Hm, I tried a modified version of your first test, and I found another problem with the current zlib library. Starting with the input:
    x = x1 + x2 + HAMLET_SCENE  # both compressed and uncompressed data
The following scenario is OK:
    dco.decompress(x)  # returns HAMLET_SCENE
    dco.unused_data    # returns HAMLET_SCENE
But this one:
    for c in x:
        dco.decompress(c)  # will return HAMLET_SCENE, in several pieces
    dco.unused_data        # only one character, the last of (c in x)!
This is a bug IMO: unused_data should accumulate all the extra uncompressed data.
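The accumulation behaviour requested here can be checked with a small sketch. The sample text and variable names are hypothetical stand-ins for HAMLET_SCENE and x, and the sketch assumes a current Python 3 interpreter, where (to my knowledge) unused_data does accumulate across calls. In Python 3, iterating over bytes yields integers, so the loop slices one byte at a time instead.

```python
import zlib

# Hypothetical stand-in for HAMLET_SCENE from the test suite.
text = b"LAERTES: O, fear me not. I stay too long. " * 5

# Compressed data immediately followed by the same text, uncompressed.
stream = zlib.compress(text) + text

dco = zlib.decompressobj()
pieces = []
for i in range(len(stream)):
    # One-byte slices keep the argument a bytes object.
    pieces.append(dco.decompress(stream[i:i + 1]))

decompressed = b"".join(pieces)
leftover = dco.unused_data   # expected to hold all trailing bytes, not just the last

print(decompressed == text, leftover == text)
```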
msg174057	Author: Nadeem Vawda (nadeem.vawda) *	Date: 2012-10-28 16:26
This bug (zlib not providing a way to detect end-of-stream) has already been fixed - see issue 12646. I've opened issue 16350 for the unused_data problem.
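For reference, a minimal sketch of end-of-stream detection on a current Python 3 interpreter, using the Decompress.eof attribute documented since Python 3.3. I am assuming this attribute is what the fix in issue 12646 provides in place of the is_finished member proposed above. The payload and variable names are hypothetical.

```python
import zlib

compressed = zlib.compress(b"example payload")  # hypothetical sample data

dco = zlib.decompressobj()
eof_before_input = dco.eof

# Stop one byte short of the end of the compressed stream.
dco.decompress(compressed[:-1])
eof_one_byte_short = dco.eof

# Supply the final compressed byte, and nothing beyond it.
dco.decompress(compressed[-1:])
eof_at_end = dco.eof

print(eof_before_input, eof_one_byte_short, eof_at_end, dco.unused_data)
```

This mirrors the check suggested in msg91757: the flag stays false one byte before the end of the compressed part and becomes true once the last compressed byte is read, without reading past the end.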

History
Date	User	Action	Args
2022-04-11 14:56:45	admin	set	github: 49460
2012-10-28 16:26:20	nadeem.vawda	set	status: open -> closed; superseder: zlib.Decompress.decompress/flush do not raise any exceptions when given truncated input streams; messages: +; resolution: out of date; stage: test needed -> resolved
2012-01-26 13:03:45	nadeem.vawda	set	nosy: + nadeem.vawda
2009-08-21 21:28:26	amaury.forgeotdarc	set	messages: +
2009-08-21 20:07:56	solinym	set	files: + test_zlib.py.diff; messages: +
2009-08-21 16:41:08	solinym	set	files: + test_zlib.py.diff; messages: +
2009-08-21 16:39:53	solinym	set	files: + zlibmodule.c.diff; messages: +
2009-08-20 00:09:17	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc; messages: +
2009-08-19 21:39:51	solinym	set	files: + zlib_finished_test.txt; messages: +
2009-07-22 20:35:20	ezio.melotti	set	messages: +
2009-07-22 15:58:31	solinym	set	nosy: + solinym; messages: +
2009-07-14 22:41:20	ezio.melotti	set	priority: normal; versions: + Python 2.7, Python 3.2, - Python 3.0; nosy: + ezio.melotti; messages: +; stage: test needed
2009-07-14 22:38:08	ezio.melotti	link	issue6485 superseder
2009-02-12 17:00:08	travis	set	files: + zlibmodule.diff; keywords: + patch; messages: +
2009-02-10 19:46:19	travis	create