msg81590 - (view) |
Author: Travis Hassloch (travis) |
Date: 2009-02-10 19:46 |
The underlying zlib library can determine when it has hit the end of a compressed stream without reading past the end. The Python zlib module, however, requires reading past the end before it signals end-of-stream by putting data in Decompress.unused_data. This complicates interfacing with mixed compressed/uncompressed streams. |
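To illustrate the complaint, here is a minimal sketch of why unused_data alone cannot signal end-of-stream. The payload and variable names are made up for the example; only zlib.compress, zlib.decompressobj, decompress and unused_data are real API.

```python
import zlib

payload = b"hello world " * 10          # hypothetical sample data
compressed = zlib.compress(payload)

dco = zlib.decompressobj()
out = dco.decompress(compressed)        # exactly the compressed bytes, nothing more

# All the data has come out, yet unused_data is still empty, so it gives
# no hint that the compressed stream has ended.
at_exact_end = dco.unused_data

# Only once bytes beyond the stream are fed does unused_data become non-empty.
dco.decompress(b"X")
after_extra = dco.unused_data
```

In other words, a caller relying on unused_data has to read at least one byte too many before learning that the stream is over.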
|
|
msg81780 - (view) |
Author: Travis Hassloch (travis) |
Date: 2009-02-12 17:00 |
Here is a patch which adds a member called is_finished to decompression objects that allows client code to know when it has reached the end of the compressed stream. |
|
|
msg90523 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2009-07-14 22:41 |
Thanks for the patch! Can you provide tests too? |
|
|
msg90817 - (view) |
Author: Travis H. (solinym) |
Date: 2009-07-22 15:58 |
What kind of tests did you have in mind? Unit tests in Python, or something else? |
|
|
msg90820 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2009-07-22 20:35 |
Yes, I think the right place to add the tests is Lib/test/test_zlib.py |
|
|
msg91749 - (view) |
Author: Travis H. (solinym) |
Date: 2009-08-19 21:39 |
Attaching unit test diff. Output of "diff -u test_zlib.py~ test_zlib.py" |
|
|
msg91757 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2009-08-20 00:09 |
Some comments about the patch:
- In zlibmodule.c, the is_finished member should be an int, and converted to a PyObject only when requested.
- The test should check that is_finished is False one byte before the end of the compressed part, and becomes True when the decompressor reads the last compressed byte. I don't think that dco.flush() is necessary for the test.
- Also, the last check could be more precise: assertEquals(y1 + y2, HAMLET_SCENE) and assertEquals(dco.unused_data, HAMLET_SCENE) |
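The checks suggested above might look roughly like the following sketch. It is not the patch under review: it assumes a Python version (3.3 or later) where decompression objects expose an eof attribute, used here as a stand-in for the proposed is_finished, and the payload is a made-up substitute for HAMLET_SCENE.

```python
import zlib

# Hypothetical stand-in for HAMLET_SCENE from test_zlib.py.
payload = b"To be, or not to be, that is the question. " * 10
compressed = zlib.compress(payload)
x = compressed + payload  # compressed part followed by uncompressed data

dco = zlib.decompressobj()

# Feed everything except the last compressed byte: the stream is not finished yet.
y1 = dco.decompress(x[:len(compressed) - 1])
finished_before = dco.eof

# Feed the last compressed byte together with the uncompressed tail.
y2 = dco.decompress(x[len(compressed) - 1:])
finished_after = dco.eof
```

No call to dco.flush() is needed for these checks, in line with the comment above.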
|
|
msg91832 - (view) |
Author: Travis H. (solinym) |
Date: 2009-08-21 16:39 |
zlibmodule.c.diff: implements all the suggested features, but I'm not exactly sure whether it handles reference counts properly. |
|
|
msg91833 - (view) |
Author: Travis H. (solinym) |
Date: 2009-08-21 16:41 |
Diff to tests. Implements all suggested changes save one: I wasn't sure how to test that is_finished is clear one byte before the end of the compressed section. Instead, I test that it is clear before I call the compression routine. |
|
|
msg91840 - (view) |
Author: Travis H. (solinym) |
Date: 2009-08-21 20:07 |
Figured out how to test the is_finished attribute of the zlib module properly. |
|
|
msg91846 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2009-08-21 21:28 |
Hm, I tried a modified version of your first test, and I found another problem with the current zlib library. Starting with the input:

    x = x1 + x2 + HAMLET_SCENE  # both compressed and uncompressed data

the following scenario is OK:

    dco.decompress(x)  # returns HAMLET_SCENE
    dco.unused_data    # returns HAMLET_SCENE

But this one is not:

    for c in x:
        dco.decompress(c)  # will return HAMLET_SCENE, in several pieces
    dco.unused_data        # only one character, the last of (c in x)!

This is a bug IMO: unused_data should accumulate all the extra uncompressed data. |
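The byte-at-a-time scenario can be reproduced with a sketch like the one below. The payload and trailer are invented for the example. On current Python 3 releases unused_data accumulates across calls, so the final comparison holds there; on the interpreter discussed in this message it would have held only the last byte.

```python
import zlib

payload = b"to be or not to be " * 20   # hypothetical compressed content
trailer = b"PLAIN TRAILER"               # hypothetical uncompressed tail
x = zlib.compress(payload) + trailer

dco = zlib.decompressobj()
pieces = []
# Slice instead of iterating: iterating over bytes yields ints in Python 3.
for i in range(len(x)):
    pieces.append(dco.decompress(x[i:i + 1]))

out = b"".join(pieces)
leftover = dco.unused_data  # expected to hold the whole trailer, not just its last byte
```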
|
|
msg174057 - (view) |
Author: Nadeem Vawda (nadeem.vawda) *  |
Date: 2012-10-28 16:26 |
This bug (zlib not providing a way to detect end-of-stream) has already been fixed - see issue 12646. I've opened issue 16350 for the unused_data problem. |
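As a rough sketch of how end-of-stream detection can be used with a mixed stream, the following reads a compressed prefix in chunks and stops once the decompressor reports eof (available on decompression objects since Python 3.3). The helper read_compressed_prefix, the chunk size and the sample data are hypothetical and not part of the zlib module.

```python
import io
import zlib

def read_compressed_prefix(stream, chunk_size=16):
    """Decompress the zlib stream at the front of `stream`.

    Returns (decompressed data, bytes already read past the compressed part).
    """
    dco = zlib.decompressobj()
    parts = []
    while not dco.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise EOFError("compressed stream ended prematurely")
        parts.append(dco.decompress(chunk))
    return b"".join(parts), dco.unused_data

payload = b"mixed stream example " * 8
stream = io.BytesIO(zlib.compress(payload) + b"plain tail")

data, leftover = read_compressed_prefix(stream)
rest = leftover + stream.read()  # the uncompressed remainder of the stream
```

Because reads happen in chunks, some bytes past the compressed part may already have been consumed; they come back via unused_data and are stitched onto whatever is still unread.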
|
|