Issue 10040: GZipFile failure on large files

Created on 2010-10-07 01:57 by Robert.Rohde, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg118091 - Author: Robert Rohde (Robert.Rohde) Date: 2010-10-07 01:57
I attempted to use GZipFile to process a 1.93 GB file that expands to 18.8 GB. This consistently produces the same corrupted output file that has approximately, but not exactly, the right output file size. I bypassed GZipFile by calling the 7-Zip executable to open the compressed file. This works correctly and consistently. I haven't tried to figure out how GZipFile works, but I assume that this failure is probably related to the very large size of the files I am working with. I've used GZipFile before on much smaller files with no apparent problems. I have no idea what precisely goes wrong, or how to fix it, but I felt it was important to note that GZipFile isn't working for at least some very large files.
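For reference, the kind of operation described above (expanding a large .gz file to disk) can be sketched as follows. This is not the reporter's code, which was never posted; the function name `decompress_gzip` and the chunk size are assumptions made for illustration. The sketch streams the data in fixed-size chunks so the whole file is never held in memory, and opens both files in binary mode.

```python
import gzip
import shutil


def decompress_gzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress src_path into dst_path; return the bytes written.

    Hypothetical helper for illustration only. Both files are opened in
    binary mode so that no newline translation can alter the output.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj reads and writes chunk_size bytes at a time.
        shutil.copyfileobj(src, dst, chunk_size)
        return dst.tell()
```

Comparing the returned byte count and a checksum of the output against the result from another tool (such as 7-Zip) would narrow down where the two outputs diverge.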
msg118164 - Author: Ned Deily (ned.deily) (Python committer) Date: 2010-10-08 04:47
Since you mention 7-zip, does that mean you are seeing the problem on a Windows platform? If so, exactly which version of Windows and what kind of system? Also, unless someone recognizes this as a duplicate of an earlier issue, there may not be much action on it unless you can supply a test case to reproduce the problem.
msg118169 - Author: Robert Rohde (Robert.Rohde) Date: 2010-10-08 07:52
It's Windows 7 Ultimate (64-bit) on a very high end system. I don't think it would be very practical to distribute a 2 GB test file. Though I might be able to get it to a couple people if someone wanted to really study the issue. Though if it is an integer overflow (or something like that), then I would suspect that GZipFile would show corruption most of the time once the files got large enough. For example, it might occur for all files expanding to larger than 2^32 bytes (4 GB). (That's just speculation, I haven't tested it except to note that it failed the very first time I tried to use a file this large.) Perhaps someone familiar with the code could look for places where integers might overflow?
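As background to the 2^32 speculation above: the gzip format itself (RFC 1952) stores the uncompressed length in a 32-bit trailer field, ISIZE, which holds the size modulo 2^32. The thread never established whether this field had anything to do with the reported corruption, so the sketch below only illustrates where a 32-bit quantity appears in the format. The helper name `gzip_isize` is an assumption, and the code assumes a single-member gzip stream with no trailing data.

```python
import gzip
import struct


def gzip_isize(raw):
    """Return the ISIZE trailer field of a single-member gzip stream.

    ISIZE is the last 4 bytes, little-endian, and equals the
    uncompressed size modulo 2**32 (RFC 1952).
    """
    return struct.unpack("<I", raw[-4:])[0]


payload = b"x" * 100_000
raw = gzip.compress(payload)

# For data under 4 GiB, ISIZE matches the real length; above that it wraps.
print(gzip_isize(raw) == len(payload) % 2**32)
```

Because of the wraparound, the trailer alone cannot confirm the length of output larger than 4 GiB, so a size check on such files needs an independent reference.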
msg118177 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-10-08 10:42
Can you show a snippet of the code (or describe it in detail) that "processes" the GzipFile? Right now it's not obvious which operations you are doing.
msg199753 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2013-10-13 18:26
Closing due to lack of feedback.
History
Date User Action Args
2022-04-11 14:57:07 admin set github: 54249
2013-10-13 18:26:22 georg.brandl set status: pending -> closed; nosy: + georg.brandl; messages: +; resolution: not a bug
2012-12-04 10:10:49 serhiy.storchaka set status: open -> pending
2012-12-02 22:56:50 serhiy.storchaka set nosy: + serhiy.storchaka
2010-10-08 10:42:26 pitrou set versions: + Python 3.1, Python 3.2; nosy: + pitrou; messages: +; components: + Library (Lib), - Windows; stage: test needed ->
2010-10-08 07:52:18 Robert.Rohde set messages: +
2010-10-08 04:47:03 ned.deily set nosy: + ned.deily; messages: +; components: + Windows, - Library (Lib); stage: test needed
2010-10-07 01:57:08 Robert.Rohde create