Problem: Currently, the gzip module cannot work with files whose uncompressed size is > 2G. The source of the problem is the trailer at the end of a .gz file, which contains a 32-bit length field. This field is of course unable to represent a file length > 4G. Because of mixed-type arithmetic in gzip.py, the limit is lowered further to 2G.

Testcase:
python gzip.py <file>        # must be > 2G
python gzip.py -d <file.gz>  # error

Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses.

Two other remarks:
1. I don't understand lines 22-23 of gzip.py: why is the test "if value < 0" necessary when writing an unsigned int?
2. The CRC value in GzipFile._read_eof() is compared modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc comes from zlib.crc32, which always returns a regular int.

Regards,
Geert Jansen
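The "if value < 0" question comes down to how the two 32-bit trailer fields are encoded. Below is a minimal sketch assuming Python 3; write32u and gzip_trailer are hypothetical stand-alone helpers written for illustration, not the code in gzip.py. Per RFC 1952, a gzip member ends with the CRC-32 followed by ISIZE (the uncompressed length modulo 2**32), both stored as little-endian unsigned 32-bit integers. struct.pack("<L", ...) rejects negative numbers, and Python 2's zlib.crc32 could return a negative (signed) value, which is presumably what the "value < 0" branch guards against. Masking with 0xFFFFFFFF covers that case and also wraps sizes of 4G or more.

```python
import struct
import zlib

# Hypothetical helpers for illustration; not the actual gzip.py code.
MASK32 = 0xFFFFFFFF  # 2**32 - 1


def write32u(value: int) -> bytes:
    """Encode value as a little-endian unsigned 32-bit field.

    value & MASK32 maps a negative (signed) CRC onto its unsigned
    equivalent and reduces sizes of 4 GiB or more modulo 2**32.
    """
    return struct.pack("<L", value & MASK32)


def gzip_trailer(data: bytes) -> bytes:
    """Build the 8-byte gzip trailer for data: CRC-32, then ISIZE."""
    return write32u(zlib.crc32(data)) + write32u(len(data))


# struct.pack("<L", -1) raises struct.error, so the mapping is required.
print(write32u(-1).hex())                          # ffffffff
print(write32u(6 * 2**30) == write32u(2 * 2**30))  # True
```

The second print shows why a 32-bit field alone cannot tell a 6G file from a 2G one: the stored ISIZE is identical, so any length check has to be done modulo 4G.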
user_id=537938
I'm afraid this doesn't fix the whole problem. You fixed the problem for file sizes in the range 2G-4G, but (if I read your patch correctly) files > 4G still don't work. On Linux it is very easy to create files > 4G, and Python supports this, so it would be nice to have.

A better fix IMHO would be to test the file size modulo 4G. The probability that an invalid gzip file becomes valid under this less accurate test is astronomically small (there is also a CRC). In fact, this is also the fix that the "official" gzip program uses.

I can give you a test account on my Linux machine if you want to test a patch and don't have a machine with large file support nearby. Or I can test a patch for you.
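A minimal sketch of the modulo-4G check described above, assuming Python 3; check_trailer is a hypothetical helper for illustration, not the revision that was eventually committed. Both trailer fields are unpacked as unsigned values and compared with the locally computed CRC and byte count reduced modulo 2**32, so a stream larger than 4G passes the length check while the CRC still guards against corrupt data. The last line shows one way that reading the field as signed produces a 2G limit.

```python
import struct
import zlib

# Hypothetical helper for illustration; not the committed gzip.py code.
MASK32 = 0xFFFFFFFF  # 2**32 - 1


def check_trailer(trailer: bytes, crc: int, size: int) -> None:
    """Validate an 8-byte gzip trailer against the running CRC and the
    number of uncompressed bytes produced (which may exceed 4 GiB).

    Both stored fields are read as unsigned ("<II"); the local values
    are reduced modulo 2**32 before comparing.
    """
    if len(trailer) != 8:
        raise ValueError("truncated gzip trailer")
    stored_crc, stored_isize = struct.unpack("<II", trailer)
    if stored_crc != crc & MASK32:
        raise ValueError("CRC check failed")
    if stored_isize != size & MASK32:
        raise ValueError("Incorrect length of data produced")


data = b"example payload"
check_trailer(struct.pack("<II", zlib.crc32(data), len(data)),
              zlib.crc32(data), len(data))  # passes silently

# A 6 GiB stream stores ISIZE = 2 GiB; the modulo comparison accepts it.
check_trailer(struct.pack("<II", 0, 2 * 2**30), 0, 6 * 2**30)

# Reading ISIZE as signed ("<l") turns 3 GiB into a negative number,
# so comparing it with a positive byte count fails above 2 GiB.
print(struct.unpack("<l", struct.pack("<I", 3 * 2**30))[0])  # -1073741824
```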
user_id=31435
Got it. It's distasteful but pragmatic. Fixed again in:

Lib/gzip.py; new revision: 1.37
Misc/NEWS; new revision: 1.510

It was tested "by hand" on Win2K (on a 6+GB file).