Issue 618135: gzip.py and files > 2G

Created on 2002-10-03 16:16 by geertj, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name: python-gzip.diff (uploaded by geertj, 2002-10-04 07:36)
Messages (6)
msg41318 - (view) Author: Geert Jansen (geertj) * Date: 2002-10-03 16:16
Problem: Currently, the gzip module cannot handle files whose uncompressed size exceeds 2G. The cause is that a .gz file ends with a trailer containing a 32-bit length field. This field cannot represent a file length of 4G or more, and because of mixed-type arithmetic in gzip.py the effective limit is lowered to 2G.

Testcase:

    python gzip.py <file>        # <file> must be > 2G
    python gzip.py -d <file.gz>  # error

Proposed fix: Test the uncompressed data size modulo 4G. A patch implementing this fix is attached. This is also the solution that gzip itself uses.

Two other remarks:

I don't understand lines 22-23 of gzip.py: why is the test "if value < 0" necessary when writing an unsigned int?

The testing of the crc value in GzipFile._read_eof() is done modulo 4G. Is this necessary? crc32 is just read from the file as a normal int, and self.crc comes from zlib.crc32, which always returns a regular int.

Regards,
Geert Jansen
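As an illustration of why a 32-bit length field can fail at 2G rather than 4G, here is a minimal sketch in modern Python. It assumes the trailer bytes are interpreted once as a signed and once as an unsigned little-endian 32-bit integer. The issue itself only says "mixed type arithmetic", so the signed read is an assumption, not a description of the actual gzip.py code. The helper `isize_field` is hypothetical.

```python
import struct

FOUR_GIB = 1 << 32

def isize_field(uncompressed_size):
    """Return the 4 trailer bytes that hold the size modulo 2**32 (hypothetical helper)."""
    return struct.pack("<L", uncompressed_size % FOUR_GIB)

size = 3 * (1 << 30)  # 3 GiB, between the 2G and 4G limits
raw = isize_field(size)

# Assumption: one code path reads the field as signed, another as unsigned.
as_signed = struct.unpack("<l", raw)[0]
as_unsigned = struct.unpack("<L", raw)[0]

# The signed reading is negative, so comparing it with the real size fails.
print(as_signed, as_unsigned, as_signed == size, as_unsigned == size)

# At 4 GiB and above the field wraps around, whichever way it is read.
wrapped = struct.unpack("<L", isize_field(5 * (1 << 30)))[0]
print(wrapped == 1 << 30)
```

The signed reading breaks for any size from 2G upward, while the unsigned reading only loses information once the size reaches 4G, which is the case the modulo test addresses.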
msg41319 - (view) Author: Geert Jansen (geertj) * Date: 2002-10-04 07:36
Sorry -- it seems the file upload went wrong! Second try.
msg41320 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-11-04 17:08
Assigned to me. I think your suggested fix makes good sense.
msg41321 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-11-04 19:51
Fixed, by related changes in

Lib/gzip.py; new revision: 1.36
Misc/NEWS; new revision: 1.508
msg41322 - (view) Author: Geert Jansen (geertj) * Date: 2002-11-05 10:36
I'm afraid this doesn't fix the whole problem. You fixed the problem for file sizes in the range 2G-4G, but (if I read your patch correctly) files > 4G still don't work. On Linux it is very easy to create files > 4G, and Python supports this, so it would be nice to have.

A better fix IMHO would be to test the file size modulo 4G. The probability that an invalid gzip file is accepted as valid by this less accurate test is astronomically small (there is also a CRC). In fact, this is also the fix that the "official" gzip program uses.

I can give you a test account on my Linux machine if you want to test a patch and don't have a machine with large file support nearby. Or I can test a patch for you.
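The modulo-4G test described above can be sketched as follows. This is not the patch attached to the issue; it is a small illustration in modern Python, assuming a single-member gzip stream with no trailing padding, so that the last 8 bytes are the CRC32 followed by the size field. The names `size_matches` and `trailer_matches` are hypothetical.

```python
import gzip
import struct
import zlib

MASK32 = 0xFFFFFFFF

def size_matches(actual_size, isize):
    """Compare the real uncompressed size with the trailer field modulo 2**32."""
    return (actual_size & MASK32) == isize

def trailer_matches(data, gz_bytes):
    """Check the CRC32 and size stored in the last 8 bytes against `data`."""
    crc32, isize = struct.unpack("<LL", gz_bytes[-8:])
    crc_ok = (zlib.crc32(data) & MASK32) == crc32
    return crc_ok and size_matches(len(data), isize)

payload = b"x" * 1000
blob = gzip.compress(payload)

print(trailer_matches(payload, blob))         # payload agrees with the trailer
print(trailer_matches(payload + b"y", blob))  # size and CRC both disagree

# A 5 GiB file stores 1 GiB in the size field; the modulo test accepts it.
print(size_matches(5 * (1 << 30), 1 << 30))
```

The size test alone would accept any length that differs from the true one by a multiple of 4G, which is why the CRC comparison is kept alongside it.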
msg41323 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-11-05 20:40
Got it. It's distasteful but pragmatic. Fixed again, in

Lib/gzip.py; new revision: 1.37
Misc/NEWS; new revision: 1.510

It was tested "by hand" on Win2K (on a 6+GB file).
History
Date User Action Args
2022-04-10 16:05:43 admin set github: 37257
2002-10-03 16:16:48 geertj create