Issue 1446489: zipfile: support for ZIP64

Created on 2006-03-09 14:58 by ronaldoussoren, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded
zipfile-zip64.patch | ronaldoussoren, 2006-03-09 15:28
zipfile-zip64-version2.patch | ronaldoussoren, 2006-05-23 13:10
zipfile64-version3.patch | ronaldoussoren, 2006-05-26 08:26
zipfile64-version4.patch | ronaldoussoren, 2006-05-30 13:28
zipfile64-version-5.patch | ronaldoussoren, 2006-06-11 21:09
Messages (12)
msg49695 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-03-09 14:58
The attached patch implements support for ZIP64, that is, zipfiles containing very large (>4GByte) files and zipfiles that are themselves larger than 4GByte. The output of this patch can be read by pkzip (see below for the actual version I used for testing).
msg49696 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-03-09 15:28
Oops, I've uploaded the wrong file. zipfile-zip64.patch is the correct one. I've tested the correctness of created archives using this version of pkzip:

pkzipc -version
PKZIP(R) Server Version 8 ZIP Compression Utility for Linux X86
Copyright (C) 1989-2005 PKWARE, Inc. All Rights Reserved. Evaluation Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745 Patent Pending
Version 8.40.66
msg49697 - Author: Anthony Baxter (anthonybaxter) (Python triager) Date: 2006-04-02 05:02
I'd like to see a testcase and possibly a note for the documentation about the new semantics. Also, should it be possible to say "don't use the ZIP64 extension, instead raise an Error" for people who don't want to generate these?
msg49698 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-04-02 19:13
The "don't use the ZIP64 extension" flag is a good idea: zipfiles that use this extension aren't readable by the infozip tools (zip and unzip on most unix systems). I'll add tests and documentation in the near future. The version of zipfile that I'm currently using also contains a patch for speeding up the opening of zipfiles. For the type of files I'm dealing with (about 11GByte large with tens of thousands of files) the speedup is very significant. I suppose it's better to file that as a separate patch after this has been approved.
msg49699 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-16 07:41
Since 2.5 beta is coming close, have you made progress on the tests/docs?
msg49700 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-16 07:55
I haven't had time to work on this; all the time I had for Python-related work has been eaten by finishing PyObjC's port to Intel Macs and the universal binary patches. The former is now done and the latter almost, so I'll have some time to work on this again, especially because I'm using this patch at work and might be able to claim some time to work on this during work hours.
msg49701 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-23 13:10
I've found some time to work on this. I've added zipfile-zip64-version2.patch. This version:

* Makes zip64 behaviour optional (defaults to off because zip(1) doesn't support zip64).
* Is significantly faster for large zipfiles because it doesn't scan the entire zipfile just to check that the file headers are consistent with the central directory w.r.t. filename (this check is now done when trying to read a file).
* Updates the reference documentation.
* Adds unittests. There are two sets of tests: one set tests the behaviour of the zip64 extensions using small files by lowering the zip64 cutoff point and is run every time; the other set tests huge zipfiles and is run only when the largefile feature is enabled when running the tests.

There is one backward-incompatible change: ZipInfo objects no longer have a file_offset attribute. That was the other reason for scanning the entire zipfile when opening it. IMNSHO this should have been a private attribute, and the cost of this feature is not worth its *very* limited usefulness. As an indication of its cost: I got a 6x speedup when I removed the calculation of the file_offset attribute, something that adds up when you are dealing with huge zipfiles (I wrote this patch because I'm dealing with 10+GByte zipfiles with tens of thousands of files at work).

I noticed that zipfile raises RuntimeError in some places. I've changed one of those to zipfile.BadZipfile, but others remain. I don't like this; most of them should be replaced by TypeError or ValueError exceptions.

BTW, this patch also supports storing files >4GByte in the zipfile, but that feature isn't very useful because zipfile doesn't have an API for reading file data incrementally.
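For readers who want to try the optional behaviour described in this message, here is a minimal sketch. It assumes the switch is exposed as the `allowZip64` keyword of `zipfile.ZipFile`, which is the name used in the current standard library; the message itself does not name the flag. The helper `write_archive` and the member name `hello.txt` are hypothetical, and the payload is tiny, so no ZIP64 records are actually needed here.

```python
import io
import zipfile


def write_archive(members, allow_zip64=False):
    """Write a {name: bytes} mapping into an in-memory ZIP and return its bytes.

    allow_zip64 is passed straight through to ZipFile's allowZip64 keyword,
    which decides whether ZIP64 extensions may be used when they are required.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED,
                         allowZip64=allow_zip64) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Small archive written with ZIP64 permitted, then read back.
payload = write_archive({"hello.txt": b"hello world"}, allow_zip64=True)
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
    content = archive.read("hello.txt")
```

The message says the option defaulted to off in this patch. Since Python 3.4 the standard library enables ZIP64 by default, so passing the keyword explicitly keeps the intent visible either way.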
msg49702 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-26 08:26
I've attached yet another version. This version reintroduces some functionality that was unintentionally removed and fixes a lame bug that caused test_zipimport to fail.
msg49703 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-05-30 13:28
I've added some more tests for pre-existing functionality. The unittests are still far from comprehensive, but they at least touch upon most of zipfile's functionality. Does anyone feel like reviewing this? I'd like to get this into Python 2.5.
msg49704 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2006-06-11 20:33
reading zipfile64-version64.patch:

* Why does the zipfile module import itself?
* Why is the default ZIP64 limit 1 << 30? Shouldn't that be 1 << 31 - 1 (or slightly less) for maximum compatibility with existing <2GiB zip files or zips with data just under 2GiB? Don't force zip64's use unless the size actually exceeds a 32-bit signed integer.
* assert diskno == 0 and assert nodisks == 1 should be turned into BadZipFile exceptions with an explanation that multi-disk zip files aren't supported.
* In main(), document the -t option in the usage string.
* TestZip64InSmallFiles changes zipfile.ZIP64_LIMIT but will not restore the value if a test fails (that could lead to other unrelated test failures). Not a problem in the hopefully normal case of all tests passing. Use a try: finally: to make sure that gets reset.
* Documentation: "Is does optionally handle" is awkward. How about "It can handle"?

The removal of the file_offset attribute makes sense but does make me wonder how much existing code that could break. I suggest leaving file_offset out and, if any Python 2.5 beta tester complains, restoring it or making scanning to look file offsets up a ZipFile option (defaulting to True).
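A note on the limit expression in this review: in Python, binary `-` binds more tightly than `<<`, so `1 << 31 - 1` typed literally evaluates to `1 << 30`, not to the largest 32-bit signed integer. The sketch below only illustrates that arithmetic; the variable names are made up for the example, and the thread does not show the exact expression used in the patch.

```python
# Subtraction is evaluated before the shift, so this is 1 << 30.
as_written = 1 << 31 - 1

# Parenthesising the shift gives the largest 32-bit signed integer.
intended = (1 << 31) - 1

print(as_written)  # 1073741824
print(intended)    # 2147483647
```

Writing the parentheses explicitly avoids silently ending up with the same 1 GiB cutoff the review objected to.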
msg49705 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-06-11 21:09
* The import of zipfile itself is a bug.
* The limit should indeed be raised to (1<<31-1).
* The diskno and nodisks assertions are present in the current version of zipfile, but I agree that those should be changed into exceptions.
* I've updated main to document and actually allow the -t option.
* TestZip64InSmallFiles restores the ZIP64_LIMIT in the tearDown method; isn't that good enough?

I sure hope that nobody actually uses file_offset. The only use case I can think of for that is to reimplement the read method. If it turns out that this change does break existing code we could add yet another option, but let's wait with that until someone actually complains.

I've uploaded a new version of the patch that fixes all these issues. BTW, thanks for the review.
msg49706 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2006-07-18 12:37
This is part of 2.5; no need to keep this item open.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 43003
2006-03-09 14:58:19 ronaldoussoren create