Issue 9720: zipfile writes incorrect local file header for large files in zip64

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Kristof.Keppens, Nico.Möller, Paul, Ruben.Gonzalez, alanmcintyre, amaury.forgeotdarc, christian.heimes, craigds, dandrzejewski, enlavin, eric.araujo, gregory.p.smith, jhenry82, lambacck, loewis, nadeem.vawda, python-dev, ronaldoussoren, segfault42, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2010-08-31 01:02 by craigds, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile_zip64_header.patch craigds,2010-08-31 01:02
zipfile-huge-files.diff alanmcintyre,2010-09-07 04:57 review
zipfile_zip64_always.patch serhiy.storchaka,2012-09-23 09:54 Always write Zip64 extra review
zipfile_zip64_try.patch serhiy.storchaka,2012-09-23 09:55 Try to write Zip64 extra only if needed review
zipfile_zip64_always_2.patch serhiy.storchaka,2012-11-28 12:25 Always write Zip64 extra review
zipfile_zip64_try_2.patch serhiy.storchaka,2012-11-28 12:26 Try to write Zip64 extra only if needed review
zipfile_zip64_try_2-2.7.patch serhiy.storchaka,2013-01-04 13:27 review
zipfile_zip64_try_2-3.2.patch serhiy.storchaka,2013-01-04 13:27 review
Messages (20)
msg115250 - (view) Author: Craig de Stigter (craigds) Date: 2010-08-31 01:02
Steps to reproduce:

    # create a large (>4gb) file
    f = open('foo.txt', 'wb')
    text = 'a' * 1024**2
    for i in xrange(5 * 1024):
        f.write(text)
    f.close()

    # now zip the file
    import zipfile
    z = zipfile.ZipFile('foo.zip', mode='w', allowZip64=True)
    z.write('foo.txt')
    z.close()

Now inspect the file headers using a hex editor. The written headers are incorrect. The filesize and compressed size should be written as 0xffffffff and the 'extra field' should contain the actual sizes.

Tested on Python 2.5, but looking at the latest code in 3.2 it still looks broken.

The problem is that the ZipInfo.FileHeader() is written before the filesize is populated, so Zip64 extensions are not written. Later, the sizes in the header are written, but Zip64 extensions are not taken into account and the filesize is just wrapped (7gb becomes 3gb, for instance).

My patch fixes the problem on Python 2.5; it might need minor porting to fix trunk. It works by assigning the uncompressed filesize to the ZipInfo header initially, then writing the header. Then later on, I re-write the header (this is okay since the header size will not have increased).
msg115466 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-03 16:53
A tip about versions: Development happens on the current active branch, py3k (future 3.2 version), and bug or doc fixes are backported to the stable versions 2.7 and 3.1. Security fixes go into 2.6 too. Can you reproduce your bug in 2.7, 3.1 and 3.2? Adding Alan to nosy since he’s listed in Misc/maintainers.rst.
msg115514 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-03 21:47
Yes, the bug still exists in Python 3.1.2. However, struct.pack() no longer silently ignores overflow, so I get this error instead:

    >>> z.write('foo.txt')
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python3.1/zipfile.py", line 1095, in write
        zinfo.file_size))
    struct.error: argument out of range
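The two behaviours mentioned in this thread (silent wrapping on older versions, struct.error on Python 3) can be illustrated in isolation. This is a small sketch, assuming Python 3; pack_size_32 is a hypothetical helper, and the bit mask only imitates what truncation to 32 bits leaves behind.

```python
import struct


def pack_size_32(size):
    """Pack size as an unsigned 32-bit little-endian field.

    Returns None when the value does not fit, which is the case that
    raises struct.error in the traceback above.
    """
    try:
        return struct.pack("<L", size)
    except struct.error:
        return None


seven_gib = 7 * 1024**3
# What remains of the size if it is silently truncated to 32 bits.
wrapped = seven_gib & 0xFFFFFFFF

print(pack_size_32(seven_gib))
print(wrapped // 1024**3, "GiB after wrapping")
```

Either way the 32-bit field cannot carry the real size, which is why the Zip64 extra field is needed.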
msg115660 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-05 17:42
Thanks for the patch, Craig; I should have some time later today or tomorrow to do a review. Did you have a patch for the test suite(s) as well? If not, I can just make sure your test case is covered in test_zipfile64.
msg115672 - (view) Author: Craig de Stigter (craigds) Date: 2010-09-05 21:16
Hi, sorry, no, I haven't had time to add a real test for this.
msg115741 - (view) Author: Alan McIntyre (alanmcintyre) * (Python committer) Date: 2010-09-07 04:57
Here's an updated patch for the py3k trunk with tests. This pretty much doubles the runtime of test_zipfile64.py. The patch also removes some unnecessary code from the existing test_zipfile64 tests. Note: It looks like writestr will also suffer from a struct.pack overflow if it's given a ZipInfo with the third general purpose flag bit set. I won't have time to address that until next weekend, probably.
msg146923 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-11-03 12:17
Issue 6434 was marked as a duplicate of this issue.
msg156442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-03-20 17:52
I am afraid that the problem is more complicated. With allowZip64=True, all files need to be written with this extension, because the size of the local file header may change, and then it is not possible to just go back and rewrite it after compression. As it stands, the Zip64 option simply does not work.
msg170645 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012-09-18 13:44
Serhiy: If I understand you correctly it should be easy to fix. The code in close() has to check if any file is beyond the ZIP64 limit and then write all headers with extra args. Is that correct?
msg171010 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-22 17:56
No, on the contrary, it is not that easy to fix, and the patch is incorrect. Sorry that this was not clear. The size of the header with extra args depends on the size of the file. The file size can change during compression, and the compressed size may be larger than the uncompressed size, exceeding the 32-bit boundary. When rewriting the header with extra args, we can overwrite compressed data. I had put the issue off for more careful research. Thanks for the reminder. One solution is to always write 64-bit sizes (even for the smallest files) when allowZip64 is true.
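The point that the compressed size may exceed the uncompressed size is easy to observe on incompressible input. A minimal sketch, assuming random bytes as a stand-in for already-compressed data; raw_deflate is a hypothetical helper, and wbits=-15 requests a raw deflate stream, the stream format used for deflated zip members.

```python
import os
import zlib


def raw_deflate(data):
    """Compress data as a raw deflate stream (no zlib header or trailer)."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


data = os.urandom(1 << 16)       # random bytes do not compress
compressed = raw_deflate(data)
growth = len(compressed) - len(data)

print("uncompressed:", len(data))
print("compressed:  ", len(compressed))
print("growth:      ", growth)
```

A file just under the 32-bit boundary can therefore cross it only after compression, when the local header has already been written without room for the extra field.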
msg171025 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-23 09:54
I see two reasonable solutions to the issue (everything below applies only when allowZip64=True):

1) Always write the Zip64 extended information extra field. This approach always succeeds, but the zipfile size will increase by 20 bytes for each file. The first patch (zipfile_zip64_always.patch) uses this approach.

2) Write the Zip64 extended information extra field only if the expected file size exceeds a certain limit. In very rare cases this makes it impossible to compress a file that could be compressed the first way. However, in most cases it produces the same file as before the patch. The second patch (zipfile_zip64_try.patch) is based on Alan's patch and uses the second approach. The probability of errors is reduced, and they are now detected and do not lead to silent data damage.

Both patches are for Python 3.3. If either patch is good, I'll backport it to the older versions.
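The two strategies above can be sketched as a pair of small decisions: whether to reserve room for the Zip64 extra field before compressing, and what to do afterwards if the guess was wrong. This is an illustrative sketch only, not the code of either patch; reserve_zip64, check_after_compress and the margin value are assumptions made for the example, while ZIP64_LIMIT is the constant exported by zipfile.

```python
from zipfile import ZIP64_LIMIT

# Zip64 extra field: 2-byte ID + 2-byte length + two 8-byte sizes.
ZIP64_EXTRA_SIZE = 2 + 2 + 8 + 8


def reserve_zip64(expected_size, always=False, margin=1.05):
    """Decide before compression whether to write the Zip64 extra field.

    always=True models approach 1; otherwise approach 2 is used, with a
    hypothetical safety margin for data that grows when compressed.
    """
    if always:
        return True
    return expected_size * margin > ZIP64_LIMIT


def check_after_compress(reserved, file_size, compress_size):
    """Fail loudly instead of rewriting a header that is now too small."""
    too_big = file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT
    if too_big and not reserved:
        raise RuntimeError("sizes exceed the 32-bit header fields "
                           "but no Zip64 extra field was reserved")


print(ZIP64_EXTRA_SIZE, "extra bytes per member with approach 1")
print(reserve_zip64(10), reserve_zip64(10, always=True))
print(reserve_zip64(5 * 1024**3))
```

The trade-off is the one described in the message: approach 1 never fails but costs 20 bytes per member, while approach 2 keeps small archives unchanged and turns the rare bad guess into a detected error rather than silent corruption.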
msg172648 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-11 15:08
What is the conclusion about the patches? Which variant should I backport to older versions?
msg172652 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2012-10-11 15:22
I'd write the extended header when the current file size is larger than the zip64 limit (that is, when 'st.st_size > ZIP64_LIMIT' in the write method). That way the minimal header size is used whenever possible. As you noted, this can cause problems when the file grows beyond the limit while it is stored in the zipfile, but IMHO storing data while it is modified is asking for problems anyway. BTW, I haven't actually reviewed the patch yet.
msg175471 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 20:38
Please, review the patches.
msg176538 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-28 12:26
Patches updated to resolve merge conflict with . Please review and apply either of these patches. This is needed for some of my other zipfile patches.
msg178603 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 19:11
Which variant of the patches should I commit? Or should I prepare another?
msg179013 - (view) Author: Nico Möller (Nico.Möller) Date: 2013-01-04 10:21
I most definitely need a patch for 2.7.3. It would be awesome if you could provide a patch for that version.
msg179019 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-04 13:27
Here are patches of the second variant for 2.7 and 3.2.
msg179987 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-14 22:45
New changeset ce869b05762c by Serhiy Storchaka in branch '2.7':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/ce869b05762c

New changeset b93848ca7760 by Serhiy Storchaka in branch '3.2':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/b93848ca7760

New changeset 656a45738e5e by Serhiy Storchaka in branch '3.3':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/656a45738e5e

New changeset 628a6af64a46 by Serhiy Storchaka in branch 'default':
Issue #9720: zipfile now writes correct local headers for files larger than 4 GiB.
http://hg.python.org/cpython/rev/628a6af64a46
msg179989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-14 22:49
Fixed. Thank you for the report, Craig de Stigter.
History
Date User Action Args
2022-04-11 14:57:05 admin set github: 53929
2013-01-14 22:49:08 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2013-01-14 22:45:09 python-dev set nosy: + python-dev; messages: +
2013-01-04 13:27:38 serhiy.storchaka set files: + zipfile_zip64_try_2-2.7.patch, zipfile_zip64_try_2-3.2.patch; messages: +
2013-01-04 10:21:58 Nico.Möller set nosy: + Nico.Möller; messages: +
2012-12-30 19:11:38 serhiy.storchaka set messages: +
2012-12-29 22:08:10 serhiy.storchaka set assignee: serhiy.storchaka
2012-11-28 12:26:01 serhiy.storchaka set files: + zipfile_zip64_always_2.patch, zipfile_zip64_try_2.patch; messages: +
2012-11-26 20:32:14 jhenry82 set nosy: + jhenry82
2012-11-12 20:38:26 serhiy.storchaka set messages: +
2012-10-19 08:54:37 Ruben.Gonzalez set nosy: + Ruben.Gonzalez
2012-10-11 15:22:27 ronaldoussoren set messages: +
2012-10-11 15:08:29 serhiy.storchaka set messages: +; versions: + Python 3.4
2012-09-23 09:55:46 serhiy.storchaka set files: + zipfile_zip64_try.patch; stage: needs patch -> patch review
2012-09-23 09:54:19 serhiy.storchaka set files: + zipfile_zip64_always.patch; nosy: + loewis, gregory.p.smith, ronaldoussoren; messages: +
2012-09-22 17:56:23 serhiy.storchaka set messages: +
2012-09-18 13:44:30 christian.heimes set keywords: + needs review; nosy: + christian.heimes; messages: +
2012-09-18 13:25:53 Kristof.Keppens set nosy: + Kristof.Keppens
2012-03-20 17:52:08 serhiy.storchaka set messages: +
2012-03-20 17:13:23 serhiy.storchaka set nosy: + serhiy.storchaka
2012-03-20 14:35:55 dandrzejewski set nosy: + dandrzejewski
2011-11-03 12:17:55 nadeem.vawda set versions: + Python 3.3, - Python 3.1; nosy: + amaury.forgeotdarc, nadeem.vawda, lambacck, segfault42, enlavin, Paul; messages: +; stage: needs patch
2011-11-03 12:17:17 nadeem.vawda link issue6434 superseder
2010-09-07 04:57:46 alanmcintyre set files: + zipfile-huge-files.diff; messages: +
2010-09-05 21:16:38 craigds set messages: +
2010-09-05 17:42:17 alanmcintyre set messages: +
2010-09-03 21:47:12 craigds set messages: +
2010-09-03 16:53:46 eric.araujo set nosy: + eric.araujo, alanmcintyre; messages: +; versions: - Python 2.6, Python 2.5, Python 3.3
2010-08-31 01:02:17 craigds create