Issue 4750: tarfile keeps excessive dir structure in compressed files

Created on 2008-12-26 13:19 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name (uploaded by, date): description
test_tarfile.extrapath.zip (techtonik, 2008-12-26 13:19): test tar.gz validness
4750.gzip.basename.fix.diff (techtonik, 2008-12-29 22:45): patch
python25.issue4750.diff (techtonik, 2008-12-30 07:20): python 2.5 patch
Messages (11)
msg78296 - (view) Author: anatoly techtonik (techtonik) Date: 2008-12-26 13:19
When tarfile is told to create a tar.gz archive in a directory other than the current one, it stores the full path in the .gz header, where only the filename is required. This causes problems with decompression utilities such as 7-Zip. A test suite and a patch are attached.
{{{
tar -czf dist\create_tar.tar.gz package
7z l dist\create_tar.tar.gz > tar.out
python test_create.tar.gz.py
7z l dist\create_py.tar.gz > py.out
diff -pu3 tar.out py.out
}}}
{{{
--- tar.out	Fri Dec 26 15:12:42 2008
+++ py.out	Fri Dec 26 15:12:42 2008
@@ -1,10 +1,10 @@

 7-Zip 4.57  Copyright (c) 1999-2007 Igor Pavlov  2007-12-06

-Listing archive: dist\create_tar.tar.gz
+Listing archive: dist\create_py.tar.gz

    Date      Time    Attr         Size   Compressed  Name
 ------------------- ----- ------------ ------------  ------------------------
-2008-12-26 15:12:41              10240          170  create_tar.tar
+2008-12-26 15:03:39              10240          141  dist/create_py.tar
 ------------------- ----- ------------ ------------  ------------------------
-                                 10240          170  1 files, 0 folders
+                                 10240          141  1 files, 0 folders
}}}
See also issue 1886 and in particular
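A minimal reproduction sketch in modern Python 3 (not the attached test suite, whose contents are not shown here): it builds dist/create_py.tar.gz from a throwaway package directory with tarfile and then reads the FNAME field directly out of the gzip member header, following the layout in RFC 1952. The temporary directory layout and the helper name gzip_fname are hypothetical. On a current CPython, which carries the eventual fix, the field should be just create_py.tar; the report implies that an unpatched interpreter would include the directory prefix.

```python
import os
import struct
import tarfile
import tempfile
from typing import Optional

FEXTRA, FNAME = 0x04, 0x08  # flag bits from RFC 1952, section 2.3.1


def gzip_fname(path: str) -> Optional[str]:
    """Return the FNAME field of the first gzip member in `path`, or None if absent."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip file")
    flags = data[3]
    pos = 10  # fixed header: ID1 ID2 CM FLG MTIME(4) XFL OS
    if flags & FEXTRA:
        # Skip the optional extra field: 2-byte little-endian length, then payload.
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if not flags & FNAME:
        return None
    end = data.index(b"\x00", pos)  # FNAME is zero-terminated
    return data[pos:end].decode("latin-1")


with tempfile.TemporaryDirectory() as tmp:
    # Mirror the report: archive a small "package" directory into dist/.
    package = os.path.join(tmp, "package")
    os.makedirs(package)
    with open(os.path.join(package, "hello.txt"), "w") as f:
        f.write("hello\n")
    dist = os.path.join(tmp, "dist")
    os.makedirs(dist)
    archive = os.path.join(dist, "create_py.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(package, arcname="package")
    fname = gzip_fname(archive)
    print(fname)
```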
msg78344 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-27 08:44
Lars, what do you think?
msg78372 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2008-12-27 17:55
Anatoly is right: the gzip file format specification (RFC 1952) says that the FNAME header field must contain the basename of the original filename. So this behaviour is not tarfile's fault but the gzip module's, and it should be fixed there. 7-Zip can still decompress these files, right?
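The RFC 1952 rule can be sketched as a small helper that derives the FNAME value from the output path: drop any directory components, then drop a trailing .gz so that x.tar.gz records its original name x.tar. The helper name fname_for_header is hypothetical; current CPython's gzip module applies essentially this rule internally, but this is an illustration, not its code.

```python
import os


def fname_for_header(path: str) -> bytes:
    """Derive a gzip FNAME value from an output path (hypothetical helper).

    RFC 1952: FNAME is the original file name with directory components
    removed, encoded as ISO 8859-1 (Latin-1).
    """
    base = os.path.basename(path)  # strip "dist/" and similar prefixes
    if base.endswith(".gz"):
        base = base[: -len(".gz")]  # x.tar.gz was compressed from x.tar
    # Raises UnicodeEncodeError for names Latin-1 cannot represent.
    return base.encode("latin-1")


print(fname_for_header("dist/create_py.tar.gz"))  # b'create_py.tar'
print(fname_for_header("create_tar.tar.gz"))      # b'create_tar.tar'
```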
msg78414 - (view) Author: anatoly techtonik (techtonik) Date: 2008-12-28 15:46
7-Zip can decompress both, but it still creates a "dist/" directory when decompressing the file made with Python. I've noticed that this extra-path-component bug also occurs with "tar" + "gzip" under Windows: if they are run separately and a Windows path with backslashes is used, the directory prefix is not stripped. For example, this creates an archive with an invalid header:
{{{
tar -cf dist\create_tar_sep.tar package
gzip -f9 dist\create_tar_sep.tar
}}}
These commands are OK:
{{{
tar -cf dist\create_tar_sep.tar package
gzip -f9 dist/create_tar_sep.tar
}}}
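One plausible explanation, assumed here rather than confirmed in this thread, is that the MSYS gzip strips directories only at "/", as POSIX path rules do, so a backslash path looks like a bare filename. Python's posixpath and ntpath modules show how the two conventions differ on the same string:

```python
import ntpath
import posixpath

path = r"dist\create_tar_sep.tar"

# POSIX rules treat only "/" as a separator, so nothing is stripped here.
print(posixpath.basename(path))  # dist\create_tar_sep.tar
# Windows rules accept both "\" and "/" as separators.
print(ntpath.basename(path))     # create_tar_sep.tar
# With a forward slash, both conventions agree.
print(posixpath.basename("dist/create_tar_sep.tar"))  # create_tar_sep.tar
```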
msg78449 - (view) Author: anatoly techtonik (techtonik) Date: 2008-12-29 12:01
I filed a bug report against MSYS gzip here: https://sourceforge.net/tracker2/index.php?func=detail&aid=2474481&group_id=2435&atid=102435
msg78493 - (view) Author: anatoly techtonik (techtonik) Date: 2008-12-29 22:45
I attach a patch for Python 2.6 gzip. I clarified the meaning of self.name to be the basename corresponding to the FNAME field in the GZIP file header. There is a trace of a deprecated gzip.filename API; I haven't found any references to it in the documentation, so I removed it. In Python 2.5 it seemed to mean just the filename in read mode, and the filename + .gz in write mode even if the opened filename did not end with .gz. If the FNAME field from the gzip header is ignored in read mode and we want to make self.filename or self.name available via the API, we need to agree on what it should be: the basename of the archived file, or the path of the archive itself.
msg78510 - (view) Author: anatoly techtonik (techtonik) Date: 2008-12-30 07:20
I attach a patch for Python 2.5 as well. People will use the gzip module to build packages for a long time, and the patch will help them get correct archives.
msg78515 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-30 09:07
No further bug fixes are accepted for 2.5 (unless they fix security problems), so I reject the 2.5 patch.
msg94603 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2009-10-28 06:54
Lars, is this still accurate?
msg94648 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2009-10-29 08:52
The latest patch (4750.gzip.basename.fix.diff) cannot be used the way it is. The problem is that it uses the name attribute to store the basename with the .gz extension stripped. This breaks compatibility.
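The compatibility concern can be checked directly: the name attribute should keep reporting whatever path the caller passed, while only the header carries the stripped basename. A sketch against a current CPython gzip module (the path dist/create_py.tar.gz is illustrative; with a BytesIO fileobj nothing is written to disk):

```python
import gzip
import io

buf = io.BytesIO()
with gzip.GzipFile(filename="dist/create_py.tar.gz", mode="wb", fileobj=buf, mtime=0) as gz:
    gz.write(b"data")
    name_attr = gz.name  # the path exactly as passed in, not the basename

data = buf.getvalue()
flags = data[3]
assert flags & 0x08, "FNAME flag not set"
# GzipFile sets no other optional fields, so FNAME starts right after the
# 10-byte fixed header and runs to the first zero byte.
end = data.index(b"\x00", 10)
header_name = data[10:end].decode("latin-1")
print(name_attr)    # dist/create_py.tar.gz
print(header_name)  # create_py.tar
```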
msg94651 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2009-10-29 09:43
I fixed it in r75935 and r75937.
History
Date User Action Args
2022-04-11 14:56:43 admin set github: 49000
2009-10-29 09:43:49 lars.gustaebel set status: open -> closed; resolution: accepted; messages: +
2009-10-29 08:52:04 lars.gustaebel set messages: +
2009-10-28 06:55:13 tarek set nosy: loewis, lars.gustaebel, techtonik, tarek; components: - Distutils
2009-10-28 06:54:58 tarek set nosy: + tarek; messages: +; versions: + Python 3.1, Python 3.2, - Python 2.5
2008-12-30 09:07:19 loewis set messages: +
2008-12-30 09:06:38 loewis set files: - tarfile.directory.fix.diff
2008-12-30 07:20:20 techtonik set files: + python25.issue4750.diff; messages: +
2008-12-29 22:45:52 techtonik set files: + 4750.gzip.basename.fix.diff; messages: +
2008-12-29 12:01:53 techtonik set messages: +
2008-12-28 15:46:30 techtonik set messages: +
2008-12-27 17:55:45 lars.gustaebel set messages: +
2008-12-27 08:44:53 loewis set assignee: lars.gustaebel; messages: +; nosy: + loewis, lars.gustaebel
2008-12-26 13:21:17 techtonik set files: + tarfile.directory.fix.diff; keywords: + patch
2008-12-26 13:19:59 techtonik create