Issue 31557: tarfile: incorrectly treats regular file as directory

Created on 2017-09-22 21:42 by Joe Tsai, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded
test.tar Joe Tsai, 2017-10-03 22:07
Messages (5)
msg302778 - Author: Joe Tsai (Joe Tsai) Date: 2017-09-22 21:42
The original V7 header allocates only 100 bytes to store the file path. If a path exceeds this length, either the PAX format or the GNU format must be used, both of which can represent arbitrarily long file paths. When doing so, most tar writers just store the first 100 bytes of the file path in the V7 header. When reading, a proper reader should disregard the contents of the V7 name field if a preceding, corresponding PAX or GNU header overrode it.

This is currently not the case with the tarfile module, which has the following check (https://github.com/python/cpython/blob/c7cc14a825ec156c76329f65bed0d0bd6e03d035/Lib/tarfile.py#L1054-L1057):

    # Old V7 tar format represents a directory as a regular
    # file with a trailing slash.
    if obj.type == AREGTYPE and obj.name.endswith("/"):
        obj.type = DIRTYPE

This check should be further constrained to activate only when no prior PAX or GNU record overrides the value of obj.name. The check was the source of a bug that caused tarfile to report a regular file as a directory: the file path was extra long, and when the tar writer truncated the path to the first 100 bytes, it happened to end on a slash.
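The proposed constraint can be pictured as a small pure function. This is only a sketch, not the tarfile module's actual code: `resolve_type` and its `name_overridden` parameter are hypothetical names, and `name_overridden` is assumed to be True whenever a preceding PAX or GNU record supplied the full path.

```python
from tarfile import AREGTYPE, DIRTYPE

def resolve_type(typeflag, header_name, name_overridden):
    """Hypothetical helper: decide the member type from the V7 header fields.

    name_overridden is assumed to be True when a preceding PAX or GNU
    record supplied the real path, so header_name may be a truncated prefix.
    """
    # Apply the old V7 "regular file with trailing slash" rule only when
    # the 100-byte name field is the authoritative path.
    if typeflag == AREGTYPE and header_name.endswith("/") and not name_overridden:
        return DIRTYPE
    return typeflag
```

With this shape, a genuine V7 directory entry such as "olddir/" is still promoted to a directory, while a truncated 100-byte prefix that merely happens to end in a slash is left as a regular file.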
msg303431 - Author: Nitish (nitishch) * Date: 2017-09-30 21:30
> This check was the source of a bug that caused tarfile to report a regular file as a directory because the file path was extra long, and when the tar writer truncated the path to the first 100B, it so happened to end on a slash.

AFAIK, the '/' character is not allowed as part of a filename on Linux systems. Is this bug platform specific? Can you give the test case you are referring to?
msg303655 - Author: Joe Tsai (Joe Tsai) Date: 2017-10-03 22:07
This bug is not platform specific. I've attached a reproduction:

    $ python
    >>> import tarfile
    >>> tarfile.open("test.tar", "r").next().isdir()
    True

    $ tar -tvf test.tar
    -rw-rw-r-- 0/0 0 1969-12-31 16:00 123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/123456789/foo.txt

    $ tar --version
    tar (GNU tar) 1.27.1

For some background, this bug was originally filed against the Go standard library (I am the maintainer of Go's tar implementation). When I investigated the issue, I discovered that Go was doing the right thing, and that the discrepancy was due to the check I pointed to earlier. The GNU tool indicates that this is a regular file as well.
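For readers without the attachment, an archive of the same shape can be built in memory. This is a sketch under assumptions: it uses the GNU format and the old NUL type flag (the quoted check only fires for AREGTYPE), which may not match how test.tar itself was written. It inspects the raw header to show the truncated name field, then prints what tarfile reports; the isdir() result depends on the Python version in use.

```python
import io
import tarfile

# Same shape as the path in the report: 25 ten-byte components plus a file name.
LONG_NAME = "123456789/" * 25 + "foo.txt"

def build_archive():
    """Write one empty member whose type flag is the old NUL regular-file flag."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(LONG_NAME)
        info.type = tarfile.AREGTYPE  # assumption: the writer used '\0', not '0'
        tar.addfile(info)             # size is 0, so no file object is needed
    return buf.getvalue()

def last_header(data):
    """Return the last non-empty 512-byte block, i.e. the member's own header."""
    blocks = [data[i:i + 512] for i in range(0, len(data), 512)]
    return [b for b in blocks if b.strip(b"\0")][-1]

data = build_archive()
header = last_header(data)
name_field = header[:100]    # V7 name field: only the first 100 bytes of the path
typeflag = header[156:157]   # V7 type flag

print("name field ends with slash:", name_field.endswith(b"/"))
print("type flag is AREGTYPE:", typeflag == tarfile.AREGTYPE)

member = tarfile.open(fileobj=io.BytesIO(data)).next()
print("full name recovered:", member.name == LONG_NAME)
print("isdir() reports:", member.isdir())
```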
msg303676 - Author: Nitish (nitishch) * Date: 2017-10-04 06:40
Try 'tar xvf test.tar'. On a Linux machine at least, it in fact produces a tree of directories, not a single file. So, in a way, what Python is reporting is correct.
msg303715 - Author: Joe Tsai (Joe Tsai) Date: 2017-10-04 17:21
It creates a number of nested directories only because GNU (and BSD) tar implicitly create missing parent directories. If you cd into the bottom-most folder, you will see "foo.txt".
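The implicit parent-directory behaviour described here can be imitated in a few lines. This is a simplified illustration, not how GNU or BSD tar is implemented; `extract_regular_file` is a hypothetical helper, and the member name is shortened to keep the example portable.

```python
import os
import tempfile

def extract_regular_file(root, member_name, data=b""):
    """Hypothetical helper: write one regular-file member under root."""
    target = os.path.join(root, *member_name.split("/"))
    # Create any missing parent directories before writing the file itself.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target

with tempfile.TemporaryDirectory() as root:
    path = extract_regular_file(root, "123456789/123456789/foo.txt")
    print("parents are directories:", os.path.isdir(os.path.dirname(path)))
    print("leaf is a regular file:", os.path.isfile(path))
```

The nested directories appear only as a side effect of placing the file; the archive member itself is still a single regular file.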
History
Date User Action Args
2022-04-11 14:58:52 admin set github: 75738
2017-10-04 17:21:19 Joe Tsai set messages: +
2017-10-04 06:40:06 nitishch set messages: +
2017-10-03 22:07:23 Joe Tsai set files: + test.tar; messages: +
2017-09-30 21:30:44 nitishch set nosy: + nitishch; messages: +
2017-09-30 07:00:40 serhiy.storchaka set versions: + Python 2.7, Python 3.6, Python 3.7; nosy: + serhiy.storchaka; components: + Library (Lib); type: behavior; stage: needs patch
2017-09-29 22:26:36 terry.reedy set nosy: + lars.gustaebel
2017-09-22 21:42:01 Joe Tsai create