Issue 22468: Tarfile using fstat on GZip file object

Created on 2014-09-23 08:49 by bartolsthoorn, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)

msg227328

Author: Bart Olsthoorn (bartolsthoorn)

Date: 2014-09-23 08:49

The CPython tarfile method gettarinfo() uses fstat on the file object to determine the size of a file. When that file object was created with gzip.open() (so it is a GzipFile), fstat reports the compressed size of the file on disk. addfile() then reads the uncompressed data from the GzipFile, but only that many bytes, which is too few, so the resulting tar contains truncated files.
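The size mismatch can be reproduced without building a real archive. The following is a minimal sketch; the helper name reported_vs_actual, the temporary file and the sample payload are hypothetical. It writes a small .gz file, then compares the size that os.fstat() and gettarinfo() report with the number of bytes the GzipFile actually yields when read.

```python
import gzip
import io
import os
import tarfile
import tempfile


def reported_vs_actual(payload):
    """Return (fstat size, gettarinfo() size, decompressed size) for a .gz file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")
        with gzip.open(path, "wb") as out:
            out.write(payload)
        with gzip.open(path, "rb") as f, tarfile.open(mode="w", fileobj=io.BytesIO()) as tar:
            # GzipFile.fileno() is the descriptor of the compressed file on disk.
            fstat_size = os.fstat(f.fileno()).st_size
            # gettarinfo() takes its size from the same fstat() result.
            info_size = tar.gettarinfo(fileobj=f).size
            # The data addfile() would actually be copying from f.
            actual_size = len(f.read())
    return fstat_size, info_size, actual_size


fstat_size, info_size, actual_size = reported_vs_actual(b"hello world\n" * 1000)
print(fstat_size, info_size, actual_size)
```

With highly repetitive text the first two numbers are much smaller than the third, which is exactly the shortfall that truncates the tar member.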

I suggest checking the class of the file object before using fstat to determine the size, and raising a warning if it is a GzipFile.

To clarify, this only happens when adding a GzipFile object to a tar. I know it's not a very common scenario, and the underlying problem is that the uncompressed size of a gzip file can only be reliably determined by decompressing and reading all of it, but I think it would be better not to fail silently.
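The suggested check might look something like the following, written as a wrapper outside tarfile rather than as a change to gettarinfo() itself. This is only a sketch: gettarinfo_checked is a hypothetical name, and the choice of RuntimeWarning and the message text are assumptions, not anything tarfile provides.

```python
import gzip
import io
import os
import tarfile
import tempfile
import warnings


def gettarinfo_checked(tar, fileobj, arcname=None):
    """Hypothetical wrapper around TarFile.gettarinfo() that warns for GzipFile.

    gettarinfo() takes its size from os.fstat(fileobj.fileno()); for a
    GzipFile that descriptor belongs to the compressed file on disk.
    """
    if isinstance(fileobj, gzip.GzipFile):
        warnings.warn(
            "gettarinfo() on a GzipFile reports the compressed size; "
            "set tarinfo.size to the decompressed length before addfile()",
            RuntimeWarning,
            stacklevel=2,
        )
    return tar.gettarinfo(arcname=arcname, fileobj=fileobj)


# Usage: the warning fires, and the returned TarInfo still carries the
# compressed size, so the caller has to correct it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.txt.gz")
    with gzip.open(path, "wb") as out:
        out.write(b"some text\n" * 100)
    with gzip.open(path, "rb") as f, tarfile.open(mode="w", fileobj=io.BytesIO()) as tar:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            info = gettarinfo_checked(tar, f, arcname="1.txt")
```

Note that the wrapper only warns; it does not change the size, so the other metadata gettarinfo() collects is still returned as-is.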

So this is an example that is failing:

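import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # size comes from os.fstat(f.fileno()): the *compressed* size
                tarinfo = tar.gettarinfo(fileobj=f)
                # copies only tarinfo.size bytes of the decompressed data
                tar.addfile(tarinfo=tarinfo, fileobj=f)
    # read the buffer after the with block has closed (finalized) the archive
    return c.getvalue()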

Instead, this version reads the whole decompressed file to get its proper size, and writes the files to a tar correctly:

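import gzip
import io
import tarfile

def build_tar():
    c = io.BytesIO()
    with tarfile.open(mode='w', fileobj=c) as tar:
        for textfile in ['1.txt.gz', '2.txt.gz']:
            with gzip.open(textfile) as f:
                # decompress everything so the real (uncompressed) size is known
                buff = f.read()
                tarinfo = tarfile.TarInfo(name=f.name)
                tarinfo.size = len(buff)
                tar.addfile(tarinfo=tarinfo, fileobj=io.BytesIO(buff))
    # read the buffer after the with block has closed (finalized) the archive
    return c.getvalue()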

msg238961

Author: Mark Lawrence (BreamoreBoy) *

Date: 2015-03-23 00:12

msg227328 states "it's not a really common scenario", but I believe we must still allow for it. What do others think?

msg238967

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-03-23 01:15

I think a warning in the documentation might be helpful.

However a special check in the code doesn’t seem right. Would you check for LZMAFile and BZ2File as well? Some of the other attributes (modification time, owner, etc) may be useful even for a GzipFile, and the programmer can just overwrite the file size attribute if necessary.
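Overwriting the size attribute, as suggested above, can be sketched as follows for the gzip case from the original report. add_gzip_member is a hypothetical helper (assuming Python 3.8+ for the := operator). It keeps whatever metadata gettarinfo() derives from fstat, counts the decompressed bytes in chunks instead of holding the whole file in memory, then rewinds with seek(0) before calling addfile().

```python
import gzip
import io
import os
import tarfile
import tempfile


def add_gzip_member(tar, path, arcname=None, chunk_size=64 * 1024):
    """Hypothetical helper: add the decompressed contents of a .gz file to tar.

    Keeps the metadata gettarinfo() derives from fstat (mtime, mode, owner)
    and overwrites only the size, which fstat reports for the compressed file.
    """
    with gzip.open(path, "rb") as f:
        tarinfo = tar.gettarinfo(arcname=arcname, fileobj=f)
        # Count decompressed bytes chunk by chunk rather than reading it all at once.
        size = 0
        while chunk := f.read(chunk_size):
            size += len(chunk)
        tarinfo.size = size
        # Rewind; GzipFile emulates this by decompressing again from the start.
        f.seek(0)
        tar.addfile(tarinfo, fileobj=f)
    return tarinfo


# Usage: round-trip one member and read it back.
payload = b"line of text\n" * 5000
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.txt.gz")
    with gzip.open(path, "wb") as out:
        out.write(payload)
    buf = io.BytesIO()
    with tarfile.open(mode="w", fileobj=buf) as tar:
        add_gzip_member(tar, path, arcname="1.txt")

buf.seek(0)
with tarfile.open(mode="r", fileobj=buf) as tar:
    restored = tar.extractfile("1.txt").read()
```

The trade-off is that each file is decompressed twice, once to count and once to copy; for small files, reading everything into memory as in the original workaround is simpler.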

msg241582

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-04-20 01:25

I am posting a documentation patch which I hope will clarify that objects like GzipFile won’t work automatically with gettarinfo(). It also has other modifications to address Issue 21996 (name must be text) and help with Issue 22208 (clarify non-OS files won’t work).

msg260537

Author: Roundup Robot (python-dev) (Python triager)

Date: 2016-02-20 00:18

New changeset 94a94deaf06a by Martin Panter in branch '3.5': Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage https://hg.python.org/cpython/rev/94a94deaf06a

New changeset e66c476b25ec by Martin Panter in branch 'default': Issue #22468: Merge gettarinfo() doc from 3.5 https://hg.python.org/cpython/rev/e66c476b25ec

New changeset 9d5217aaea13 by Martin Panter in branch '2.7': Issues #22468, #21996, #22208: Clarify gettarinfo() and TarInfo usage https://hg.python.org/cpython/rev/9d5217aaea13

msg260541

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2016-02-20 00:26

Hoping my clarification in the documentation is enough to call this fixed

History

Date                 User           Action  Args
2022-04-11 14:58:08  admin          set     github: 66658
2016-02-20 00:26:53  martin.panter  set     status: open -> closed
                                            versions: + Python 2.7, Python 3.6, - Python 3.4
                                            messages: +
                                            resolution: fixed
                                            stage: patch review -> resolved
2016-02-20 00:18:57  python-dev     set     nosy: + python-dev
                                            messages: +
2016-02-09 23:04:35  martin.panter  link    issue21996 dependencies
2015-04-20 01:25:10  martin.panter  set     files: + gettarinfo.patch
                                            assignee: docs@python
                                            components: + Documentation
                                            versions: + Python 3.5
                                            keywords: + patch
                                            nosy: + docs@python
                                            messages: +
                                            stage: patch review
2015-03-23 01:15:34  martin.panter  set     nosy: + martin.panter
                                            messages: +
2015-03-23 00:12:57  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: +
2014-09-23 18:22:37  ned.deily      set     nosy: + lars.gustaebel
2014-09-23 08:49:52  bartolsthoorn  create