Issue 17153: tarfile extract fails when Unicode in pathname

Created on 2013-02-07 16:43 by vinay.sajip, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)

msg181631

Author: Vinay Sajip (vinay.sajip) * (Python committer)

Date: 2013-02-07 16:43

The attached file failing.tar.gz contains a path with UTF-8-encoded Unicode. This causes extractall() to fail, but only when the destination path is Unicode. That's because it leads to an implicit str->unicode conversion using ASCII.

Test script:

import shutil, tarfile, tempfile

tf = tarfile.open('failing.tar.gz', 'r:gz')
workdir = tempfile.mkdtemp()
try:
    # N.B. ensure dest path is Unicode to trigger the failure
    tf.extractall(unicode(workdir))
finally:
    shutil.rmtree(workdir)

Result:

$ python untar.py
Traceback (most recent call last):
  File "untar.py", line 8, in <module>
    tf.extractall(unicode(workdir))
  File "/usr/lib/python2.7/tarfile.py", line 2046, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib/python2.7/tarfile.py", line 2083, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)
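The mechanism behind the traceback can be sketched in Python 3 terms, where the decode step has to be spelled out rather than happening implicitly inside os.path.join(). This is only an illustration: the member name "dir/café.txt" is a made-up example and is not taken from failing.tar.gz.

```python
# Python 3 analogue of the implicit str->unicode conversion in Python 2.
# The member name below is hypothetical, chosen only to contain a
# non-ASCII character; it is not the name stored in failing.tar.gz.
name_in_archive = "dir/caf\u00e9.txt".encode("utf-8")  # raw bytes, as in a tar header

error = None
try:
    # Python 2 effectively did this when joining a unicode destination
    # with the byte-string member name.
    name_in_archive.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding that was actually used succeeds.
decoded = name_in_archive.decode("utf-8")
print(decoded)
```

The first byte of the UTF-8 sequence for "é" is 0xc3, which is the same byte the original traceback complains about.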

msg221135

Author: Mark Lawrence (BreamoreBoy) *

Date: 2014-06-20 23:46

@Lars can we have a response on this issue please?

msg222553

Author: Lars Gustäbel (lars.gustaebel) * (Python committer)

Date: 2014-07-08 10:40

IIRC, tarfile under 2.7 has never been explicitly unicode-safe; support for unicode objects is heterogeneous at best. The obvious work-around is to work exclusively with str objects.

What we can't do is decode the UTF-8 pathname from the archive to a unicode object, because we have no way to detect an archive's encoding. We can either emit a warning if the user passes a unicode object to extract(), or implicitly encode the passed unicode object using TarFile.encoding so that the os.path.join() succeeds.

Unfortunately, I am not sure whether there was a rationale behind the current behaviour of extract(). This needs more inspection.
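A minimal sketch of the second option, done on the caller's side rather than inside tarfile: encode a unicode destination to a byte string before handing it to extractall(). The helper name native_dest is hypothetical and not part of tarfile; the sketch assumes the archive's member names and the destination can share one encoding.

```python
def native_dest(path, encoding):
    """Return `path` in the type tarfile joins safely with member names.

    Hypothetical helper, not part of the tarfile module. On Python 2,
    `str` is the byte-string type, so a unicode destination is encoded
    to bytes. On Python 3 the path is returned unchanged.
    """
    if str is bytes and not isinstance(path, str):
        # Python 2 with a unicode object: encode it explicitly instead of
        # letting os.path.join() attempt an implicit ASCII conversion.
        return path.encode(encoding)
    return path


# On Python 3 this is a no-op.
print(native_dest("/tmp/work", "utf-8"))
```

On Python 2.7 the call from the test script would then read tf.extractall(native_dest(unicode(workdir), tf.encoding)), which should keep both operands of os.path.join() as byte strings.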

msg272329

Author: Vadim Markovtsev (Vadim Markovtsev2)

Date: 2016-08-10 12:50

So... The bug persists in 3.5 and 3.6. It prevents, e.g., unpacking tarballs coming from GitHub repos with Unicode file names.

msg272330

Author: Vadim Markovtsev (Vadim Markovtsev2)

Date: 2016-08-10 12:54

Relevant issue in setuptools: https://github.com/pypa/setuptools/issues/710

msg272370

Author: Vinay Sajip (vinay.sajip) * (Python committer)

Date: 2016-08-10 20:01

Could you point to some suitable projects from GitHub whose tarballs fail on 3.5 / 3.6? My script in the first post, with "unicode(...)" replaced by "str(...)" and my original failing archive, works on Python 3.5 and 3.6 on Linux. On which platform have you seen failures?
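For anyone who wants to check Python 3 behaviour without the attached archive, a self-contained variant of the script might look like the following. It builds its own small archive in memory; the member name "dir/café.txt" is an assumption made for illustration, so this does not reproduce failing.tar.gz exactly, and the result may depend on the platform's filesystem encoding.

```python
import io
import os
import shutil
import tarfile
import tempfile

# Hypothetical member name containing a non-ASCII character.
member_name = "dir/caf\u00e9.txt"
payload = b"hello"

# Build a small gzip-compressed tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
buf.seek(0)

# Newer Python versions accept an extraction filter; pass it only if present.
extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

workdir = tempfile.mkdtemp()
try:
    with tarfile.open(fileobj=buf, mode="r:gz") as tf:
        # The destination is a text string, as in the original script.
        tf.extractall(str(workdir), **extract_kwargs)
    extracted = os.path.join(workdir, *member_name.split("/"))
    extracted_ok = os.path.isfile(extracted)
    with open(extracted, "rb") as fh:
        content = fh.read()
finally:
    shutil.rmtree(workdir)

print(extracted_ok, content)
```

If this fails on a given platform, the traceback together with the output of sys.getfilesystemencoding() would help narrow down whether the problem lies in tarfile or in the filesystem encoding.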

msg394828

Author: Zackery Spytz (ZackerySpytz) * (Python triager)

Date: 2021-05-31 21:06

Python 2.7 is no longer supported, so I think this issue should be closed.

History

| Date | User | Action | Args |
| --- | --- | --- | --- |
| 2022-04-11 14:57:41 | admin | set | github: 61355 |
| 2021-05-31 22:27:36 | vinay.sajip | set | status: open -> closed; resolution: out of date; stage: resolved |
| 2021-05-31 21:06:36 | ZackerySpytz | set | nosy: + ZackerySpytz; messages: + |
| 2016-08-11 15:17:31 | BreamoreBoy | set | nosy: - BreamoreBoy |
| 2016-08-10 20:01:27 | vinay.sajip | set | messages: + |
| 2016-08-10 12:54:14 | Vadim Markovtsev2 | set | messages: + |
| 2016-08-10 12:50:55 | Vadim Markovtsev2 | set | nosy: + Vadim Markovtsev2; messages: + |
| 2014-07-08 10:40:12 | lars.gustaebel | set | messages: + |
| 2014-06-20 23:46:17 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + |
| 2013-02-08 10:19:47 | hynek | set | nosy: + hynek |
| 2013-02-07 16:45:09 | vinay.sajip | set | nosy: + lars.gustaebel |
| 2013-02-07 16:44:07 | vinay.sajip | set | files: + failing.tar.gz |
| 2013-02-07 16:43:21 | vinay.sajip | create | |