Issue 17656: Python 2.7.4 breaks ZipFile extraction of zip files with unicode member paths


Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, Vhati, amaury.forgeotdarc, benjamin.peterson, catalin.iacob, christian.heimes, ezio.melotti, georg.brandl, gregory.p.smith, koobs, larry, loewis, ned.deily, neologix, pitrou, python-dev, r.david.murray, schmir, serhiy.storchaka, terry.reedy, twb, vstinner
Priority: release blocker Keywords: patch

Created on 2013-04-08 03:20 by Vhati, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
Kestrel Cruiser.zip Vhati, 2013-04-08 04:40
zipfile_extract_unicode.patch serhiy.storchaka, 2013-04-08 10:03
test_extract_unicode_filenames_skip.patch serhiy.storchaka, 2013-04-20 21:18 Skip test_extract_unicode_filenames
Messages (24)
msg186264 - Author: Vhati (Vhati) Date: 2013-04-08 03:20
Python 2.7.4 fails while extracting zip files when 'member' is a unicode path.
---
Traceback (most recent call last):
  ...
    my_zip.extract(item, tmp_folder_path)
  File "D:\Apps\Python274\lib\zipfile.py", line 1024, in extract
    return self._extract_member(member, path, pwd)
  File "D:\Apps\Python274\lib\zipfile.py", line 1057, in _extract_member
    arcname = arcname.translate(table)
TypeError: character mapping must return integer, None or unicode
---
2.7.3 had no problems because the call to translate() is new. The following, copied from zipfile.py, will recreate the error.
--
import string
illegal = ':<>|"?*'
table = string.maketrans(illegal, '_' * len(illegal))
arcname = "hi"
arcname = arcname.translate(table)  # ascii strings are fine
arcname = u"hi"
arcname = arcname.translate(table)  # unicode fails
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: character mapping must return integer, None or unicode
---
I tried using unicode literals for the illegal string and maketrans underscore arg, but that didn't work. Suggestions?
Here's a link to the doc for translate(): http://docs.python.org/2/library/stdtypes.html#str.translate
msg186265 - Author: Vhati (Vhati) Date: 2013-04-08 03:37
Apparently namelist() can return either ascii or unicode strings for its members, depending on the archive. Obviously this'd apply to literal unicode strings as well.
msg186273 - Author: Vhati (Vhati) Date: 2013-04-08 04:40
Oops, passing a unicode literal to extract()'s member arg wouldn't be sufficient to trigger this. The extract() method quietly converts strings to ZipInfo objects via getinfo(member_string). Then _extract_member() takes the filename attribute of that ZipInfo object, which causes problems when THAT is unicode. So I guess this bug only applies to archives with unicode member paths. Attached is one such file to aid in troubleshooting.
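The following sketch shows which member names come back as unicode. It assumes only the standard zipfile module and builds a small archive in memory. When writing, zipfile stores a non-ASCII unicode name as UTF-8 and sets general-purpose flag bit 0x800. When reading, Python 2.7 decodes a name to unicode only if that bit is set, so namelist() and ZipInfo.filename can mix str and unicode within one archive. In Python 3 every name is str. build_archive and describe_members are hypothetical helper names.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: file name is UTF-8


def build_archive():
    """Return an in-memory zip with one non-ASCII and one ASCII member name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr(u'\xf6.txt', b'data')   # stored as UTF-8, flag bit set
        zf.writestr(u'plain.txt', b'data')  # pure ASCII, flag bit left clear
    buf.seek(0)
    return buf


def describe_members(buf):
    """Map each member's filename to whether its UTF-8 flag bit is set."""
    with zipfile.ZipFile(buf) as zf:
        return dict((info.filename, bool(info.flag_bits & UTF8_FLAG))
                    for info in zf.infolist())


for name, utf8 in sorted(describe_members(build_archive()).items()):
    # On Python 2.7, u'\xf6.txt' comes back as unicode and 'plain.txt' as str.
    print('%r %s %s' % (name, type(name).__name__, utf8))
```

Only the flagged member reaches _extract_member() as unicode in 2.7. That is consistent with the report that only archives containing unicode member paths are affected.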
msg186275 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-04-08 04:46
It appears that this is a consequence of the changes in issue 6972, in particular change 4d1948689ee1.
msg186285 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-08 10:03
Yes, it's my fault. Here is a patch (with test) which fixes this regression in 2.7. This is a 2.7-only issue; in Python 3 arcnames are always unicode. Please test on Windows.
msg186326 - Author: Vhati (Vhati) Date: 2013-04-08 18:32
The 2013-04-08 patch worked on Windows XP.
msg186444 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-09 19:01
Perhaps this would deserve a 2.7.5?
msg186490 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-04-10 13:11
Yes; I won't have time for a few days, though.
msg186493 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-04-10 13:16
I guess I will join with 3.2 and 3.3 releases for #17666.
msg186494 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2013-04-10 13:18
Perhaps we should hold off for a week or two to see if any other critical problems show up.
msg186496 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-04-10 14:04
Yes, although the new releases will get the standard rc period anyway.
msg186660 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-04-12 18:23
A week's notice to push any almost-ready IDLE bugfixes before the rcs would be nice. (I am assuming there are some, but I would have to ask Roger.)
msg186703 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-13 09:29
New changeset d02507c9f973 by Serhiy Storchaka in branch '2.7': Issue #17656: Fix extraction of zip files with unicode member paths. http://hg.python.org/cpython/rev/d02507c9f973
msg187430 - Author: Kubilay Kocak (koobs) (Python triager) Date: 2013-04-20 14:38
heads-up: Tests are still failing on FreeBSD (gcc & clang) buildbots: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%202.7/builds/472/steps/test/logs/stdio http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.0%20dtrace%2Bclang%202.7/builds/468/steps/test/logs/stdio
msg187453 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-04-20 20:05
It seems like file() can't handle unicode file names on FreeBSD. The FS encoding is 'US-ASCII' on Snakebite's FreeBSD box.
> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1078)_extract_member()
-> with self.open(member, pwd=pwd) as source, \
(Pdb) self.open(member, pwd=pwd)
<zipfile.ZipExtFile object at 0x801eb5fd0>
(Pdb) n
> /home/cpython/users/christian.heimes/2.7/Lib/zipfile.py(1079)_extract_member()
-> file(targetpath, "wb") as target:
(Pdb) file(targetpath, "wb")
*** UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-48: ordinal not in range(128)
(Pdb) sys.getfilesystemencoding()
'US-ASCII'
msg187461 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-20 21:18
Here is a patch which skips test_extract_unicode_filenames if the platform has no Unicode filesystem semantics.
msg187474 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-20 22:37
I guess that test_extract_unicode_filenames_skip.patch will not fix the failing test. The test fails because u"\xf6.txt" cannot be encoded to sys.getfilesystemencoding() (which is ASCII on the FreeBSD buildbot). You should test whether u"\xf6.txt" itself can be encoded. You should move the try/except inside the function.
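Here is a sketch of the kind of check suggested above. It assumes the test should be skipped only when the specific name u"\xf6.txt" cannot be encoded with sys.getfilesystemencoding(). This follows from the pdb session, where Python 2's file() encodes a unicode path with that encoding and fails under 'US-ASCII'. can_encode_filename and UnicodeFilenameTest are hypothetical names. This is not necessarily what the committed changeset does.

```python
import os
import shutil
import sys
import tempfile
import unittest


def can_encode_filename(name, encoding=None):
    """Return True if `name` can be encoded with the filesystem encoding.

    Hypothetical helper. Falling back to 'ascii' when
    sys.getfilesystemencoding() returns None is an assumption of this sketch.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or 'ascii'
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


class UnicodeFilenameTest(unittest.TestCase):
    # Skip only when the exact name the test writes cannot be represented,
    # instead of relying on a general "Unicode filesystem" check.
    @unittest.skipUnless(can_encode_filename(u'\xf6.txt'),
                         'filesystem encoding cannot represent the file name')
    def test_create_unicode_filename(self):
        tmpdir = tempfile.mkdtemp()
        try:
            path = os.path.join(tmpdir, u'\xf6.txt')
            with open(path, 'wb') as f:
                f.write(b'data')
            self.assertTrue(os.path.exists(path))
        finally:
            shutil.rmtree(tmpdir)
```

With an ASCII filesystem encoding, as on the FreeBSD buildbot, can_encode_filename(u'\xf6.txt') returns False and the test is reported as skipped rather than as an error.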
msg188583 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-06 20:19
The test is still failing: http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%202.7/builds/1670/steps/test/logs/stdio
"""
======================================================================
ERROR: test_extract_unicode_filenames (test.test_zipfile.TestsWithSourceFile)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/test/test_zipfile.py", line 436, in test_extract_unicode_filenames
    writtenfile = zipfp.extract(fname)
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1024, in extract
    return self._extract_member(member, path, pwd)
  File "/export/home/buildbot/64bits/2.7.cea-indiana-amd64/build/Lib/zipfile.py", line 1079, in _extract_member
    file(targetpath, "wb") as target:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-86: ordinal not in range(128)
"""
msg188730 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-08 18:53
New changeset 8952fa2c475f by Serhiy Storchaka in branch '2.7': Issue #17656: Skip test_extract_unicode_filenames if the FS encoding http://hg.python.org/cpython/rev/8952fa2c475f
msg188731 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-08 18:54
Sorry, I thought I had corrected this test.
msg188767 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-09 12:18
Shouldn't it be left open until the regression-fix release has been released?
msg188769 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-09 12:25
I don't think so. The bug is fixed, and the fix will be in the release.
msg188779 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2013-05-09 14:38
http://mail.python.org/pipermail/python-dev/2013-April/125761.html asked to leave bugs open.
msg188780 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-09 15:10
Ah, fair enough.
History
Date User Action Args
2022-04-11 14:57:44 admin set github: 61856
2013-05-27 16:07:10 laci112 set components: + Windows, - Library (Lib), Unicode
2013-05-13 00:06:27 benjamin.peterson set status: open -> closed
2013-05-09 15:10:01 pitrou set status: closed -> open; messages: +
2013-05-09 14:38:35 Arfrever set messages: +
2013-05-09 12:25:52 pitrou set messages: +
2013-05-09 12:18:31 serhiy.storchaka set messages: +
2013-05-09 00:34:05 pitrou set status: open -> closed
2013-05-08 20:40:20 serhiy.storchaka set resolution: fixed
2013-05-08 18:54:39 serhiy.storchaka set messages: +
2013-05-08 18:53:24 python-dev set messages: +
2013-05-06 20:19:40 neologix set nosy: + neologix; messages: +
2013-04-30 19:05:15 serhiy.storchaka set assignee: serhiy.storchaka
2013-04-20 22:37:58 vstinner set messages: +
2013-04-20 21:18:05 serhiy.storchaka set files: + test_extract_unicode_filenames_skip.patch; messages: +
2013-04-20 20:05:23 christian.heimes set messages: +
2013-04-20 19:26:41 serhiy.storchaka set nosy: + vstinner
2013-04-20 14:38:57 koobs set nosy: + koobs; messages: +
2013-04-13 16:47:24 serhiy.storchaka set stage: patch review -> resolved
2013-04-13 09:29:03 python-dev set messages: +
2013-04-12 18:23:41 terry.reedy set nosy: + terry.reedy; messages: +
2013-04-10 16:44:24 gregory.p.smith set priority: high -> release blocker
2013-04-10 14:04:01 georg.brandl set messages: +
2013-04-10 13:18:52 ned.deily set messages: +
2013-04-10 13:16:59 georg.brandl set messages: +
2013-04-10 13:11:14 benjamin.peterson set messages: +
2013-04-09 19:01:10 pitrou set nosy: + pitrou; messages: +
2013-04-09 11:54:00 christian.heimes set nosy: + christian.heimes
2013-04-08 18:32:16 Vhati set messages: +
2013-04-08 10:03:01 serhiy.storchaka set files: + zipfile_extract_unicode.patch; priority: normal -> high; components: + Library (Lib); versions: - Python 3.2, Python 3.3, Python 3.4; keywords: + patch; type: crash -> behavior; messages: +; stage: patch review
2013-04-08 04:59:13 gregory.p.smith set title: Python 2.7.4 Breaks ZipFile Extraction -> Python 2.7.4 breaks ZipFile extraction of zip files with unicode member paths; versions: + Python 3.2, Python 3.3, Python 3.4
2013-04-08 04:46:01 loewis set nosy: + loewis, georg.brandl, gregory.p.smith, amaury.forgeotdarc, larry, schmir, benjamin.peterson, ned.deily, Arfrever, r.david.murray, twb, catalin.iacob, python-dev, serhiy.storchaka; messages: +
2013-04-08 04:40:49 Vhati set files: + Kestrel Cruiser.zip; messages: +
2013-04-08 03:37:23 Vhati set messages: +
2013-04-08 03:20:59 Vhati create