Issue 1734346: patch for bug 1170311 "zipfile UnicodeDecodeError"

Created on 2007-06-10 10:53 by snaury, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
python-zipfile-unicode-filenames.patch snaury,2007-06-10 10:53 Patch and test case
python-zipfile-unicode-filenames-utf8.patch snaury,2007-06-10 20:29 Patch that sets language bit for unicode filenames
python-zipfile-unicode-filenames-utf8-2.patch snaury,2007-06-11 04:22 Patch falls back to ascii when it can, ZipInfo filenames are not damaged after writing
python-zipfile-unicode-filenames-utf8-3.patch snaury,2007-06-11 04:27 Forgot to add test case in the previous patch
Messages (10)
msg52744 - (view) Author: Alexey Borzenkov (snaury) Date: 2007-06-10 10:53
This patch fixes the UnicodeDecodeError raised when writing files to a zipfile under a filename of class unicode.
msg52745 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-06-10 16:48
This patch is incorrect. It relies on the system encoding, and allows non-string things as file names. What it really should do is to encode in code page 437; bonus points if it falls back to the UTF-8 feature of zip files when that encoding fails.
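A minimal sketch of the encoding strategy suggested in the comment above, not the code from any of the attached patches. It is written for Python 3, where text filenames are `str` rather than the `unicode` class discussed in this thread. The helper name `encode_zip_filename` and the constant name `UTF8_FLAG` are hypothetical; the value 0x800 is general purpose bit 11, which the ZIP format uses to mark UTF-8 names.

```python
# Hypothetical helper for illustration; not the patch and not zipfile's own code.
UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def encode_zip_filename(name, flag_bits=0):
    """Return (encoded_name, flag_bits) for a text filename.

    Code page 437 is tried first; if the name cannot be represented
    there, fall back to UTF-8 and set the UTF-8 flag bit.
    """
    if not isinstance(name, str):
        # Reject non-string objects instead of silently converting them.
        raise TypeError("filename must be a text string")
    try:
        return name.encode("cp437"), flag_bits
    except UnicodeEncodeError:
        return name.encode("utf-8"), flag_bits | UTF8_FLAG
```

Under these assumptions the result does not depend on the system encoding: a name that fits in code page 437 leaves the flags unchanged, and any other name is written as UTF-8 with the flag set.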
msg52746 - (view) Author: Alexey Borzenkov (snaury) Date: 2007-06-10 20:29
File Added: python-zipfile-unicode-filenames-utf8.patch
msg52747 - (view) Author: Alexey Borzenkov (snaury) Date: 2007-06-11 04:22
File Added: python-zipfile-unicode-filenames-utf8-2.patch
msg52748 - (view) Author: Alexey Borzenkov (snaury) Date: 2007-06-11 04:27
File Added: python-zipfile-unicode-filenames-utf8-3.patch
msg65935 - (view) Author: Christophe Kalt (kalt) Date: 2008-04-28 21:32
Any chance of this making it in sometime? The current behaviour is rather limiting/annoying.
msg65939 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-28 22:13
> Any chance of this making it in sometime?

I'll see what I can do for 2.6, but perhaps it gets delayed until 2.7/3.1.
msg66274 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-05 17:18
Thanks for the patch, committed as r62724. I didn't see the need to clear the UTF-8 flag, so I left it in (in case somebody wants to inspect it).
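For readers who want to inspect the flag, here is a minimal sketch assuming a current Python 3 `zipfile`, which may behave differently from the Python 2 module discussed in this thread. `ZipInfo.flag_bits` is a real attribute; the helper name `utf8_flag_set` and the constant name `UTF8_FLAG` are hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def utf8_flag_set(filename):
    """Hypothetical helper: write one member, read it back, report the flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, b"data")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        # flag_bits holds the general purpose flags as read from the archive.
        return bool(info.flag_bits & UTF8_FLAG)
```

With an in-memory archive like this, a name outside ASCII such as `"\u65e5\u672c.txt"` would be expected to report the flag as set, and a plain ASCII name as unset.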
msg66277 - (view) Author: Alexey Borzenkov (snaury) Date: 2008-05-05 18:40
Martin, I cleared the flag bit because the filename was changed in place, to mark that the filename does not need further processing. This was primarily a compatibility concern, to accommodate situations where users try to do such decoding in their own code (this way the flag won't be there, so their code won't trigger). Without clearing the flag bit, calling _decodeFilenameFlags a second time will fail, as will any similar user code. I suggest that if users want to know if a filename is unicode, they should check that the filename is of class unicode.
msg66289 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-05 21:15
> Martin, I cleared the flag bit because filename was changed in-place, to
> mark that filename does not need further processing. This was primarily
> compatibility concern, to accommodate for situations where users try to
> do such decoding in their own code (this way flag won't be there, so
> their code won't trigger). Without clearing the flag bit, calling
> _decodeFilenameFlags second time will fail, as well as any similar user
> code.

I'm not concerned about the compatibility; code that actually does the decoding still might break, since it would expect the filename to be a byte string if it doesn't explicitly decode. Such an assumption would still break under your change.

I am concerned about silently faking data. The library shouldn't do that; it should present the flags unmodified, as some application might perform further processing (such as displaying the flags to the user). It would then be confusing if the data processed isn't the data that was read from disk.

> I suggest that if users want to know if filename is unicode, they should
> check that filename is of class unicode.

That won't work in Py3k, which will always decode the filename.
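The read side of this discussion can be sketched as follows. This is an illustration only, not the committed `_decodeFilenameFlags`: the function name `decode_zip_filename` is hypothetical, the code is Python 3, and the type check is an assumption added here to show one way a repeated call can be harmless without modifying the flags.

```python
# Hypothetical helper for illustration; not zipfile's own implementation.
UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def decode_zip_filename(raw_name, flag_bits):
    """Return the text form of a member name without touching flag_bits."""
    if isinstance(raw_name, str):
        # Already decoded: return it unchanged so a second call is safe.
        return raw_name
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    # Without the UTF-8 flag, names are interpreted as code page 437.
    return raw_name.decode("cp437")
```

The flags are only read, never cleared, so an application that displays them sees the values stored in the archive.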
History
Date User Action Args
2022-04-11 14:56:24 admin set github: 45077
2008-05-05 21:16:03 loewis set messages: +
2008-05-05 18:40:11 snaury set messages: +
2008-05-05 17:18:55 loewis set status: open -> closed, resolution: accepted, messages: +
2008-04-28 22:14:30 loewis set priority: normal -> high
2008-04-28 22:13:53 loewis set messages: +
2008-04-28 21:32:28 kalt set nosy: + kalt, messages: +
2007-09-10 20:34:45 loewis set assignee: loewis, severity: normal -> major
2007-06-10 10:53:22 snaury create