Issue 1734346: patch for bug 1170311 "zipfile UnicodeDecodeError"

Created on 2007-06-10 10:53 by snaury, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
python-zipfile-unicode-filenames.patch	snaury, 2007-06-10 10:53	Patch and test case
python-zipfile-unicode-filenames-utf8.patch	snaury, 2007-06-10 20:29	Patch that sets language bit for unicode filenames
python-zipfile-unicode-filenames-utf8-2.patch	snaury, 2007-06-11 04:22	Patch falls back to ascii when it can, ZipInfo filenames are not damaged after writing
python-zipfile-unicode-filenames-utf8-3.patch	snaury, 2007-06-11 04:27	Forgot to add test case in the previous patch

Messages (10)
msg52744	Author: Alexey Borzenkov (snaury)	Date: 2007-06-10 10:53
This patch fixes UnicodeDecodeError when attempting to write files to zipfile with filename of unicode class.
msg52745	Author: Martin v. Löwis (loewis) *	Date: 2007-06-10 16:48
This patch is incorrect. It relies on the system encoding, and allows non-string things as file names. What it really should do is to encode in code page 437; bonus points if it falls back to the UTF-8 feature of zip files when that encoding fails.
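The approach suggested above (code page 437 first, UTF-8 as a fallback) can be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the code from any of the attached patches: the function name `encode_zip_filename` and the constant `UTF8_FLAG` are hypothetical, and the sketch assumes the UTF-8 marker is bit 11 (0x800) of the ZIP general purpose flags.

```python
# Hypothetical helper, not part of the zipfile module or the attached patches.
UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def encode_zip_filename(name, flag_bits=0):
    """Return (encoded_name, flag_bits) for writing a ZIP header."""
    if not isinstance(name, str):
        # Reject non-string file names instead of relying on implicit conversion.
        raise TypeError("filename must be a string, not %s" % type(name).__name__)
    try:
        # Prefer the traditional ZIP encoding; no system encoding is involved.
        return name.encode("cp437"), flag_bits & ~UTF8_FLAG
    except UnicodeEncodeError:
        # Fall back to UTF-8 and mark the entry accordingly.
        return name.encode("utf-8"), flag_bits | UTF8_FLAG
```

With this sketch, a name such as "café.txt" is stored in cp437 with the flag clear, while a name with Cyrillic characters is stored as UTF-8 with the flag set.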
msg52746	Author: Alexey Borzenkov (snaury)	Date: 2007-06-10 20:29
File Added: python-zipfile-unicode-filenames-utf8.patch
msg52747	Author: Alexey Borzenkov (snaury)	Date: 2007-06-11 04:22
File Added: python-zipfile-unicode-filenames-utf8-2.patch
msg52748	Author: Alexey Borzenkov (snaury)	Date: 2007-06-11 04:27
File Added: python-zipfile-unicode-filenames-utf8-3.patch
msg65935	Author: Christophe Kalt (kalt)	Date: 2008-04-28 21:32
Any chance of this making it in sometime? The current behaviour is rather limiting/annoying.
msg65939	Author: Martin v. Löwis (loewis) *	Date: 2008-04-28 22:13
> Any chance of this making it in sometime?

I'll see what I can do for 2.6, but perhaps it gets delayed until 2.7/3.1.
msg66274	Author: Martin v. Löwis (loewis) *	Date: 2008-05-05 17:18
Thanks for the patch, committed as r62724. I didn't see the need to clear the UTF-8 flag, so I left it in (in case somebody wants to inspect it).
msg66277	Author: Alexey Borzenkov (snaury)	Date: 2008-05-05 18:40
Martin, I cleared the flag bit because filename was changed in-place, to mark that filename does not need further processing. This was primarily compatibility concern, to accommodate for situations where users try to do such decoding in their own code (this way flag won't be there, so their code won't trigger). Without clearing the flag bit, calling _decodeFilenameFlags second time will fail, as well as any similar user code. I suggest that if users want to know if filename is unicode, they should check that filename is of class unicode.
msg66289	Author: Martin v. Löwis (loewis) *	Date: 2008-05-05 21:15
> Martin, I cleared the flag bit because filename was changed in-place, to
> mark that filename does not need further processing. This was primarily
> compatibility concern, to accommodate for situations where users try to
> do such decoding in their own code (this way flag won't be there, so
> their code won't trigger). Without clearing the flag bit, calling
> _decodeFilenameFlags second time will fail, as well as any similar user
> code.

I'm not concerned about the compatibility; code that actually does the decoding still might break since it would expect the filename to be a byte string if it doesn't explicitly decode. Such assumption would still break under your change.

I am concerned about silently faking data. The library shouldn't do that; it should present the flags unmodified, as some application might perform further processing (such as displaying the flags to the user). It would then be confusing if the data processed isn't the one that was read from disk.

> I suggest that if users want to know if filename is unicode, they should
> check that filename is of class unicode.

That won't work in Py3k, which will always decode the filename.
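One way to reconcile the two concerns in this exchange is to leave the flags exactly as read from disk and make the decoding step itself safe to repeat. The sketch below is an assumption-laden illustration in Python 3 syntax, not the committed r62724 code: `decode_zip_filename` and `UTF8_FLAG` are hypothetical names, and bit 11 (0x800) is assumed to be the UTF-8 marker.

```python
# Hypothetical helper, not the zipfile module's _decodeFilenameFlags.
UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def decode_zip_filename(name, flag_bits):
    """Decode a raw ZIP filename without touching flag_bits."""
    if isinstance(name, str):
        # Already decoded: calling the helper a second time is harmless.
        return name
    if flag_bits & UTF8_FLAG:
        return name.decode("utf-8")
    # No UTF-8 marker: assume the traditional cp437 encoding.
    return name.decode("cp437")
```

Because the helper checks the type of the name rather than clearing the bit, callers can still inspect or display the original flags after decoding.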

History
Date	User	Action	Args
2022-04-11 14:56:24	admin	set	github: 45077
2008-05-05 21:16:03	loewis	set	messages: +
2008-05-05 18:40:11	snaury	set	messages: +
2008-05-05 17:18:55	loewis	set	status: open -> closed; resolution: accepted; messages: +
2008-04-28 22:14:30	loewis	set	priority: normal -> high
2008-04-28 22:13:53	loewis	set	messages: +
2008-04-28 21:32:28	kalt	set	nosy: + kalt; messages: +
2007-09-10 20:34:45	loewis	set	assignee: loewis; severity: normal -> major
2007-06-10 10:53:22	snaury	create