Issue 10757: zipfile.write, arcname should be allowed to be a byte string

Created on 2010-12-22 12:44 by connexion2000, last changed 2022-04-11 14:57 by admin.

Messages (7)
msg124499 - Author: Jacek Jabłoński (connexion2000) Date: 2010-12-22 12:44
file = 'somefile.dat'
filename = "ółśąśółąś.dat"
zip = zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED)
zip.write(file, filename)

above produces very nasty filename in zip archive.
*************************************************************
file = 'somefile.dat'
filename = "ółśąśółąś.dat"
zip = zipfile.ZipFile('archive.zip', 'w', zipfile.ZIP_DEFLATED)
zip.write(file, filename.encode('cp852'))

this produces TypeError: expected an object with the buffer interface

Documentation says that:
There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write().

I convert them to byte string but it ends with an error.
If it is documentation bug, what is the proper way to have filenames like "ółśąśółąś" in zip archive?
msg124518 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-22 20:07
This is not a bug. Your code that produces "very nasty filename" is the right one - the file name is actually the one you asked for. The second code is also behaving correctly: filename already *is* a bytestring, calling .encode for it is meaningless.
msg124519 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-22 20:12
Oops, I take this back - I didn't notice you were using Python 3.1. In any case, your first code is correct. What you get is the best you can ask for. That the second case fails is indeed a bug.
msg124641 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-25 16:37
See also issue 4871. From looking at the code it appears that the filename must be a string, and if it contains only ASCII characters it is entered as ASCII, while if it contains non-ASCII characters it is encoded to UTF-8 and the appropriate flag bits are set in the archive to indicate this (I know nothing about the archive format, by the way, I'm just looking at the code). So, in reverse of issue 4871, it appears that in this case the API should reject bytes input with an appropriate error message.
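The behaviour described in this message can be observed from the outside. The following is only a sketch, not taken from the zipfile source: the helper name `utf8_flag_after_roundtrip` is made up for illustration, and it assumes that bit 0x800 of the general purpose flags is the "filename is UTF-8" marker, as the ZIP format specification defines it.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name/comment stored as UTF-8


def utf8_flag_after_roundtrip(arcname):
    """Hypothetical helper: write one member under `arcname` into an
    in-memory archive, reopen it, and report whether the UTF-8 flag is
    set together with the name as zipfile reads it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(arcname, b"payload")
    with zipfile.ZipFile(buf, "r") as zf:
        info = zf.infolist()[0]
        return bool(info.flag_bits & UTF8_FLAG), info.filename


# An ASCII-only name is stored without the flag; a non-ASCII name gets it.
print(utf8_flag_after_roundtrip("plain.dat"))
print(utf8_flag_after_roundtrip("ółśąśółąś.dat"))
```

Whether another zip tool displays the second name correctly depends on whether that tool honours the flag, which may explain the "very nasty filename" in the original report.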
msg124686 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-26 23:54
> So, in reverse of issue 4871, it appears that in this case the API should reject bytes input with an appropriate error message.

-1. It is quite common to produce ill-formed zipfiles, and other ziptools are interpreting them in violation of the format spec. Python needs to support creation of such broken zipfiles, even though it may not be able to read them back.
msg124690 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-27 00:45
Well, this is the same treat-strings-and-byte-strings-equivalently-in-the-same-API problem that we've had elsewhere. It'll require a bit of refactoring to make it work. On read zipfile decodes filenames using cp437 if the utf-8 flag isn't set. Logically, then, a binary string should be encoded using cp437. Since cp437 has a character corresponding to each of the 256 bytes, it seems to me it should be enough to decode a binary filename using cp437 and set a flag that _encodeFilenameFlags would respect and re-encode to cp437 instead of utf-8. That might produce unexpected results if someone passes in a binary filename encoded in some other character set, but it would be consistent with how zipfiles work and so should be at least as interoperable as zipfiles normally are.
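The round-trip property this proposal relies on can be checked directly. The snippet below is a sketch under assumptions: `bytes_arcname_to_str` is a hypothetical helper name, not part of zipfile, and it only illustrates the decode step. It does not implement the suggested flag for `_encodeFilenameFlags`.

```python
# Python's cp437 codec maps each of the 256 byte values to a distinct
# character, so decoding and re-encoding is lossless for any byte string.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("cp437")
assert len(as_text) == 256
assert as_text.encode("cp437") == all_bytes


def bytes_arcname_to_str(raw_name: bytes) -> str:
    """Hypothetical helper: turn a bytes arcname into the str that
    zipfile would produce when reading it back without the UTF-8 flag."""
    return raw_name.decode("cp437")


# The reporter's name, encoded in cp852 as in the original message.
raw = "ółśąśółąś.dat".encode("cp852")
name = bytes_arcname_to_str(raw)

# The decoded text looks wrong to a human, but the original bytes
# are recovered exactly when it is encoded with cp437 again.
assert name.encode("cp437") == raw
```

Note that passing the decoded string to the current `write()` would not preserve the bytes, since a non-ASCII name is encoded as UTF-8 on write; that is why the proposal also needs the extra flag.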
msg257385 - Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-02 23:23
This bug is very old; has there been any development on the subject? This issue is hitting me while trying to port my project (rdiffweb) to Python 3. It receives a lot of broken filenames with invalid encoding, and I need to create a meaningful zip archive from them. Currently, it just fails because zipfile doesn't accept arcname as bytes.
History
Date User Action Args
2022-04-11 14:57:10 admin set github: 54966
2016-01-02 23:23:30 Patrik Dufresne set nosy: + Patrik Dufresne; messages: +
2015-07-21 07:19:00 ethan.furman set nosy: - ethan.furman
2015-04-13 21:25:50 ozialien set nosy: + ozialien
2013-10-14 22:39:46 ethan.furman set nosy: + ethan.furman
2010-12-27 00:45:06 r.david.murray set nosy: loewis, aimacintyre, r.david.murray, connexion2000; messages: +; title: zipfile.write, arcname should be bytestring -> zipfile.write, arcname should be allowed to be a byte string
2010-12-26 23:54:25 loewis set nosy: loewis, aimacintyre, r.david.murray, connexion2000; messages: +
2010-12-25 16:37:05 r.david.murray set nosy: + r.david.murray; messages: +
2010-12-24 21:54:48 terry.reedy set nosy: + aimacintyre; stage: test needed; type: compile error -> behavior; versions: + Python 3.2
2010-12-22 20:12:05 loewis set status: closed -> open; messages: +; resolution: not a bug ->
2010-12-22 20:07:48 loewis set status: open -> closed; nosy: + loewis; messages: +; resolution: not a bug
2010-12-22 12:44:03 connexion2000 create