Issue 4871: zipfile can't decrypt (original) (raw)

Created on 2009-01-07 20:26 by gladed, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile-pwd.patch amaury.forgeotdarc,2009-01-13 08:40 review
zipfile_no_unicode_pasword.patch vstinner,2009-01-13 19:29 review
zipfile_no_unicode_pasword-2.patch vstinner,2009-01-13 23:22 review
zipfile_no_unicode_pasword-2-py3k.patch vstinner,2009-01-13 23:23 review
Messages (15)
msg79368 - (view) Author: Glade Diviney (gladed) Date: 2009-01-07 20:26
If a password is supplied using zpifile.read(objName, password), a TypeError occurs: File "C:\Program Files\Python30\lib\zipfile.py", line 420, in __init__ self._UpdateKeys(p) File "C:\Program Files\Python30\lib\zipfile.py", line 424, in _UpdateKeys self.key0 = self._crc32(c, self.key0) File "C:\Program Files\Python30\lib\zipfile.py", line 413, in _crc32 return ((crc >> 8) & 0xffffff) ^ self.crctable[(crc ^ ch) & 0xff] TypeError: unsupported operand type(s) for ^: 'int' and 'str' This is resolved in zipfile.py by replacing this line in __init__: self._UpdateKeys(ord(p))
msg79388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-08 01:03
The key is a bytes string or an unicode string? Using bytes, each element is an int instead of a str.
msg79421 - (view) Author: Glade Diviney (gladed) Date: 2009-01-08 16:19
In this case under test, the password string had been clipped from a filename, so I suppose it would have been Unicode.
msg79423 - (view) Author: Lukas Lueg (ebfe) Date: 2009-01-08 16:41
This is basically the same problem as with other bytes-orientated modules. The choice is either to reject unicode and force the caller to use .encode() on all his strings (so 'password' is an instance of bytes and 'ch' an instance of int). I'd prefer that. Or to check if 'password' is a unicode-object and encode to default within zipfile.
msg79425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-08 16:50
> check if 'password' is a unicode-object > and encode to default within zipfile. Hmmm... what is the default charset? :-) I think that the charset depends on the OS and the user locale...
msg79426 - (view) Author: Lukas Lueg (ebfe) Date: 2009-01-08 16:59
The default encoding is UTF8. Even without that there should be no problems while staying in the same environment as the character->translation is the same. However it violates the rule of least surprise to see zipfile throwing CRC errors because the password is something like 'u?è´´n' and encoded by some locale-changing-nitwit in korea...
msg79719 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 05:34
Lukas Lueg> The default encoding is UTF8 What do you mean? Not inside a zip file. The default encoding is CP437 (the original IBM PC US character set). A zipfile password is a sequence of bytes, not characters, as defined in the specification. Probably this issue has to wait until it is decided what to do in the general case; anyway, encoding the password to bytes at the application level seems to be safer.
msg79724 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-13 08:40
in Lib/zipfile.py, the filename is already allowed to be unicode if the "flag_bits" for the file entry contains 0x800. This indicates that the encoding is utf-8. We could do the same for the password. Attached is a tentative patch along this idea (for py3k replace 'unicode' with 'str' of course)
msg79771 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 19:25
I created small archive with a password using zip 2.32 (Ubuntu Gutsy). The unicode flag is not set and so ASCII charset is used to encode the password. But my password was an unicode password using non-ASCII characters and so I get an UnicodeEncodeError ('ascii' codec can't encode character u'\xe9' ...). The user (me) doesn't expect such error here :-/ If I encode the unicode password to bytes using password.encode('utf-8'), it works as expected. So I would prefer to just reject unicode for the password. Attached patch implements that.
msg79784 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 22:45
Yes, the unicode flag is irrelevant to the password. To successfuly decrypt a file, one must know the password *and* the encoding in use when it was written, so the only sensible thing to do is to accept bytes only. Your patch looks good to me with a few comments: - I'd only check the password in setpassword() and open(); the other methods eventually call open(). - The issue version says "3.0"; but it should be applied to 2.6 2.7 and 3.1 too. - For 3.0 the check is wrong, should say "bytes" instead of "str". Even in 2.6 should use "bytes" to make clear the intent. - The usual wording for such TypeError is more like "pwd: expected 'bytes', got '%s'"
msg79791 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 23:22
gagenellina: Oops, yes, I wrote my patch for Python trunk. New patch only uses two checks for the password type (since extract() calls open()). I also changed the error message.
msg79793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 23:23
Patch for py3k: str=>bytes, remove "u" string prefix and rename "bytes" variable to "data" to avoid conflict with the bytes type.
msg124322 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-18 21:43
What about bytearray? Apparently that works pre-patch for at least read, though setpassword rejects it via an assertion. Also, the error message should be "expected bytes" rather than "bytes expected". Don't ask me why, that's just the way it is normally done, so the other order sounds weird to this English speaker's ear. Otherwise the py3k patch looks good and tests correctly for me.
msg124460 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-21 21:36
Thinking about this some more, it seems like the chance that someone is using bytearray to pass a password to zipfile is vanishingly small, especially since in non-optimized mode setpassword would have rejected it. So I think that this should go in.
msg124463 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-21 22:03
Committed in r87430 (with message word order change), backported to 3.1 in r87431. Making the parallel change to 2.7 would be likely to break working code, IMO.
History
Date User Action Args
2022-04-11 14:56:43 admin set github: 49121
2010-12-21 22:03:13 r.david.murray set status: open -> closedversions: - Python 2.7nosy:amaury.forgeotdarc, ggenellina, vstinner, r.david.murray, ebfe, gladedmessages: + resolution: fixedstage: commit review -> resolved
2010-12-21 21:36:15 r.david.murray set assignee: r.david.murraymessages: + nosy:amaury.forgeotdarc, ggenellina, vstinner, r.david.murray, ebfe, gladedstage: commit review
2010-12-18 21:43:24 r.david.murray set nosy: + r.david.murraymessages: + versions: + Python 3.2, - Python 2.6, Python 3.0
2009-01-13 23:23:26 vstinner set files: + zipfile_no_unicode_pasword-2-py3k.patchmessages: +
2009-01-13 23:22:20 vstinner set files: + zipfile_no_unicode_pasword-2.patchmessages: + versions: + Python 2.6, Python 3.1, Python 2.7
2009-01-13 22:46:00 ggenellina set messages: +
2009-01-13 19:29:34 vstinner set files: + zipfile_no_unicode_pasword.patch
2009-01-13 19:29:20 vstinner set files: - zipfile_no_unicode_pasword.patch
2009-01-13 19:25:04 vstinner set files: + zipfile_no_unicode_pasword.patchmessages: +
2009-01-13 08:40:49 amaury.forgeotdarc set files: + zipfile-pwd.patchkeywords: + patchmessages: + nosy: + amaury.forgeotdarc
2009-01-13 05:34:43 ggenellina set nosy: + ggenellinamessages: +
2009-01-08 16:59:35 ebfe set messages: +
2009-01-08 16:50:48 vstinner set messages: +
2009-01-08 16:41:01 ebfe set nosy: + ebfemessages: +
2009-01-08 16:19:25 gladed set messages: +
2009-01-08 01:03:41 vstinner set nosy: + vstinnermessages: +
2009-01-07 20:26:25 gladed create