[Python-Dev] zipfile and unicode filenames

Alexey Borzenkov snaury at gmail.com
Sat Jun 9 22:23:20 CEST 2007


Hi everyone,

Today I stumbled upon a bug in my program that wasn't very straightforward to understand. I was passing unicode filenames to zipfile.ZipFile.write while sys.setdefaultencoding() was in effect. As a result, most of the bytes generated in zipfile.ZipInfo.FileHeader passed through, except for a few, which caused a codec error on another machine (where filenames got infectiously upgraded to unicode). It was absolutely unclear at first that unicode filenames were being passed to write, and write incorrectly accepted them silently.

Is it worth submitting a bug report on this? The desired behavior would be to either:

a) disallow unicode strings as arcname and raise an exception (since arcname is concatenated with raw data, it is likely to cause problems because the raw data gets automatically upgraded to unicode), or

b) silently encode unicode strings to raw strings (something like if isinstance(filename, unicode): filename = filename.encode() in the zipfile.ZipInfo constructor).
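To make the two options concrete, they could look roughly like the sketch below. This is only an illustration: the helper names are hypothetical and not part of zipfile, and the explicit UTF-8 default is an assumption (the snippet above would use whatever the default encoding happens to be). The type(u"") trick is there only so the same code runs on both Python 2 and Python 3.

```python
# Hypothetical helpers, not part of zipfile. The names and the UTF-8
# default are assumptions made for the sake of the example.
text_type = type(u"")  # unicode on Python 2, str on Python 3


def reject_unicode_arcname(arcname):
    """Option (a): refuse text strings so they never mix with raw header bytes."""
    if isinstance(arcname, text_type):
        raise TypeError("arcname must be a byte string, got %r" % (arcname,))
    return arcname


def encode_unicode_arcname(arcname, encoding="utf-8"):
    """Option (b): convert text strings to byte strings before they are used."""
    if isinstance(arcname, text_type):
        return arcname.encode(encoding)
    # Already a byte string: pass it through unchanged.
    return arcname
```

Either helper would be called on the name before it is concatenated with the packed header data. Option (a) makes the mistake visible immediately on the machine that caused it, while option (b) keeps existing callers working but has to pick an encoding on their behalf.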

So, should I submit a bug report, and which behavior would actually be correct?


