msg234786 - (view) |
Author: Swapneel Ambre (amswap) * |
Date: 2015-01-26 22:43 |
On Windows, zipimport module APIs such as get_filename fail when the full path of the file contains non-ASCII characters, raising UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character (full output attached in errorlog.txt). The cause is that the compile_source function in Modules/zipimport.c runs PyUnicode_EncodeFSDefault on the pathname. On Windows the default filesystem encoding is 'mbcs', which cannot encode characters outside the current ANSI code page. This has already been fixed in the import machinery on Python 3 (see http://bugs.python.org/issue13758 and http://bugs.python.org/issue11619). The solution is to pass the pathname as Unicode directly to the compiler. |
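To illustrate the encoding failure described above, here is a minimal sketch. The 'mbcs' codec exists only on Windows, so cp1252 is used as a stand-in for a typical ANSI code page; that substitution, the helper name can_encode, and both example paths are assumptions made for illustration and do not come from the attached script.

```python
def can_encode(path, encoding):
    """Return True if every character of `path` is encodable in `encoding`."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical install locations; only the second contains a character
# that a Western European code page cannot represent.
ascii_path = "C:\\product_name\\example.egg"
cjk_path = "C:\\\u7684\\product_name\\example.egg"

# cp1252 stands in for the Windows ANSI code page behind 'mbcs'.
ascii_ok = can_encode(ascii_path, "cp1252")
cjk_ok = can_encode(cjk_path, "cp1252")
# A Unicode-capable encoding has no such limitation.
cjk_utf8_ok = can_encode(cjk_path, "utf-8")

print(ascii_ok, cjk_ok, cjk_utf8_ok)
```

Any code path that encodes the pathname to the ANSI code page before use hits this limitation, which is why keeping the path as Unicode avoids the error.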
|
|
msg234787 - (view) |
Author: Swapneel Ambre (amswap) * |
Date: 2015-01-26 22:45 |
I am attaching the test script I have used to reproduce the issue. |
|
|
msg234789 - (view) |
Author: Swapneel Ambre (amswap) * |
Date: 2015-01-26 22:49 |
I have tried to fix this by calling Py_CompileStringObject instead of Py_CompileString, which avoids the need to encode the pathname. Please see zipimport_fix.patch for the possible fix. |
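The patch itself is C code, but the idea of handing the compiler a Unicode filename can be sketched at the Python level with the built-in compile(), which accepts a str filename. This is only an analogy and not the contents of zipimport_fix.patch; the source text and the pathname below are hypothetical.

```python
# Hypothetical module source and a hypothetical path containing a non-ASCII character.
source = "VALUE = 1\n"
pathname = "C:\\\u7684\\example.egg\\example_mod.py"

# The filename is passed as a str and stored on the code object unchanged,
# so no encoding to the filesystem code page is needed at this step.
code = compile(source, pathname, "exec")
filename_preserved = code.co_filename == pathname

namespace = {}
exec(code, namespace)
print(filename_preserved, namespace["VALUE"])
```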
|
|
msg234790 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-01-26 22:52 |
> Please see zipimport_fix.patch for the possible fix. The solution looks good. Can you please try to convert zipimport_test.py to a patch for test_zipimport.py and combine it with zipimport_fix.patch to create a complete patch? You should also sign the contributor agreement: https://www.python.org/psf/contrib/contrib-form/ |
|
|
msg234792 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-01-26 22:55 |
I don't understand the issue: does it only concern the name of the ZIP file? Or also paths inside the ZIP? In both cases, the workaround is to use only ASCII names. I spent a lot of time on supporting any Unicode name, everywhere in Python. I didn't expect that people have such different and crazy use cases :-) |
|
|
msg234794 - (view) |
Author: Swapneel Ambre (amswap) * |
Date: 2015-01-26 23:12 |
Sorry, I was not very clear about the use case. The name of the zipfile or of any parent directory could contain non-ASCII characters. Consider a use case where you want to ship a product together with a third-party module packaged as an egg file (say example.egg). You don't have control over where the product files get installed. Someone could install the product files under, say, C:\的\product_name. So both your product (exe or Python files) and the egg files are installed under a path with non-ASCII characters in it. Any import statement trying to import modules from the egg file will fail with UnicodeEncodeError, as zipimport will try to use PyUnicode_EncodeFSDefault with the 'mbcs' encoding on Windows. I hope the use case is clearer now. I do agree that it is a corner case scenario and using ASCII names is a better option :-) I will create a complete patch and sign the contributor agreement. |
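The use case above can be sketched roughly as follows. This is not the attached zipimport_test.py; the module name example_mod, the helper names, and the temporary directory layout are assumptions chosen for illustration. On an interpreter that contains the fix the calls succeed; on an affected Windows build the get_code step would be expected to raise the UnicodeEncodeError described in this issue.

```python
import os
import tempfile
import zipfile
import zipimport


def build_egg(parent_dir):
    """Create example.egg under a directory whose name is non-ASCII."""
    target_dir = os.path.join(parent_dir, "\u7684", "product_name")
    os.makedirs(target_dir)
    egg_path = os.path.join(target_dir, "example.egg")
    with zipfile.ZipFile(egg_path, "w") as zf:
        # Hypothetical module shipped inside the egg.
        zf.writestr("example_mod.py", "VALUE = 42\n")
    return egg_path


def load_from_egg(egg_path, module_name="example_mod"):
    """Resolve and compile a module from the egg through zipimport."""
    importer = zipimport.zipimporter(egg_path)
    filename = importer.get_filename(module_name)
    # get_code compiles the source stored in the archive.
    code = importer.get_code(module_name)
    namespace = {}
    exec(code, namespace)
    return filename, namespace["VALUE"]


with tempfile.TemporaryDirectory() as tmp:
    egg = build_egg(tmp)
    filename, value = load_from_egg(egg)

print(filename, value)
```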
|
|
msg234798 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2015-01-26 23:40 |
> I do agree that it is a corner case scenario and using ASCII names is a better option :-) Since the patch is short, I see no problem to fix this issue. |
|
|
msg234840 - (view) |
Author: Swapneel Ambre (amswap) * |
Date: 2015-01-27 19:08 |
Attaching a combined patch. I updated the testUnencodable test case in test_zipimport.py and verified that it fails without my fix and passes with it. |
|
|
msg325711 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-09-19 06:55 |
Thank you for your patch Swapneel, but this issue was fixed in . |
|
|