Issue 5604: imp.find_module() mixes UTF8 and MBCS (original) (raw)

Issue5604

Created on 2009-03-30 14:26 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
import.zip asvetlov,2009-03-30 19:48 patch for support non-ascii characters in module names. With unittest
import.zip asvetlov,2009-03-30 20:42
import.zip asvetlov,2009-03-31 19:07 updated patch for fixing error messages in generated exception
import_patch_4th_edition.zip asvetlov,2009-04-05 04:58 DONT APPLY IT. Works only for windows now.
Messages (15)
msg84548 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-30 14:26
There's a path in imp.find_module that mixes encodings. The module name is encoded to char* using UTF-8 by the 's' format passed to PyArg_ParseTuple(). But the path name is converted (in the loop over the path in find_module()) to char* using the filesystem encoding. On Windows this ends up constructing a char* that mixes MBCS and UTF8 in one string. (We discovered this when researching bug 4352, but this is not the cause of the problem reported there -- the module name in that bug is ASCII.) Andrew Svetlov is looking into producing a patch.
msg84626 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 19:48
Problem fixed, patch attached I inserted conversion path parameters to using Py_FileSystemDefaultEncoding for: * load_module * load_compiled * load_dynamic * load_source * load_package find_module is already has conversion.
msg84634 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-30 20:13
PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other part of import.c. I did same mistake before. ;-)
msg84636 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 20:20
Thank you. On Mon, Mar 30, 2009 at 3:13 PM, Hirokazu Yamamoto <report@bugs.python.org> wrote: > > Hirokazu Yamamoto <ocean-city@m2.ccsnet.ne.jp> added the comment: > > PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other > part of import.c. I did same mistake before. ;-) > > ---------- > nosy: +ocean-city > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue5604> > _______________________________________ >
msg84644 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-30 20:42
According to Hirokazu Yamamoto memory cleanup added. Patch is updated.
msg84663 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-03-30 21:49
Thanks Andrew! Committed to 3.0.2 as 70756. Should be merged into 3.1, but should *not* be backported to 2.x.
msg84862 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-31 19:07
Continuing work over import.c I fixed bad error message encoding in generated ImportError exception. Tests for checking in case of non-ascii characters added.
msg84865 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-03-31 19:09
Martin von Loewis added to nosy list
msg85220 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-04-02 16:27
Martin, can you review latest patch and apply it if this one is correct. I want to start working on conversion import.c to use unicode strings (we spoke about Tuesday) this weekend. It will be nice if I will have synchronized svn before making new changes.
msg85464 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2009-04-05 04:58
Continuing work on problem I figured out: * on Windows it's impossible to convert filenames to file system encoding without and don't miss something. * Windows can work properly only with unicode (wchar_t) characters. * all other systems feels itself good using utf-8 (or another filesystem encoding). * it's very errorprone to change 'char*' to 'PyUnicode*'. To solve this problem I assume: * all char* in Python API is utf-8. * if there are need call to operation system api like fopen - call imp_fopen, this function will do need conversions. Inside import.c there are 4 calls: fopen, open_exclusive, unlink, stat. I want to write stubs for this calls. * also loaders for dynamic modules aka 'C extensions' have to expect utf-8 as pathname parameter, not 'filesystem encoded'. Patch for windows is applied (STILL NOT CONVERTED TO OTHER OS). But for Windows it works (regression tests passed). If this solution is applicable for 3.1 (as I know Cannon works on excellent importlib but this library will replace imp functionality only in 3.2) - I can continue switching. Unfortunately I cannot test py3k trunk on non-windows machines - but I can 'make all OS calls as expected' and wait for buildbot answer. Please review import_patch_4th_edition.zip and if I ran in wrong way - let me know.
msg99499 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-02-18 12:09
Still an issue for some buildbot: http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1487 http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1491 It is loosely related with #7712, because now the tests are run in TEMP (on C:) instead of the build dir (on D:).
msg99503 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-02-18 13:23
Also the test has a few problems: 1) the keys of known_locales are lowercase, but locale_encoding = locale.getpreferredencoding() can return uppercase encodings (e.g. UTF-8); 2) this masks another error: the b'\xe4' is not a valid utf-8 byte and it can be decoded; 3) the test should be skipped properly if the preferred encoding is not among the ones of the known_locales dict; 4) the 'encoded_char' should be 'decoded_char'. It seems that in the failure linked by Florent, find_module tries to encode the filename with the wrong encoding and with error='replace' and the char at the end of 'test_imp_helper_' is converted to U+FFFD. If the file is created with the correct name (e.g. 'test_imp_helper_\xc0') and find_module tries to search the wrong name (i.e. 'test_imp_helper_\ufffd'), then the error is raised (but then cp1252 used by the terminal can't encode that char and the second exception is raised). Now the tests are run on C: and the filesystem encoding might be different, so it might not match anymore the encoding returned by locale.getpreferredencoding().
msg100505 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-03-05 23:45
I fixed all the things listed in the previous message in r78689, but that just enabled the test on several Linux buildbots and some started to fail too. In r78696 (and r78697) I tried to use sys.getfilesystemencoding() instead of locale.getpreferredencoding() and that turned at least the Windows buildbots green, but there are still a few linux bots that fail: http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%20wide%203.x/builds/545/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%203.x/builds/561/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/x86%20Ubuntu%203.x/builds/525/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/amd64%20gentoo%203.x/builds/513/steps/test/logs/stdio
msg100528 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-06 14:50
Thanks for fixing this. Now Win7 buildbot is green on trunk.
msg100529 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-03-06 14:54
The Linux buildbots were running the tests using ./python ./Lib/test/regrtest.py instead of ./python -m test.regrtest and '' was missing from sys.path, so imp.find_module couldn't find the module. This is now fixed in r78711 and backported r78716. Thanks to Florent that noticed that those buildbots were using a different command to run the tests.
History
Date User Action Args
2022-04-11 14:56:47 admin set github: 49854
2010-03-06 14:54:18 ezio.melotti set status: open -> closedassignee: ezio.melottiresolution: fixedmessages: +
2010-03-06 14:50:28 flox set messages: + stage: needs patch -> resolved
2010-03-05 23:45:40 ezio.melotti set nosy: + ncoghlanmessages: +
2010-02-18 14:45:26 flox set versions: + Python 3.2, - Python 3.0
2010-02-18 13:24:48 ezio.melotti set status: closed -> openresolution: fixed -> (no value)
2010-02-18 13:23:16 ezio.melotti set status: open -> closedresolution: fixedmessages: +
2010-02-18 12:09:46 flox set status: closed -> opennosy: + ezio.melotti, floxmessages: + keywords: + buildbotresolution: fixed -> (no value)
2009-04-05 04:59:09 asvetlov set files: + import_patch_4th_edition.zipnosy: + brett.cannonmessages: +
2009-04-02 16:27:54 asvetlov set messages: +
2009-03-31 19:09:28 asvetlov set nosy: + loewismessages: +
2009-03-31 19:07:14 asvetlov set files: + import.zipmessages: +
2009-03-30 21:49:01 gvanrossum set status: open -> closedresolution: fixedmessages: +
2009-03-30 20:42:48 asvetlov set files: + import.zipmessages: +
2009-03-30 20:20:42 asvetlov set messages: +
2009-03-30 20:13:02 ocean-city set nosy: + ocean-citymessages: +
2009-03-30 19:48:45 asvetlov set files: + import.zipmessages: +
2009-03-30 14:26:40 gvanrossum create