Issue 5604: imp.find_module() mixes UTF8 and MBCS (original) (raw)

Issue5604

Created on 2009-03-30 14:26 by gvanrossum, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
import.zip	asvetlov,2009-03-30 19:48	patch for support non-ascii characters in module names. With unittest
import.zip	asvetlov,2009-03-30 20:42
import.zip	asvetlov,2009-03-31 19:07	updated patch for fixing error messages in generated exception
import_patch_4th_edition.zip	asvetlov,2009-04-05 04:58	DONT APPLY IT. Works only for windows now.

Messages (15)
msg84548 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-03-30 14:26
There's a path in imp.find_module that mixes encodings. The module name is encoded to char* using UTF-8 by the 's' format passed to PyArg_ParseTuple(). But the path name is converted (in the loop over the path in find_module()) to char* using the filesystem encoding. On Windows this ends up constructing a char* that mixes MBCS and UTF8 in one string. (We discovered this when researching bug 4352, but this is not the cause of the problem reported there -- the module name in that bug is ASCII.) Andrew Svetlov is looking into producing a patch.
msg84626 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-03-30 19:48
Problem fixed, patch attached I inserted conversion path parameters to using Py_FileSystemDefaultEncoding for: * load_module * load_compiled * load_dynamic * load_source * load_package find_module is already has conversion.
msg84634 - (view)	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2009-03-30 20:13
PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other part of import.c. I did same mistake before. ;-)
msg84636 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-03-30 20:20
Thank you. On Mon, Mar 30, 2009 at 3:13 PM, Hirokazu Yamamoto <report@bugs.python.org> wrote: > > Hirokazu Yamamoto <ocean-city@m2.ccsnet.ne.jp> added the comment: > > PyMem_Free is needed when "es" is used with PyArg_ParseTuple. See other > part of import.c. I did same mistake before. ;-) > > ---------- > nosy: +ocean-city > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue5604> > _______________________________________ >
msg84644 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-03-30 20:42
According to Hirokazu Yamamoto memory cleanup added. Patch is updated.
msg84663 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2009-03-30 21:49
Thanks Andrew! Committed to 3.0.2 as 70756. Should be merged into 3.1, but should not be backported to 2.x.
msg84862 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-03-31 19:07
Continuing work over import.c I fixed bad error message encoding in generated ImportError exception. Tests for checking in case of non-ascii characters added.
msg84865 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-03-31 19:09
Martin von Loewis added to nosy list
msg85220 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-04-02 16:27
Martin, can you review latest patch and apply it if this one is correct. I want to start working on conversion import.c to use unicode strings (we spoke about Tuesday) this weekend. It will be nice if I will have synchronized svn before making new changes.
msg85464 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2009-04-05 04:58
Continuing work on problem I figured out: * on Windows it's impossible to convert filenames to file system encoding without and don't miss something. * Windows can work properly only with unicode (wchar_t) characters. * all other systems feels itself good using utf-8 (or another filesystem encoding). * it's very errorprone to change 'char' to 'PyUnicode'. To solve this problem I assume: * all char* in Python API is utf-8. * if there are need call to operation system api like fopen - call imp_fopen, this function will do need conversions. Inside import.c there are 4 calls: fopen, open_exclusive, unlink, stat. I want to write stubs for this calls. * also loaders for dynamic modules aka 'C extensions' have to expect utf-8 as pathname parameter, not 'filesystem encoded'. Patch for windows is applied (STILL NOT CONVERTED TO OTHER OS). But for Windows it works (regression tests passed). If this solution is applicable for 3.1 (as I know Cannon works on excellent importlib but this library will replace imp functionality only in 3.2) - I can continue switching. Unfortunately I cannot test py3k trunk on non-windows machines - but I can 'make all OS calls as expected' and wait for buildbot answer. Please review import_patch_4th_edition.zip and if I ran in wrong way - let me know.
msg99499 - (view)	Author: Florent Xicluna (flox) *	Date: 2010-02-18 12:09
Still an issue for some buildbot: http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1487 http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/1491 It is loosely related with #7712, because now the tests are run in TEMP (on C:) instead of the build dir (on D:).
msg99503 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2010-02-18 13:23
Also the test has a few problems: 1) the keys of known_locales are lowercase, but locale_encoding = locale.getpreferredencoding() can return uppercase encodings (e.g. UTF-8); 2) this masks another error: the b'\xe4' is not a valid utf-8 byte and it can be decoded; 3) the test should be skipped properly if the preferred encoding is not among the ones of the known_locales dict; 4) the 'encoded_char' should be 'decoded_char'. It seems that in the failure linked by Florent, find_module tries to encode the filename with the wrong encoding and with error='replace' and the char at the end of 'test_imp_helper_' is converted to U+FFFD. If the file is created with the correct name (e.g. 'test_imp_helper_\xc0') and find_module tries to search the wrong name (i.e. 'test_imp_helper_\ufffd'), then the error is raised (but then cp1252 used by the terminal can't encode that char and the second exception is raised). Now the tests are run on C: and the filesystem encoding might be different, so it might not match anymore the encoding returned by locale.getpreferredencoding().
msg100505 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2010-03-05 23:45
I fixed all the things listed in the previous message in r78689, but that just enabled the test on several Linux buildbots and some started to fail too. In r78696 (and r78697) I tried to use sys.getfilesystemencoding() instead of locale.getpreferredencoding() and that turned at least the Windows buildbots green, but there are still a few linux bots that fail: http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%20wide%203.x/builds/545/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/AMD64%20Ubuntu%203.x/builds/561/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/x86%20Ubuntu%203.x/builds/525/steps/test/logs/stdio http://www.python.org/dev/buildbot/builders/amd64%20gentoo%203.x/builds/513/steps/test/logs/stdio
msg100528 - (view)	Author: Florent Xicluna (flox) *	Date: 2010-03-06 14:50
Thanks for fixing this. Now Win7 buildbot is green on trunk.
msg100529 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2010-03-06 14:54
The Linux buildbots were running the tests using ./python ./Lib/test/regrtest.py instead of ./python -m test.regrtest and '' was missing from sys.path, so imp.find_module couldn't find the module. This is now fixed in r78711 and backported r78716. Thanks to Florent that noticed that those buildbots were using a different command to run the tests.

History
Date	User	Action	Args
2022-04-11 14:56:47	admin	set	github: 49854
2010-03-06 14:54:18	ezio.melotti	set	status: open -> closedassignee: ezio.melottiresolution: fixedmessages: +
2010-03-06 14:50:28	flox	set	messages: + stage: needs patch -> resolved
2010-03-05 23:45:40	ezio.melotti	set	nosy: + ncoghlanmessages: +
2010-02-18 14:45:26	flox	set	versions: + Python 3.2, - Python 3.0
2010-02-18 13:24:48	ezio.melotti	set	status: closed -> openresolution: fixed -> (no value)
2010-02-18 13:23:16	ezio.melotti	set	status: open -> closedresolution: fixedmessages: +
2010-02-18 12:09:46	flox	set	status: closed -> opennosy: + ezio.melotti, floxmessages: + keywords: + buildbotresolution: fixed -> (no value)
2009-04-05 04:59:09	asvetlov	set	files: + import_patch_4th_edition.zipnosy: + brett.cannonmessages: +
2009-04-02 16:27:54	asvetlov	set	messages: +
2009-03-31 19:09:28	asvetlov	set	nosy: + loewismessages: +
2009-03-31 19:07:14	asvetlov	set	files: + import.zipmessages: +
2009-03-30 21:49:01	gvanrossum	set	status: open -> closedresolution: fixedmessages: +
2009-03-30 20:42:48	asvetlov	set	files: + import.zipmessages: +
2009-03-30 20:20:42	asvetlov	set	messages: +
2009-03-30 20:13:02	ocean-city	set	nosy: + ocean-citymessages: +
2009-03-30 19:48:45	asvetlov	set	files: + import.zipmessages: +
2009-03-30 14:26:40	gvanrossum	create