[Python-Dev] Full unicode support for the import machinery
Victor Stinner victor.stinner at haypocalc.com
Fri Jul 9 02:11:35 CEST 2010
Hi,
I have been trying for several months to make Python support undecodable bytes in the Python path. My first attempt was huge and sometimes ugly. Where possible, I extracted short, simple patches and applied them to py3k (sometimes through an issue, sometimes directly in svn).
When it was no longer possible to split the big patch, I restarted the work from scratch. The main change from my previous attempt is that import.c now uses unicode strings instead of byte strings. With the surrogate hack (PEP 383), any byte string can be decoded to unicode and encoded back without loss, so unicode is a superset of bytes and is "forward compatible".
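To illustrate why PEP 383 makes unicode a superset of bytes, here is a minimal sketch of the round trip using the surrogateescape error handler. The path is a made-up example, and this is plain Python, not the code in import.c:

```python
# Hypothetical path containing a byte (0xE9) that is not valid UTF-8.
raw = b"/home/caf\xe9/lib"

# Decoding with the surrogateescape error handler (PEP 383) maps each
# undecodable byte 0xNN to the lone surrogate U+DCNN instead of failing.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text))
print(roundtrip == raw)
```

The unicode string carries the undecodable byte as a lone surrogate, so nothing is lost when the value has to be handed back to the operating system as bytes.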
I just created a branch called "import_unicode" (based on py3k) that includes all my patches. It is still a work in progress. It is now possible to start a Python installed in an undecodable path (e.g. a directory with a non-ASCII character in its name, under the C locale on Linux), which is huge progress, but some tests are still failing.
The biggest remaining problem is that code object filenames are not re-encoded after the file system encoding is changed (whereas sys.path and sys.modules filenames are). I think I will register all code objects in a list so that I can re-encode their filename attribute (and then drop the list).
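As a rough sketch of what re-encoding a filename means here: a name that was decoded with a provisional encoding is turned back into its original bytes and decoded again with the final file system encoding. The function name reencode_filename, the two encodings and the path are assumptions chosen for illustration; the real work happens in C on code objects, not on plain strings like this:

```python
def reencode_filename(name, old_encoding, new_encoding):
    """Re-decode a filename after the file system encoding has changed.

    Hypothetical helper: the surrogateescape handler lets us recover the
    original bytes from the string decoded with the old encoding.
    """
    raw = name.encode(old_encoding, "surrogateescape")
    return raw.decode(new_encoding, "surrogateescape")


# Hypothetical path whose bytes are valid UTF-8 but not ASCII.
raw_path = b"/opt/pyth\xc3\xb6n/lib"

# Early during startup the encoding is assumed to be ASCII here, so the
# two non-ASCII bytes become two lone surrogates.
early = raw_path.decode("ascii", "surrogateescape")

# Once the real encoding (UTF-8 in this example) is known, fix the name.
fixed = reencode_filename(early, "ascii", "utf-8")

print(ascii(early))
print(ascii(fixed))
```

Keeping a list of the objects created before the encoding is known would allow the same conversion to be applied to each filename once, after which the list is no longer needed.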
I created an svn branch because I think it is easier to review short commits than a single huge patch. The branch also helps me share the work between different computers, and allows other people to review the commits (and/or contribute!).
Some people may understand my work better with the "whole picture" :-)
--
There are at least 4 issues related to this work:
#3080: Full unicode import system
#4352: imp.find_module() fails with a UnicodeDecodeError when called with non-ASCII search paths
#8611: Python3 doesn't support locale different than utf8 and an non-ASCII path (POSIX)
#8988: import + coding = failure (3.1.2/win32)
--
Some examples of earlier issues related to my secret goal (patching the import machinery):
#8391: os.execvpe() doesn't support surrogates in env
#8393: subprocess: support undecodable current working directory on POSIX OS
#8412: os.system() doesn't support surrogates nor bytes
#8485: Don't accept bytearray as filenames, or simplify the API
#8514: Add fsencode() functions to os module
#8610: Python3/POSIX: errors if file system encoding is None (-> create initfsencoding() in pythonrun.c)
#8715: Create PyUnicode_EncodeFSDefault() function
...
-- Victor Stinner http://www.haypocalc.com/