[Python-Dev] My work on Python3 and non-ascii paths is done

Victor Stinner victor.stinner at haypocalc.com
Tue Oct 19 03:53:34 CEST 2010


Hi,

Seven months after my first commit related to this issue, the full test suite of Python 3.2 passes with ASCII, ISO-8859-1 and UTF-8 locale encodings in a non-ASCII source directory. This means that Python 3.2 now correctly processes filenames in all modules, build scripts and other utilities, with any locale encoding.
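To illustrate what "any locale encoding" implies for a filename that the encoding cannot decode, here is a minimal sketch. It assumes the surrogateescape error handler (PEP 383) that Python 3 uses for filenames on POSIX systems. The helper name roundtrip and the sample filename are hypothetical and are not taken from the Python source tree.

```python
def roundtrip(raw, encoding):
    """Decode a bytes filename to str and encode it back (hypothetical helper)."""
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
    name = raw.decode(encoding, "surrogateescape")
    # Encoding with the same handler turns those surrogates back into the original bytes.
    return name.encode(encoding, "surrogateescape")


# 0xE9 is "e acute" in ISO-8859-1, but it is invalid in ASCII and in UTF-8.
raw = b"caf\xe9.txt"
for encoding in ("ascii", "iso-8859-1", "utf-8"):
    assert roundtrip(raw, encoding) == raw
```

The original bytes survive the round trip with all three encodings, so such a file can still be opened even when its name cannot be displayed correctly.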

General changes:

Changes of the Python API:

Changes of the C API:

Bugfixes:

I also wrote some tests and documentation.

The most difficult part was debugging Python initialization (Py_InitializeEx and calculate_path) and the import machinery (import.c, zipimport.c), because gdb sometimes crashes (for various reasons) and because the import machinery is fragile and difficult to understand.

Special thanks to Marc-Andre Lemburg, Martin v. Löwis, Antoine Pitrou and Amaury Forgeot d'Arc for their help, useful advice and code reviews!

--- Bonus: short story of PYTHONFSENCODING ---

In the middle of August, I created the PYTHONFSENCODING environment variable, as suggested by Marc-Andre Lemburg. Because of this variable and because Python used utf-8 until the filesystem encoding was known, I had to write ugly and fragile "redecode" functions to redecode all filenames of all objects (sys.path, sys.meta_path, sys.executable, sys.modules, all code objects, etc.).
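The idea behind such a "redecode" step can be sketched in pure Python. This is only an illustration under assumptions: the real functions were written in C inside the interpreter, and the function name, its signature and the use of surrogateescape here are hypothetical.

```python
def redecode(name, old_encoding, new_encoding):
    """Re-interpret a filename that was decoded with the wrong encoding (sketch)."""
    # Recover the original bytes from the early, provisional decoding.
    raw = name.encode(old_encoding, "surrogateescape")
    # Decode again, this time with the real filesystem encoding.
    return raw.decode(new_encoding, "surrogateescape")


# During startup the name is decoded from utf-8, so byte 0xE9 becomes a lone surrogate.
early = b"caf\xe9".decode("utf-8", "surrogateescape")
# Once the filesystem encoding is known (ISO-8859-1 in this example), fix the string.
fixed = redecode(early, "utf-8", "iso-8859-1")
assert fixed == "caf\xe9"
```

Every string holding a filename has to go through this step, which is why the approach touches so many objects.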

Then I found 4 issues related to PYTHONFSENCODING: inconsistencies between the filesystem encoding and the locale encoding. It was not easy to decide how to fix these issues, but in the end, we chose to drop the PYTHONFSENCODING variable, use the locale encoding as the filesystem encoding, and always use utf-8 as the filesystem encoding on Mac OS X.
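The final rule can be written down as a short sketch. The helper expected_fs_encoding is hypothetical and only mirrors the decision described above; it is not the code used by CPython, and it ignores Windows and any later changes to the interpreter.

```python
import locale
import sys


def expected_fs_encoding(platform=sys.platform):
    """Return the filesystem encoding according to the rule above (sketch)."""
    if platform == "darwin":
        # Mac OS X: always utf-8, whatever the locale says.
        return "utf-8"
    # Elsewhere (POSIX): follow the locale encoding.
    return locale.getpreferredencoding(False)


# Compare the sketch with what the running interpreter reports.
print(expected_fs_encoding(), sys.getfilesystemencoding())
```

The two values may be spelled differently (for example "UTF-8" and "utf-8") or may differ on platforms that the sketch does not cover.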

-- Victor Stinner http://www.haypocalc.com/


