[Python-Dev] Import and unicode: part two

Stephen J. Turnbull stephen at xemacs.org
Wed Jan 26 09:58:36 CET 2011


Toshio Kuratomi writes:

> Sure ... but with these systems, neither read-modules-as-locale nor read-modules-as-utf-8 is a good working solution, correct?

Good solution, no, but I believe that read-modules-as-locale should work to a great extent. AFAIK Python 3 reads Python programs as str (i.e., converting to Unicode -- if it doesn't, it should).
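As a rough illustration of what a read-modules-as-locale policy amounts to, here is a minimal sketch. The helper name decode_module_name is hypothetical and is not part of Python's import machinery; it only shows file name bytes being turned into str with the locale's encoding, with EUC-JP chosen as an example of a legacy Japanese locale.

```python
import locale


def decode_module_name(raw_name, encoding=None):
    """Decode raw file name bytes to str (hypothetical helper).

    With encoding=None this falls back to the locale's preferred
    encoding, which is the "read-modules-as-locale" idea.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw_name.decode(encoding)


# "\u65e5\u672c\u8a9e" is the word for "Japanese language", written with
# escapes so the example does not depend on the source file encoding.
japanese = "\u65e5\u672c\u8a9e"

# Bytes as they would sit on a file system created under an EUC-JP locale.
raw = japanese.encode("euc_jp")

# Decoding with the matching encoding recovers the original name.
recovered = decode_module_name(raw, "euc_jp")
```

When the locale matches the encoding that was used to create the files, the name comes back intact, which is why this approach can work to a great extent on a consistently configured site.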

> Especially if the OS does get upgraded but the filesystems with user data (and user-created modules) are migrated as-is, you'll run into situations where system-installed modules are in utf-8 and user-created modules are in shift-jis, and so something will always be broken.

I don't know what you mean by "system-installed modules". If you're talking about Python itself, it's not a problem. Python doesn't have any Japanese-named modules in any encoding.

On the other hand, everything that involves scripting (shell scripts, make, etc) related to those filesystems will be broken unless the system, after upgrade but before going live, is converted to have an appropriate locale encoding. So I don't really see a problem here.

The problem is portability across systems, and that is a problem that only the third-party transports can really deal with. tar and unzip need to be taught how to convert file names to the locale encoding, etc.
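The conversion a transport tool would have to perform can be sketched as follows. This is only an illustration under stated assumptions: transcode_name is a hypothetical helper, not a feature of tar or unzip, and it assumes the tool somehow knows the encoding used on the source system.

```python
def transcode_name(raw_name, source_encoding, target_encoding):
    """Re-encode file name bytes from one encoding to another.

    Hypothetical helper: decode with the encoding of the system that
    created the archive, then encode for the receiving system's locale.
    """
    text = raw_name.decode(source_encoding)
    return text.encode(target_encoding)


# A module file name as stored on a Shift-JIS system.
name = "\u65e5\u672c\u8a9e.py"
shift_jis_bytes = name.encode("shift_jis")

# The same name as it should appear on a UTF-8 system.
utf8_bytes = transcode_name(shift_jis_bytes, "shift_jis", "utf-8")
```

The hard part in practice is not the conversion itself but knowing source_encoding, since the bytes do not carry that information.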

> The only way to make sure that modules work is to restrict them to ASCII-only on the filesystem. But because unicode module names are seen as a necessary feature, the question is which way forward is going to lead to the least brokenness. Which could be locale... but from the Python 2 locale-related bugs that I get to look at, I doubt it.

AFAICS this is going to be site-specific. End of story. Or, if you prefer, "maru-nage".

IMHO, Python 2 locale bugs are unlikely to be a good guide to Python 3 locale bugs because in Python 2 most people just ignore locale and use "native" strings (~= bytes in Python 3), and that typically "just works". In Python 3 that just doesn't work any more because you get a UnicodeError on import, etc, etc.
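The difference can be sketched with plain lists of names, without touching a real file system. The two lookup functions are hypothetical and only mimic the two styles: comparing names as bytes (roughly the Python 2 "native" string habit) versus decoding them to str first (roughly what Python 3 has to do).

```python
def find_as_bytes(names, target):
    """Look up a name byte for byte; no encoding is involved."""
    return target in names


def find_as_text(names, target, encoding):
    """Decode every name to str first, then look up the target.

    Raises UnicodeDecodeError (a subclass of UnicodeError) if any
    name is not valid in the given encoding.
    """
    decoded = [raw.decode(encoding) for raw in names]
    return target in decoded


module = "\u65e5\u672c\u8a9e"

# Directory listing from a file system written under a Shift-JIS locale.
names = [b"os", module.encode("shift_jis")]

# Bytes-style lookup "just works" whatever the current locale is.
found_bytes = find_as_bytes(names, module.encode("shift_jis"))

# Text-style lookup fails once the assumed encoding does not match.
try:
    find_as_text(names, module, "utf-8")
    mismatch_error = None
except UnicodeError as exc:
    mismatch_error = exc
```

With the matching encoding, find_as_text(names, module, "shift_jis") succeeds, so the failure is tied to the mismatch between locale and file system rather than to non-ASCII names as such.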

IMHO, YMMV, and all that. I know of such systems (there remain quite a few here used by student and research labs), but the ones I maintain were easy to convert to UTF-8 because I don't export file systems (except my private files for my own use); everything is mediated by Apache and Zope, and browsers are happy to cope if I change from EUC-JP to UTF-8 and then flip the Apache switch to change default encodings.


