[Python-3000] Unicode strings, identifiers, and import

Jean-Paul Calderone exarkun at divmod.com
Mon May 14 13:32:40 CEST 2007


On Sun, 13 May 2007 22:03:26 -0500, Michael Urman <murman at gmail.com> wrote:

On 5/13/07, Guido van Rossum <guido at python.org> wrote:

The answer to all of this is the filesystem encoding, which is already supported. Doesn't appear particularly difficult to me.

Okay, that's fair. It seems reasonable to accept the limitations of following the filesystem encoding for module names. I should probably test py3k to make sure it already has updated import to use the filesystem encoding instead of the default encoding, but instead I'll just feebly imply the question here.

It's harder than that for module names, actually. Even if you know the encoding, you'll still run into problems when you don't know the normalization. Consider the case where a developer creates a module with a non-ASCII name on OS X and then distributes it. There is a fair to strong chance that their source code will use NFC for the module name. During development this will work just fine, as OS X normalizes all filename access to NFD, so the file ends up stored under its NFD name. When someone on another platform attempts to use the module, though, the import will mysteriously fail: their NFC spelling of the module name won't match the NFD filename on disk, and they will likely be completely baffled by the failure.
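A small illustration of why knowing the encoding is not enough, using Python 3's standard unicodedata module and a made-up module name "café". The NFC and NFD spellings are different strings, and encoding both with the same known encoding (UTF-8 here, an assumption standing in for whatever the filesystem encoding is) still produces different bytes, so a byte-exact filename lookup of one spelling cannot find the other.

```python
import unicodedata

# Hypothetical non-ASCII module name, chosen only for illustration.
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # 'é' as a single code point, U+00E9
nfd = unicodedata.normalize("NFD", name)  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(nfc == nfd)          # False: the code point sequences differ
print(len(nfc), len(nfd))  # 4 5

# Same encoding, different bytes: a filesystem that compares names
# byte-for-byte treats these as two unrelated filenames.
print(nfc.encode("utf-8"))  # b'caf\xc3\xa9'
print(nfd.encode("utf-8"))  # b'cafe\xcc\x81'
```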

This is, of course, an existing difficulty when dealing with Unicode filenames in Python, but at least the interpreter doesn't yet have to concern itself with it, as no language feature requires it. I suspect that if non-ASCII module names are allowed, a lot of people will run into this.
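If the interpreter did have to concern itself with this, one conceivable approach is to compare directory entries after normalizing both sides to NFC. This is a hypothetical sketch, not anything Python's import system actually does; the helper name find_module_file is made up for illustration, and the demo relies on tempfile.TemporaryDirectory from modern Python 3 (3.2 or later).

```python
import os
import tempfile
import unicodedata


def find_module_file(directory, modname):
    """Hypothetical helper: return the path of `modname`.py in `directory`,
    comparing names after NFC normalization instead of byte-for-byte.
    Illustration only; this is not how Python's import machinery works."""
    wanted = unicodedata.normalize("NFC", modname)
    for entry in os.listdir(directory):
        stem, ext = os.path.splitext(entry)
        # Normalize the on-disk name too, so NFD and NFC spellings compare equal.
        if ext == ".py" and unicodedata.normalize("NFC", stem) == wanted:
            return os.path.join(directory, entry)
    return None  # no matching file under any normalization


# Demo: simulate a module file whose name was stored in NFD (as on OS X),
# then look it up with the NFC spelling a source file would likely contain.
with tempfile.TemporaryDirectory() as tmp:
    nfd_filename = unicodedata.normalize("NFD", "caf\u00e9") + ".py"
    open(os.path.join(tmp, nfd_filename), "w").close()
    print(find_module_file(tmp, "caf\u00e9") is not None)  # True
```

Note the trade-off: this replaces a single direct lookup with a scan of every entry in the directory, which is one plausible reason an import system would prefer not to take this on.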

Jean-Paul


