[Python-Dev] File system path encoding on Windows

Victor Stinner victor.stinner at gmail.com
Mon Aug 29 19:14:55 EDT 2016


2016-08-20 21:31 GMT+02:00 Nick Coghlan <ncoghlan at gmail.com>:

> Reading your summary meant this finally clicked with something Victor has been considering for a while: a "Force UTF-8" switch that told Python to ignore the locale encoding on Linux, and instead assume UTF-8 everywhere (command line parameter parsing, environment variable processing, filesystem encoding, standard streams, etc)
>
> It's essentially the same problem you have on Windows, just with slightly different symptoms and consequences.

Yes and no, but more no than yes :-)

On Linux, the issue is quite simple: most major Linux distributions have switched to UTF-8 by default, network shares use UTF-8, filenames are stored as UTF-8, applications expect UTF-8, etc. I once proposed a "-X utf8" switch, but more as a convenient workaround for badly configured systems, where data is encoded as UTF-8 but the locale encoding is not properly configured. The switch does a single thing: ignore the locale encoding, and force UTF-8 in its place.
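As a rough illustration of the misconfiguration described above (a sketch only, not how a "-X utf8" switch would be implemented): the file name, the use of ASCII as a stand-in for a misconfigured locale encoding, and the decode_filename helper are all assumptions made for the example.

```python
# Hypothetical example: a file name stored on disk as UTF-8 bytes.
name = "café.txt"
raw = name.encode("utf-8")


def decode_filename(data: bytes, encoding: str) -> str:
    """Decode a file name the way Python 3 does on POSIX.

    The surrogateescape error handler maps undecodable bytes to lone
    surrogates so that the original bytes can be recovered later.
    """
    return data.decode(encoding, errors="surrogateescape")


# Misconfigured system: the data is UTF-8, but the locale says ASCII.
with_locale = decode_filename(raw, "ascii")

# Ignoring the locale and forcing UTF-8 gives back the intended name.
forced_utf8 = decode_filename(raw, "utf-8")

print(ascii(with_locale))
print(ascii(forced_utf8))
```

With the wrong encoding the name is still usable (it round-trips back to the same bytes), but it is displayed and compared as garbage; forcing UTF-8 yields the text the user expects.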

Steve's proposal is specific to Windows, and Windows is a different world. On Windows, there is a single distribution, the Microsoft one, and UTF-8 has never been used as the ANSI code page (which is more or less the equivalent of the UNIX locale encoding). Using UTF-8 is something new, not really common in the Windows world. Steve said that UTF-8 is common in the .NET world (but I don't know the Windows community/universe well).

I suggested to Steve that we work on a unified "-X utf8" option to explicitly force UTF-8 on Linux and Windows. Steve, however, seems to prefer forcing UTF-8 by default and adding a new option to revert to the old behaviour.

I proposed the idea, but I'm not sure that we can have a single option for Linux and Windows. Moreover, I never really tried to implement "-X utf8" on Linux, because "misconfigured systems" seem to be less and less common nowadays. I see very few user requests in this direction.

By the way, apart from Steve, has anyone complained about the ANSI code page being used for bytes on Windows in Python? I recall one or two issues in the last 5 years about os.listdir(bytes), but those issues were specific to Python 2, if I recall correctly?
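For readers unfamiliar with the os.listdir(bytes) case mentioned above, here is a minimal sketch of the behaviour in question. The temporary directory and the ASCII file name are assumptions chosen so that the example behaves the same on any platform; the problems arise with non-ASCII names, where the result depends on the filesystem encoding.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, ASCII-only so every encoding can represent it.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str path yields str names; a bytes path yields bytes names.
    str_names = os.listdir(tmp)
    bytes_names = os.listdir(os.fsencode(tmp))

# os.fsdecode() converts bytes names using the filesystem encoding.
decoded = [os.fsdecode(n) for n in bytes_names]

print(sys.getfilesystemencoding())
print(str_names, bytes_names, decoded)
```

The bytes names are encoded with whatever sys.getfilesystemencoding() reports, which is the encoding this thread is about.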

Victor


