[Python-Dev] PEP 383 (again)

"Martin v. Löwis" martin at v.loewis.de
Tue Apr 28 22:04:12 CEST 2009


Your proposal says that utf-8b would be used for file systems, but then you also say that it might be used for command line arguments and environment variables. So, which specific APIs will it be used with on Windows and on POSIX systems?

On Windows, the Wide APIs are already used throughout the code base, e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the specific API for a specific functionality, please read the source code.
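On POSIX, by contrast, the environment holds bytes. Python 3 as released decodes them for `os.environ` with the filesystem encoding and the `surrogateescape` error handler, which is the name under which PEP 383's utf-8b scheme eventually shipped. The sketch below assumes a POSIX platform where `os.environb` exists. The helper `environ_roundtrip` and the variable name `PEP383_DEMO` are hypothetical, chosen only for the demo.

```python
import os


def environ_roundtrip(raw: bytes, key: str = "PEP383_DEMO") -> str:
    """Store raw bytes in the environment and read them back as str.

    Hypothetical demo helper; only meaningful on POSIX, where os.environb
    exists. The key name is an arbitrary placeholder.
    """
    if not os.supports_bytes_environ:
        raise RuntimeError("bytes environment not available (e.g. on Windows)")
    bkey = os.fsencode(key)
    os.environb[bkey] = raw
    try:
        # os.environ decodes the same underlying bytes with the filesystem
        # encoding and the surrogateescape error handler.
        return os.environ[key]
    finally:
        del os.environb[bkey]  # clean up the demo variable


if os.supports_bytes_environ:
    value = environ_roundtrip(b"caf\xe9")
    # With a UTF-8 or ASCII filesystem encoding, 0xE9 shows up as '\udce9'.
    print(ascii(value))
    # os.fsencode re-encodes with surrogateescape and restores the bytes.
    print(os.fsencode(value) == b"caf\xe9")
```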

Or will utf-8b simply not be available on Windows at all?

It will be available, but it won't be used automatically for anything.
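Because utf-8b is just a codec, it can be called explicitly on any platform, Windows included. In the released Python 3.1 it became the `surrogateescape` error handler used together with the UTF-8 codec. A minimal sketch of the round-trip; the function names and the sample bytes are illustrative, not part of any API:

```python
def decode_utf8b(raw: bytes) -> str:
    """Decode bytes as UTF-8, escaping each undecodable byte 0xNN as U+DCNN."""
    return raw.decode("utf-8", "surrogateescape")


def encode_utf8b(text: str) -> bytes:
    """Inverse of decode_utf8b: turn U+DC80..U+DCFF back into the raw bytes."""
    return text.encode("utf-8", "surrogateescape")


raw = b"report-\xff\xfe.txt"       # not valid UTF-8
name = decode_utf8b(raw)
print(ascii(name))                  # 'report-\udcff\udcfe.txt'
print(encode_utf8b(name) == raw)    # True: the round-trip is lossless
```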

What happens if I create a Python version of tar, utf-8b strings slip in there, and I try to use them on Windows?

No need to create it - the tarfile module is already there. By "in there", do you mean on the file system, or in the tarfile?
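For the tarfile case, the module accepts `encoding` and `errors` arguments, so a member name containing undecodable bytes can be written and read back unchanged. The in-memory sketch below passes `errors="surrogateescape"` explicitly. The helper name and the member name are hypothetical, and GNU format is an assumption chosen because it stores the name field as encoded bytes rather than in a PAX UTF-8 header.

```python
import io
import tarfile


def tar_roundtrip(raw_name: bytes) -> tuple[bytes, list[str]]:
    """Write one empty member whose name is raw bytes, then list it again."""
    name = raw_name.decode("utf-8", "surrogateescape")
    buf = io.BytesIO()
    # GNU format writes the name field as encoded bytes (no PAX header).
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding="utf-8", errors="surrogateescape") as tf:
        tf.addfile(tarfile.TarInfo(name))  # size 0, so no file object needed
    archive = buf.getvalue()
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r",
                      encoding="utf-8", errors="surrogateescape") as tf:
        names = tf.getnames()
    return archive, names


archive, names = tar_roundtrip(b"caf\xe9.txt")
print(archive[:8])   # b'caf\xe9.txt': the original bytes are in the header
print(ascii(names))  # ['caf\udce9.txt']
```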

You also assume that all Windows file system functions strictly conform to UTF-16 in practice (not just on paper). Have you verified that?

No, I don't assume that. I assume that all functions are strictly available in a Wide character version, and have verified that they are.

What's the situation on Windows CE?

I can't see how this question is relevant to the PEP. The PEP says this:

    On Windows, Python uses the wide character APIs to access
    character-oriented APIs, allowing direct conversion of the
    environmental data to Python str objects.

This is what it already does, and this is what it will continue to do.

Another question on Linux: what happens when I decode a file system path with utf-8b and then pass the resulting unicode string to Gnome? To Qt?

You probably get mojibake or an error; I didn't try.
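Such a string is legal inside Python, but any consumer that insists on well-formed UTF-8 sees it as invalid once it is encoded for the boundary. A small sketch of three possible outcomes at that boundary; the sample string is arbitrary, and which outcome a given toolkit produces depends on how its bindings encode:

```python
s = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9'

# 1. Strict UTF-8, as most toolkits expect: the lone surrogate is rejected.
try:
    s.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc.reason
print(strict_error)                 # surrogates not allowed

# 2. A lossy handler silently substitutes, so the original byte is lost.
lossy = s.encode("utf-8", "replace")
print(lossy)                        # b'caf?'

# 3. Only the same escaping handler gets the original bytes back.
restored = s.encode("utf-8", "surrogateescape")
print(restored)                     # b'caf\xe9'
```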

To windows.forms? To Java?

How do you do that, on Linux?

To a unicode regular expression library?

You mean SRE? SRE will match the code points as individual characters of general category Cs. You should have been able to find that out for yourself.
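A quick check of that behaviour with the `re` module, using an arbitrary sample name: each escaped byte is one code point of category Cs, `.` matches it like any other character, and a class over U+DC80..U+DCFF finds exactly the escaped bytes.

```python
import re
import unicodedata

name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")  # 'caf\udce9.txt'

# Each escaped byte is a single code point in general category Cs.
category = unicodedata.category(name[3])
print(category)                     # Cs

# "." matches the surrogate like any other single character ...
whole = re.findall(r"^caf.\.txt$", name)
print(ascii(whole))                 # ['caf\udce9.txt']

# ... and a character class over U+DC80..U+DCFF finds the escaped bytes.
escaped = re.findall(r"[\udc80-\udcff]", name)
print(ascii(escaped))               # ['\udce9']
```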

To wprintf?

Depends on the wprintf implementation.

AFAIK, the behavior of most libraries is undefined for the kinds of unicode strings you construct, and it may be undefined in a bad way (crash, buffer overflow, whatever).

Indeed so. This is intentional. If you can crash Python that way, this PEP doesn't make anything worse: you can already crash Python that way today.

Regards, Martin


