[Python-Dev] PEP 383 (again)

Thomas Breuel tmbdev at gmail.com
Wed Apr 29 00:30:42 CEST 2009


On Windows, the Wide APIs are already used throughout the code base, e.g. SetEnvironmentVariableW/wenviron. If you need to find out the specific API for a specific functionality, please read the source code. [...] No, I don't assume that. I assume that all functions are strictly available in a Wide character version, and have verified that they are.

The wide APIs use UTF-16. UTF-16 suffers from the same problem as UTF-8: not all sequences of 16-bit words are valid UTF-16 sequences. In particular, sequences containing isolated (unpaired) surrogates are not well-formed according to the Unicode standard. Therefore, the existence of a wide character API function does not guarantee that the wide character strings it returns can be converted into valid Unicode strings. And, in fact, Windows Vista happily creates files with malformed UTF-16 encodings, and os.listdir() happily returns them.
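To make the well-formedness point concrete, here is a minimal sketch using a current Python 3 interpreter. The helper name is_well_formed_utf16 and the sample byte strings are made up for illustration; they are not taken from any Windows API output.

```python
def is_well_formed_utf16(raw: bytes) -> bool:
    """Return True if raw (little-endian 16-bit words) is well-formed UTF-16."""
    try:
        # The strict decoder rejects unpaired surrogates.
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample data, little-endian 16-bit words.
paired = b"\x00\xd8\x00\xdc"    # D800 DC00: a proper surrogate pair (U+10000)
isolated = b"\x00\xd8\x41\x00"  # D800 0041: high surrogate followed by 'A'

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(isolated))
```

Both inputs are perfectly good arrays of 16-bit words, which is all a wide character API promises; only the first is a valid Unicode string.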

If you can crash Python that way, nothing gets worse by this PEP - you can then already crash Python in that way.

Yes, but AFAIK, Python does not currently have functions that, as part of correct usage and normal operation, are intended to generate malformed unicode strings.

Under your proposal, passing the output from a correctly implemented file system or other OS function to a correctly written library using unicode strings may crash Python. In order to avoid that, every library that's built into Python would have to be checked and updated to deal with both the Unicode standard and your extension to it.
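A rough sketch of the failure mode I have in mind, written against the "surrogateescape" error handler as it later shipped in Python 3. The file name bytes and the strict_consumer function are hypothetical stand-ins for an OS result and a library that assumes well-formed Unicode; in today's Python the failure is an exception rather than a hard crash.

```python
# Hypothetical undecodable file name as the OS might return it.
raw_name = b"caf\xff.txt"

# PEP 383 style decoding: the bad byte 0xFF becomes the lone surrogate U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")


def strict_consumer(text: str) -> bytes:
    """Stand-in for a library that assumes its input is valid Unicode."""
    return text.encode("utf-8")


try:
    strict_consumer(name)
    failed = False
except UnicodeEncodeError:
    # The lone surrogate cannot be encoded by a strict codec.
    failed = True

# Only code that knows about the extension gets the original bytes back.
round_trip = name.encode("utf-8", "surrogateescape")
```

Code that knows about the extension can round-trip the name; code written only against the Unicode standard fails on it.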

Tom