On Windows, the Wide APIs are already used throughout the code base, e.g. SetEnvironmentVariableW/\_wenviron. If you need to find out the
specific API for a specific functionality, please read the source code. \[...\]
No, I don't assume that. I assume that all functions are strictly
available in a Wide character version, and have verified that they are.
The wide APIs use UTF-16. UTF-16 suffers from the same problem as UTF-8: not all sequences of words are valid UTF-16 sequences. In particular, sequences containing isolated surrogates (a high or low surrogate that is not part of a pair) are not well-formed according to the Unicode standard. Therefore, the existence of a wide character API function does not guarantee that the wide character strings it returns can be converted into valid Unicode strings. And, in fact, Windows Vista happily creates files with malformed UTF-16 encodings, and os.listdir() happily returns them.
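To make the well-formedness point concrete, here is a minimal sketch in Python 3. The helper names `is_well_formed_utf16` and `decodes_strictly` are made up for illustration and are not part of any Python or Windows API. The first checks a sequence of 16-bit code units by hand. The second asks Python's own strict `utf-16-le` codec, which should reject the same input.

```python
def is_well_formed_utf16(units):
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration only.
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is isolated.
            return False
        else:
            i += 1
    return True


def decodes_strictly(units):
    """Return True if Python's strict UTF-16-LE decoder accepts the units."""
    data = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


paired = [0x0061, 0xD83D, 0xDE00]    # "a" followed by a complete surrogate pair
isolated = [0x0061, 0xD83D, 0x0062]  # high surrogate followed by "b"
```

Under these assumptions, `paired` passes both checks and `isolated` fails both, which is the situation a file name from a wide API can be in.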
If you can crash Python that way,
nothing gets worse by this PEP - you can then \*already\* crash Python
in that way.
Yes, but AFAIK, Python does not currently have functions that, as part of correct usage and normal operation, are intended to generate malformed Unicode strings.
Under your proposal, passing the output from a correctly implemented file system or other OS function to a correctly written library using Unicode strings may crash Python. In order to avoid that, every library that's built into Python would have to be checked and updated to deal with both the Unicode standard and your extension to it.
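A rough illustration of the concern, assuming the proposed escaping behaves like the `surrogateescape` error handler available in Python 3. The function `write_name_list` is a hypothetical stand-in for "a correctly written library" and the byte string is an invented file name, not taken from any real system. In this sketch the failure is an exception, not an interpreter crash.

```python
def write_name_list(names):
    """Hypothetical library routine: serialize names as UTF-8, one per line."""
    return "\n".join(names).encode("utf-8")  # strict encoding, as a library might do


# Assumed input: a file name in Latin-1 bytes, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to the
# isolated surrogate U+DCE9, so the resulting str is not well-formed Unicode.
name = raw.decode("utf-8", "surrogateescape")


def library_accepts(names):
    """Return True if the hypothetical library can serialize the names."""
    try:
        write_name_list(names)
    except UnicodeEncodeError:
        return False
    return True
```

Here `library_accepts(["plain.txt"])` succeeds, while `library_accepts([name])` fails unless the library is changed to encode with the same error handler, which is the kind of audit described above.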
Tom