[Python-Dev] PEP 383 (again)
Thomas Breuel tmbdev at gmail.com
Tue Apr 28 08:29:23 CEST 2009
I thought PEP-383 was a fairly neat approach, but after thinking about it, I now think that it is wrong.
PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in a reversible way. But how do those non-UTF-8 byte sequences get into those path names in the first place? Most likely because an encoding other than UTF-8 was used to write the file system, but you're now trying to interpret its path names as UTF-8.
Quietly escaping a bad UTF-8 encoding with private Unicode characters is unlikely to be the right thing, since using the wrong encoding likely means that other characters are decoded incorrectly as well. As a result, the path name may fail in string comparisons and pattern matching, and will look wrong to the user in print statements and dialog boxes. Therefore, when Python encounters path names on a file system that are not consistent with the (assumed) encoding for that file system, Python should raise an error.
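To make the failure mode concrete, here is a minimal Python 3 sketch. It assumes a file name that was written as ISO 8859-1 and is read back as UTF-8; the sample names and helper functions are hypothetical. The "surrogateescape" error handler used here is the form in which PEP 383 later shipped in Python 3 (it escapes with lone surrogates rather than private-use characters), so treat it as an illustration of the idea rather than of the draft under discussion.

```python
# Hypothetical file name, written to disk by a program using ISO 8859-1.
raw_name = "caf\u00e9.txt".encode("iso8859-1")   # b'caf\xe9.txt'


def decode_strict(raw):
    """Decode as UTF-8; return None on a mismatch instead of hiding it."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_escaped(raw):
    """Decode as UTF-8, escaping undecodable bytes (PEP 383 style)."""
    return raw.decode("utf-8", errors="surrogateescape")


strict_result = decode_strict(raw_name)      # the mismatch is detected
escaped = decode_escaped(raw_name)           # the mismatch is hidden
round_trip = escaped.encode("utf-8", errors="surrogateescape")

# The escaped string round-trips, but it is not the name the user typed,
# so comparisons and pattern matching against "caf\u00e9.txt" fail.
matches_intended = (escaped == "caf\u00e9.txt")

# A wrong encoding does not always produce an error: these two ISO 8859-1
# characters happen to form one valid UTF-8 sequence and decode silently.
silently_wrong = "\u00c3\u00a9".encode("iso8859-1").decode("utf-8")
```

Here decode_strict is the behaviour argued for above (surface the inconsistency), while decode_escaped preserves the bytes but yields a string that no longer compares equal to the intended name.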
If you really don't care what the string looks like and you just want an encoding that round-trips without loss, you can probably just set your encoding to one of the 8-bit encodings, like ISO 8859-15. Decoding arbitrary byte sequences to Unicode strings as ISO 8859-15 is no less correct than decoding them as the proposed "utf-8b". In fact, the most likely source of non-UTF-8 sequences is one of the ISO 8859 encodings.
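The round-trip claim can be checked with a short Python 3 sketch. It assumes Python's built-in "iso8859-15" codec, which maps every byte value to some character; the variable names are only for illustration.

```python
# Every possible byte value, as it might appear in an arbitrary path name.
all_bytes = bytes(range(256))

# Decoding never fails, because each byte maps to exactly one character.
as_text = all_bytes.decode("iso8859-15")

# Encoding the result gives back the original bytes unchanged.
restored = as_text.encode("iso8859-15")
lossless = (restored == all_bytes)

# The text may still look wrong: byte 0xA4 becomes the euro sign here,
# whatever the program that wrote the name intended.
euro = b"\xa4".decode("iso8859-15")
```

This only shows that the round trip is lossless; whether the decoded characters are the ones the user meant depends on the encoding that was actually used to write the names.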
As for what the byte-oriented interfaces should do, that is simply platform dependent. On UNIX, they should do the obvious thing. On Windows, they can either hook up to the low-level byte-oriented system calls that the system supplies, or Windows could fake it and have the byte-oriented interfaces always use UTF-8 and reject non-UTF-8 sequences as illegal (there are already many illegal byte sequences anyway).
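As a rough illustration of a byte-oriented interface, the sketch below uses the fact that Python 3's os.listdir returns bytes names when it is given a bytes path, so no decoding step is involved. The helper name list_raw_names and the file name example.txt are hypothetical, and this shows how Python 3 behaves today, not a design taken from this message.

```python
import os
import tempfile


def list_raw_names(directory):
    """Return directory entries as bytes, with no decoding step."""
    # os.fsencode turns a str path into bytes; a bytes argument to
    # os.listdir makes it return bytes names.
    return sorted(os.listdir(os.fsencode(directory)))


with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a plain ASCII name so the sketch runs anywhere.
    open(os.path.join(tmp, "example.txt"), "w").close()
    names = list_raw_names(tmp)
```

Code that only needs to pass names back to the operating system can stay in bytes throughout and never has to guess an encoding.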
Tom