[Python-Dev] PEP 383 (again)
"Martin v. Löwis" martin at v.loewis.de
Tue Apr 28 08:59:19 CEST 2009
> PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in a reversible way.
That isn't really true; it is not, inherently, about UTF-8. Instead, it tries to represent byte sequences that cannot be decoded with the file system encoding as Unicode strings in a reversible way.
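To make the reversible mapping concrete, here is a minimal sketch using the `surrogateescape` error handler, which is the name under which the PEP's mechanism later shipped in Python 3.1. UTF-8 and the sample bytes are assumptions chosen for illustration only; the handler works the same way with any other file system encoding.

```python
# Sample bytes (made up for illustration) that are not valid UTF-8:
# 0xE9 is "é" in Latin-1, and 0xFF never occurs in well-formed UTF-8.
raw = b"caf\xe9-\xff.txt"

# Decode with the error handler described by PEP 383.
name = raw.decode("utf-8", "surrogateescape")

# Each undecodable byte 0xNN is represented as the lone surrogate U+DCNN.
escaped = [hex(ord(ch)) for ch in name if 0xDC80 <= ord(ch) <= 0xDCFF]
print(escaped)

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)
```

The resulting `str` can be passed around like any other file name and turned back into the original bytes on the way out.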
> Quietly escaping a bad UTF-8 encoding with private Unicode characters is unlikely to be the right thing.
And indeed, the PEP stopped using PUA characters.
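The distinction between private-use characters and the lone surrogates the PEP settled on can be checked with `unicodedata`. This is only a sketch; the two sample code points are arbitrary picks from each range.

```python
import unicodedata

# U+E000 is the first code point of the Private Use Area (category "Co").
pua_char = "\ue000"

# U+DCE9 is a lone low surrogate, as produced for the undecodable byte 0xE9
# (category "Cs").
surrogate_char = "\udce9"

pua_category = unicodedata.category(pua_char)
surrogate_category = unicodedata.category(surrogate_char)
print(pua_category, surrogate_category)

# A lone surrogate cannot be encoded by the strict UTF-8 codec, so escaped
# bytes do not silently leak into well-formed output.
try:
    surrogate_char.encode("utf-8")
    leaked = True
except UnicodeEncodeError:
    leaked = False
print(leaked)
```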
> Therefore, when Python encounters path names on a file system that are not consistent with the (assumed) encoding for that file system, Python should raise an error.
This is what happens currently, and users are quite unhappy about it.
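At the codec level, the strict behaviour being discussed looks roughly like the sketch below. It only shows what a strict decode does with an inconsistent name; it makes no claim about how any particular `os` function reports the problem, and the sample bytes are made up.

```python
# A Latin-1 encoded name found on a file system assumed to be UTF-8.
raw = b"caf\xe9.txt"

try:
    raw.decode("utf-8")  # errors="strict" is the default
    failed = False
    bad_offset = None
except UnicodeDecodeError as exc:
    failed = True
    bad_offset = exc.start  # index of the first offending byte

print(failed, bad_offset)
```

A program that hits this while listing a directory or reading its arguments cannot process the affected name at all.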
> If you really don't care what the string looks like and you just want an encoding that round-trips without loss, you can probably just set your encoding to one of the 8 bit encodings, like ISO 8859-15. Decoding arbitrary byte sequences to unicode strings as ISO 8859-15 is no less correct than decoding them as the proposed "utf-8b". In fact, the most likely source of non-UTF-8 sequences is ISO 8859 encodings.
Yes, users can do that (to a degree), but they are still unhappy about it. The approach actually fails for command line arguments.
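The "to a degree" part can be sketched as follows: an 8-bit codec such as ISO 8859-15 does round-trip every byte in Python, but names that really are UTF-8 come out as mojibake. The sample name is an assumption, and this sketch does not attempt to reproduce the command line case.

```python
# Every one of the 256 byte values round-trips through ISO 8859-15.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("iso8859-15").encode("iso8859-15") == all_bytes
print(lossless)

# A name that is actually UTF-8 is preserved byte for byte, but the string
# the user sees is no longer the text that was meant.
intended = "naïve.txt"
raw = intended.encode("utf-8")
shown = raw.decode("iso8859-15")
print(shown == intended)                   # the displayed text differs
print(shown.encode("iso8859-15") == raw)   # yet the bytes survive
```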
> As for what the byte-oriented interfaces should do, they are simply platform dependent. On UNIX, they should do the obvious thing. On Windows, they can either hook up to the low-level byte-oriented system calls that the systems supply, or Windows could fake it and have the byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8 sequences as illegal (there are already many illegal byte sequences anyway).
As is, these interfaces are incomplete - they don't support command line arguments, or environment variables. If you want to complete them, you should write a PEP.
Regards, Martin