[Python-Dev] PEP 383 (again)

Thomas Breuel tmbdev at gmail.com
Tue Apr 28 09:30:01 CEST 2009

> Therefore, when Python encounters path names on a file system
> that are not consistent with the (assumed) encoding for that file
> system, Python should raise an error.

This is what happens currently, and users are quite unhappy about it.

We need to keep "users" and "programmers" distinct here.

Programmers may find it inconvenient that they have to spend time figuring out and dealing with platform-dependent file system encoding issues and errors. But internationalization and Unicode are hard; that's just a fact of life.

End users, however, are going to be quite unhappy if they get a string of gibberish for a file name because you decided to interpret some non-Unicode string as UTF-8-with-extra-bytes.

Or some Python program might copy files from an ISO8859-15 encoded file system to a UTF-8 encoded file system, and instead of getting an error when the encodings are set incorrectly, Python would quietly create ISO8859-15 encoded file names, making the target file system inconsistent.
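The copy scenario can be made concrete with a small sketch. It assumes the escaping scheme behaves like the `surrogateescape` error handler that later shipped with PEP 383 in Python 3.1; the file name and both helper functions are made up for illustration and are not part of any proposal.

```python
# Hypothetical file name as stored on an ISO8859-15 file system: "café.txt"
SOURCE_NAME = "caf\u00e9.txt".encode("iso8859-15")  # b'caf\xe9.txt'


def copy_name_escaped(raw: bytes) -> bytes:
    """Treat the name as UTF-8 with escaping, then write it back out.

    Undecodable bytes are smuggled through as lone surrogates, so the
    original bytes come out unchanged on the target side.
    """
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


def copy_name_strict(raw: bytes, source_encoding: str) -> bytes:
    """Decode with the declared source encoding, then encode for a UTF-8 target."""
    return raw.decode(source_encoding).encode("utf-8")


# Escaping: the target receives the ISO8859-15 bytes as-is.
escaped = copy_name_escaped(SOURCE_NAME)
print(escaped)

# Correct configuration: the name is transcoded to valid UTF-8.
transcoded = copy_name_strict(SOURCE_NAME, "iso8859-15")
print(transcoded)

# Wrong configuration with strict decoding: the mistake surfaces as an error.
try:
    copy_name_strict(SOURCE_NAME, "utf-8")
except UnicodeDecodeError as exc:
    print("encoding mismatch detected:", exc.reason)
```

With escaping, the name written to the UTF-8 target is not valid UTF-8; with strict decoding, the misconfiguration is reported at the point where it happens.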

There is a lot of potential for major problems for end users with your proposals. In both cases, what should happen is that the end user gets an error, submits a bug, and the programmer figures out how to deal with the encoding issues correctly.

Yes, users can do that (to a degree), but they are still unhappy about it. The approach actually fails for command line arguments

As it should: if I give an ISO8859-15 encoded command line argument to a Python program that expects a UTF-8 encoding, the Python program should tell me that there is something wrong when it notices that. Quietly continuing is the wrong thing to do.

If we follow your approach, that ISO8859-15 string will get turned into an escaped Unicode string inside Python. If I understand your proposal correctly, if it's an output file name and gets passed to Python's open function, Python will then encode that string back into the original ISO8859-15 byte sequence, which it will write to disk literally, even if the encoding for the system is UTF-8. That's the wrong thing to do.
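A rough sketch of the command-line case, again assuming the escaping works like the `surrogateescape` error handler from the final PEP 383. The argument value and the helper names (`decode_argument`, `has_escaped_bytes`) are hypothetical; the second helper only illustrates how a program could still notice the problem and report it instead of continuing quietly.

```python
# Hypothetical argument typed in an ISO8859-15 terminal: "Müller.txt"
RAW_ARG = "M\u00fcller.txt".encode("iso8859-15")  # b'M\xfcller.txt'


def decode_argument(raw: bytes) -> str:
    """Decode an argument as UTF-8, escaping bytes that are not valid UTF-8."""
    return raw.decode("utf-8", "surrogateescape")


def has_escaped_bytes(text: str) -> bool:
    """Return True if the string carries escaped bytes (U+DC80..U+DCFF)."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


arg = decode_argument(RAW_ARG)

# The bad byte 0xFC is now a lone surrogate inside the string.
print(ascii(arg))

# Encoding with the same handler reproduces the ISO8859-15 bytes, which is
# what would end up on disk as the file name.
on_disk = arg.encode("utf-8", "surrogateescape")
print(on_disk)

# A program that wants to tell the user something is wrong has to check
# explicitly, because decoding itself no longer fails.
if has_escaped_bytes(arg):
    print("argument is not valid UTF-8")
```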

As is, these interfaces are incomplete - they don't support command line arguments, or environment variables. If you want to complete them, you should write a PEP.

There's no point in scratching when there's no itch.

Tom

PS:

> Quietly escaping a bad UTF-8 encoding with private Unicode characters is
> unlikely to be the right thing

And indeed, the PEP stopped using PUA characters.

Let me rephrase this: "quietly escaping a bad UTF-8 encoding is unlikely to be the right thing"; it doesn't matter how you do it.
