[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)

Chris Barker - NOAA Federal chris.barker at noaa.gov
Thu Dec 7 21:10:36 EST 2017


I’m a bit confused:

File names and the like are one thing, and the CONTENTS of files is quite another.

I get that there is theoretically a “default” encoding for the contents of text files, but that is SO likely to be wrong as to be ignorable.

open() already defaults to utf-8 on most modern systems (strictly, it uses the locale's preferred encoding), which is a fine default if you are going to have one. But it seems a bad idea to have it default to surrogateescape EVER, regardless of the locale or anything else.

If the file is binary, or a different encoding, or simply broken, it’s much better to get an encoding error as soon as possible.
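To make the difference concrete, here is a minimal sketch (the file name and sample bytes are made up for illustration) of reading the same non-UTF-8 file with the strict error handler and with surrogateescape. With strict, the problem shows up at read time; with surrogateescape, the read succeeds and the bad byte travels along as a lone surrogate until something tries to encode it.

```python
import os
import tempfile

# Hypothetical sample: Latin-1 bytes, which are not valid UTF-8
# (0xE9 is "é" in Latin-1).
data = "café".encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(data)

    # Strict handler: the bad byte raises as soon as the file is read.
    try:
        with open(path, encoding="utf-8", errors="strict") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # surrogateescape: the read succeeds; the bad byte becomes U+DCE9.
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        text = f.read()

# The escaped text cannot be encoded as strict UTF-8 later on.
try:
    text.encode("utf-8")
    reencode_failed = False
except UnicodeEncodeError:
    reencode_failed = True

# It only round-trips if the writer also uses surrogateescape.
round_trip = text.encode("utf-8", "surrogateescape")
```

So the error is not avoided, only postponed to whichever code writes the text back out without the same handler.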

Why does this have anything to do with the PEP?

Perhaps the issue is reading a filename from the system, writing it to a file, then reading it back in again?

I actually do that a lot — but mostly so I can pass that file to another system, so I really don’t want broken encoding in it anyway.

-CHB

Sent from my iPhone

On Dec 7, 2017, at 5:53 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:

On 12/7/2017 5:45 PM, Jonathan Goble wrote:

On Thu, Dec 7, 2017 at 8:38 PM Glenn Linderman <v+python at g.nevcal.com> wrote:

If it were to be changed, one could add a text-mode option in 3.7, say "t" in the mode string, and a PendingDeprecationWarning for open calls without the specification of either t or b in the mode string.

"t" is already supported in open()'s mode argument [1] as a way to explicitly request text mode, though it's essentially ignored right now since text is the default anyway. So since the option is already present, the only thing needed at this stage for your plan would be to begin deprecating not using it.

*goes back to lurking*

[1] https://docs.python.org/3/library/functions.html#open
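As a quick illustration of the point about "t", the sketch below (the temporary file name is arbitrary) opens the same file with mode "r" and with mode "rt". In current Python both return the same kind of text-mode object and read the same contents.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # "wt" is explicit text mode for writing.
    with open(path, "wt", encoding="utf-8") as f:
        f.write("hello\n")

    # "r" (implicit text mode) and "rt" (explicit text mode) side by side.
    with open(path, "r", encoding="utf-8") as default_f, \
         open(path, "rt", encoding="utf-8") as text_f:
        same_type = type(default_f) is type(text_f)
        is_text = isinstance(text_f, io.TextIOWrapper)
        default_contents = default_f.read()
        text_contents = text_f.read()
```

Any different behaviour for an explicit "t", as proposed in this thread, would be new; today the flag only documents intent.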

Thanks for briefly de-lurking.

So then for PEP 540... use surrogateescape immediately for t mode.

Then, when the user encounters an encoding error, there would be three solutions: switch to t mode, explicitly switch to surrogateescape, or fix the locale.

