[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?
Toshio Kuratomi a.badger at gmail.com
Tue Jun 28 18:33:51 CEST 2011
On Tue, Jun 28, 2011 at 03:46:12PM +0100, Paul Moore wrote:
> On 28 June 2011 14:43, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> > As discussed before on this list, I propose to set the default encoding
> > of open() to UTF-8 in Python 3.3, and add a warning in Python 3.2 if
> > open() is called without an explicit encoding and if the locale encoding
> > is not UTF-8. Using the warning, you will quickly notice the potential
> > problem (using Python 3.2.2 and -Werror) on Windows or by using a
> > different locale encoding (e.g. using LANG="C").
>
> -1. This will make things harder for simple scripts which are not
> intended to be cross-platform. I use Windows, and come from the UK, so
> 99% of my text files are ASCII. So the majority of my code will be
> unaffected. But in the occasional situation where I use a £ sign, I'll
> get encoding errors, where currently things will "just work". And the
> failures will be data dependent, and hence intermittent (the worst type
> of problem). I'll write a quick script, use it once and it'll be fine,
> then use it later on some different data and get an error. :-(

I don't think this change would make things "harder". It will just move
where the pain occurs. Right now, the failures are intermittent A) on
computers other than the one that you're using, or B) when the script is
run under a different user than yourself. Sys admins where I'm at are
constantly writing ad hoc scripts in Python that break because you stick
something in a cron job and the locale settings suddenly become "C", and
therefore the script suddenly only deals with ASCII characters.
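
To make the cron scenario concrete, here is a minimal sketch (not code from this thread). It assumes the locale encoding ends up as ASCII under LANG="C" and simulates that by passing the encoding explicitly, so it behaves the same on any machine. The helper name write_text is hypothetical.

```python
import io


def write_text(text, encoding):
    """Encode text the way a text-mode file opened with `encoding` would.

    Hypothetical helper: an in-memory buffer stands in for a real file so
    the example does not depend on the locale of the machine running it.
    """
    buffer = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable.
    wrapper = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    price = "\u00a35"  # a pound sign followed by "5"

    # Interactive session on a UTF-8 or cp1252 machine: works.
    print(write_text(price, "utf-8"))
    print(write_text(price, "cp1252"))

    # Same script under an ASCII locale (the assumed cron situation): fails,
    # but only when the data happens to contain a non-ASCII character.
    try:
        write_text(price, "ascii")
    except UnicodeEncodeError as exc:
        print("failed under the assumed cron locale:", exc)
```

The same input succeeds or fails depending only on which encoding the environment picked, which is the data-dependent, environment-dependent breakage both posters describe.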
I don't know that Victor's proposed solution is the best, but something should change here. (I personally would like it a whole lot more than the current guessing, but I never develop on Windows, so I can certainly see that your environment can lead to the opposite assumption :-) Issuing a warning like "open used without explicit encoding may lead to errors" whenever open() is called without an explicit encoding would help a little (at least people who get errors would then have an inkling that the culprit might be an open() call). If I read Victor's previous email correctly, though, he said this was previously rejected.
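
The warning idea can be approximated in user code today. The following is only a sketch under stated assumptions: open_checked is a hypothetical wrapper, not an existing or proposed API, and the choice of UnicodeWarning as the category is mine rather than anything settled in this thread.

```python
import warnings


def open_checked(file, mode="r", encoding=None, **kwargs):
    """Hypothetical wrapper around open() that warns on an implicit encoding.

    Binary modes are left alone because they take no encoding at all.
    """
    if "b" not in mode and encoding is None:
        warnings.warn(
            "open used without explicit encoding may lead to errors",
            UnicodeWarning,  # assumed category, chosen for illustration
            stacklevel=2,    # point the warning at the caller's open_checked() line
        )
    return open(file, mode, encoding=encoding, **kwargs)
```

Running a script with -Werror would then turn each implicit-encoding call into an exception at the offending line, which is roughly the "quickly notice the potential problem" workflow described in the quoted proposal.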
Another brainstorming solution would be to use different default encodings on different platforms. For instance, for writing files, use utf-8 on *nix systems (including Mac OS X) and utf-16 on Windows. For reading files, check for a utf-16 BOM and, if it is not present, operate as utf-8. That would seem to address your issue with detection by vim, etc., but I'm not sure about getting "£" in your input stream. I don't know where your input is coming from or how the Windows equivalent of locale plays into that.
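
A rough sketch of that brainstormed scheme follows, to show how little code the read side needs. Everything here is illustrative: the function names are hypothetical, and the platform check via sys.platform is my assumption about how "on Windows" would be decided.

```python
import codecs
import sys


def default_write_encoding():
    """Per-platform write default as brainstormed above (hypothetical)."""
    return "utf-16" if sys.platform == "win32" else "utf-8"


def guess_read_encoding(raw):
    """Return "utf-16" if the bytes start with a UTF-16 BOM, else "utf-8".

    Limitation: a UTF-32 little-endian BOM also starts with the UTF-16
    little-endian BOM bytes, so such input would be misdetected here.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"


def decode_text(raw):
    """Decode bytes using the BOM-sniffing rule above.

    The "utf-16" codec consumes the BOM and uses it to pick the byte order.
    """
    return raw.decode(guess_read_encoding(raw))
```

Note that this only settles what Python itself reads and writes. Input produced by other Windows programs in a legacy code page (for example a "£" encoded as cp1252) would still fail to decode as utf-8, which is the open question raised above.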
-Toshio