[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?
Victor Stinner victor.stinner at haypocalc.com
Wed Jun 29 11:50:57 CEST 2011
On Wednesday, 29 June 2011 at 10:18 +0200, M.-A. Lemburg wrote:
Victor Stinner wrote:
> On Tuesday, 28 June 2011 at 16:02 +0200, M.-A. Lemburg wrote:
>> How about a more radical change: have open() in Py3 default to
>> opening the file in binary mode, if no encoding is given (even
>> if the mode doesn't include 'b') ?
>
> I tried your suggested change: Python doesn't start.
No surprise there: it's an incompatible change, but one that undoes a wart introduced in the Py3 transition. Guessing encodings should be avoided whenever possible.
It means that all programs written for Python 3.0, 3.1 and 3.2 will stop working with the new 3.x version (let's say 3.3). Users will have to migrate from Python 2 to Python 3.2, and then migrate again from Python 3.2 to Python 3.3 :-(
I would prefer a ResourceWarning (emitted if the encoding is not specified), hidden by default: it doesn't break compatibility, and -Werror gives exactly the behaviour that you expect.
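A minimal sketch of that idea, assuming a hypothetical wrapper named checked_open (it is not part of Python, nor of any patch in this thread). It only illustrates the intended behaviour: a warning that is ignored by the default filters and becomes an exception under -W error.

```python
import builtins
import warnings


def checked_open(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Hypothetical wrapper: warn when a text file is opened without an encoding."""
    if "b" not in mode and encoding is None:
        # ResourceWarning is ignored by the default warning filters,
        # so existing programs keep working unchanged.
        warnings.warn(
            "text file opened without an explicit encoding",
            ResourceWarning,
            stacklevel=2,
        )
    return builtins.open(file, mode, buffering, encoding, **kwargs)
```

With the default filters the call behaves like open(). Running with "python -W error::ResourceWarning" turns the implicit-encoding case into an exception, which approximates the strict behaviour without forcing it on everyone.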
This demonstrates that Python's stdlib is still not being explicit about the encoding issues. I suppose that things just happen to work because we mostly use ASCII files for configuration and setup.
I did more tests. I found some mistakes, and sometimes binary mode can be used, but most functions really expect the locale encoding (it is the correct encoding to read and write files). I agree that it would be good to have an explicit encoding="locale", but making it mandatory is a little bit rude.
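As a rough illustration of being explicit about the locale encoding, here is a small sketch. The helper name open_with_locale_encoding is hypothetical; it simply passes the result of locale.getpreferredencoding(False) to open() instead of relying on the implicit default. It is meant for text mode only.

```python
import locale


def open_with_locale_encoding(file, mode="r", **kwargs):
    """Hypothetical helper: open a text file with the locale encoding spelled out."""
    # getpreferredencoding(False) reads the current locale settings
    # without calling setlocale() as a side effect.
    encoding = locale.getpreferredencoding(False)
    return open(file, mode, encoding=encoding, **kwargs)
```

The call does what a plain text-mode open() does, but the choice of encoding is visible at the call site.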
> Then I tried my suggestion (use "utf-8" by default): Python starts
> correctly, I can build it (run "make") and... the full test suite pass
> without any change. (I'm testing on Linux, my locale encoding is UTF-8.)
I bet it would also pass with "ascii" in most cases. Which then just means that the Python build process and test suite are not good test cases for choosing a default encoding. Linux is also a poor test candidate for this, since most user setups will use UTF-8 as locale encoding. Windows, OTOH, uses all sorts of code page encodings (usually not UTF-8), so you are likely to hit the real problem cases a lot more easily.
I also ran the test suite on my patched Python (open() uses UTF-8 by default) with an ASCII locale encoding (LANG=C): the test suite also passes. Many tests use non-ASCII characters; some of them are skipped if the locale encoding is unable to encode the tested text.
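The skipping pattern mentioned above can be sketched roughly as follows. The names can_encode, NON_ASCII_TEXT and NonAsciiTest are made up for illustration and are not taken from the CPython test suite.

```python
import locale
import unittest


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


LOCALE_ENCODING = locale.getpreferredencoding(False)
NON_ASCII_TEXT = "h\xe9llo"  # hypothetical sample text with one non-ASCII character


class NonAsciiTest(unittest.TestCase):
    @unittest.skipUnless(
        can_encode(NON_ASCII_TEXT, LOCALE_ENCODING),
        "locale encoding cannot encode the tested text",
    )
    def test_roundtrip(self):
        # Only meaningful when the locale encoding can represent the text.
        data = NON_ASCII_TEXT.encode(LOCALE_ENCODING)
        self.assertEqual(data.decode(LOCALE_ENCODING), NON_ASCII_TEXT)
```

Under LANG=C the test is reported as skipped rather than failed, which is why a passing run with an ASCII locale says little about the non-ASCII cases.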
Victor