[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

Victor Stinner victor.stinner at gmail.com
Fri Dec 8 10:22:35 EST 2017


I updated my PEP: in the 4th version, locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode.

https://www.python.org/dev/peps/pep-0540/

I also clarified the direct effects of the UTF-8 Mode, and listed the most user-visible changes separately as "Side effects".

""" Effects of the UTF-8 Mode:

sys.getfilesystemencoding() returns 'UTF-8'.
locale.getpreferredencoding() returns 'UTF-8'; its do_setlocale argument and the locale encoding are ignored.
The error handler of sys.stdin and sys.stdout is set to surrogateescape.

Side effects:

open() uses the UTF-8 encoding by default.
os.fsdecode() and os.fsencode() use the UTF-8 encoding.
Command line arguments, environment variables and filenames use the UTF-8 encoding. """
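The effects listed above can be checked from a running interpreter. The following is only a minimal sketch, not part of the PEP: the helper name utf8_mode_report() is made up for illustration, and it assumes the flag is exposed as sys.flags.utf8_mode (the spelling used by released CPython 3.7+), falling back to the utf8mode spelling used in this thread.

```python
import locale
import sys


def utf8_mode_report():
    """Collect the values that the PEP says the UTF-8 Mode affects.

    Hypothetical helper, written for illustration only.
    """
    flags = sys.flags
    # Assumption: released CPython names the flag utf8_mode; this thread
    # calls it utf8mode. Fall back to 0 if neither attribute exists.
    enabled = getattr(flags, "utf8_mode", getattr(flags, "utf8mode", 0))
    return {
        "utf8_mode": bool(enabled),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # do_setlocale=False: do not touch the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        # sys.stdin / sys.stdout may be None or replaced by other objects.
        "stdin_errors": getattr(sys.stdin, "errors", None),
        "stdout_errors": getattr(sys.stdout, "errors", None),
    }


if __name__ == "__main__":
    for name, value in utf8_mode_report().items():
        print(f"{name}: {value}")
```

Running the script once normally and once with the mode enabled (the PEP proposes the -X utf8 command line option and the PYTHONUTF8=1 environment variable) should show which values change on a given platform.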

Thank you Naoki INADA for your quick feedback, it was very helpful and I really like how the PEP evolves!

IMHO the PEP 540 version 4 is just perfect and ready for pronouncement! (... until someone finds another flaw, obviously!)

Victor

2017-12-08 13:58 GMT+01:00 Victor Stinner <victor.stinner at gmail.com>:

> 2017-12-08 6:11 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
>
>> Or should we change locale.getpreferredencoding() to return UTF-8
>> instead of ASCII always, regardless of PEP 538 and 540?
>
> On the POSIX locale, if the locale coercion works (PEP 538),
> locale.getpreferredencoding() returns UTF-8. We are good.
>
> The question is for platforms like CentOS 7 where the locale coercion
> (PEP 538) doesn't work and so Python uses UTF-8 (PEP 540), whereas the
> locale probably uses ASCII (or maybe Latin1).
>
> My current implementation of the PEP 540 is cheating for open(): if
> sys.flags.utf8mode is non-zero, use the UTF-8 encoding rather than
> calling locale.getpreferredencoding().
>
> I checked the stdlib, and I found many places where
> locale.getpreferredencoding() is used to get the user preferred
> encoding:
>
> * builtin open(): default encoding
> * cgi.FieldStorage: encode the query string
> * encoding.aliasmbcs(): check if the requested encoding is the ANSI
>   code page
> * gettext.GNUTranslations: lgettext() and lngettext() methods
> * xml.etree.ElementTree: ElementTree.write(encoding='unicode')
>
> In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all
> use the UTF-8 encoding by default. So locale.getpreferredencoding()
> should return UTF-8 if the UTF-8 mode is enabled.
>
> The private aliasmbcs() method can be modified to call directly
> locale.getdefaultlocale()[1] to get the ANSI code page.
>
> Question: do we need to add an option to getpreferredencoding() to
> return the locale encoding even if the UTF-8 mode is enabled? If yes,
> what should be the API? locale.getpreferredencoding(utf8mode=False)?
>
> Victor
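The open() shortcut described in the quoted message (check the UTF-8 mode flag first, and only otherwise ask the locale) could be sketched roughly as follows. This is an illustration of the described logic, not the actual CPython implementation; default_text_encoding() is a hypothetical name, and the flag is assumed to be exposed as sys.flags.utf8_mode, as in released CPython 3.7+, rather than the utf8mode spelling used in the message.

```python
import locale
import sys


def default_text_encoding():
    """Pick a default text encoding the way the quoted message describes.

    Hypothetical helper: a sketch of the logic, not CPython's own code.
    """
    # Assumption: the flag is named utf8_mode; treat a missing attribute
    # (older interpreters) as "mode disabled".
    if getattr(sys.flags, "utf8_mode", 0):
        # UTF-8 mode: ignore the locale encoding entirely.
        return "UTF-8"
    # Otherwise fall back to the user's locale, without changing it.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(default_text_encoding())
```

With version 4 of the PEP, locale.getpreferredencoding() itself returns 'UTF-8' in the UTF-8 Mode, so callers such as cgi, gettext and xml.etree would get the same answer without needing a special case like this one.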
