[Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
Nick Coghlan ncoghlan at gmail.com
Tue Dec 5 21:31:12 EST 2017
- Previous message (by thread): [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
- Next message (by thread): [Python-Dev] PEP 540: Add a new UTF-8 mode (v2)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 6 December 2017 at 11:01, Victor Stinner <victor.stinner at gmail.com> wrote:
Annex: Differences between the PEP 538 and the PEP 540
======================================================
The PEP 538 uses the "C.UTF-8" locale which is quite new and only supported by a few Linux distributions; this locale is not currently supported by FreeBSD or macOS for example. This PEP 540 supports all operating systems.

The PEP 538 only changes the behaviour for the POSIX locale. While the new UTF-8 mode of this PEP is only enabled by the POSIX locale, it can be enabled manually for any other locale.

The PEP 538 is implemented with setlocale(LC_CTYPE, "C.UTF-8"): any non-Python code running in the process is impacted by this change. This PEP is implemented in Python internals and ignores the locale: non-Python code running in the same process is not aware of the "Python UTF-8 mode".
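The distinction above can be observed from Python itself. The following is a minimal sketch, assuming Python 3.7 or later, where the UTF-8 mode shipped as the -X utf8 option and sys.flags.utf8_mode. The helper name probe_utf8_mode is hypothetical. It starts a child interpreter under the C locale with the UTF-8 mode forced on, and reports both what Python uses internally and what the C library's LC_CTYPE locale is.

```python
import os
import subprocess
import sys

# Code run in the child: report the UTF-8 mode flag, the filesystem
# encoding chosen by Python, and the C-level LC_CTYPE locale.
CHILD = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, sys.getfilesystemencoding(), "
    "locale.setlocale(locale.LC_CTYPE))"
)


def probe_utf8_mode():
    """Hypothetical helper: run a child with -X utf8 under the C locale."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"               # force the C locale in the child
    env["PYTHONCOERCECLOCALE"] = "0"  # keep PEP 538 coercion out of the picture
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    mode, fs_encoding, lc_ctype = result.stdout.split()
    return int(mode), fs_encoding, lc_ctype


if __name__ == "__main__":
    print(probe_utf8_mode())
```

If this behaves as expected, Python reports UTF-8 for its own encoding while the C library locale is left untouched, which is why non-Python code in the same process does not notice the UTF-8 mode.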
I submitted a PR to reword this part: https://github.com/python/peps/pull/493
The main advantage of the PEP 538 *over* the PEP 540 is that, for the POSIX locale, non-Python code running in the same process gets the UTF-8 encoding.
To be honest, I'm not sure that there is a lot of code in the wild which uses "text" types like the C type wchar_t* and relies on the locale encoding. Almost all C libraries handle data such as filenames and environment variables as bytes, using the char* type.
At the very least, GNU readline breaks if you don't change the locale setting: https://www.python.org/dev/peps/pep-0538/#considering-locale-coercion-independently-of-utf-8-mode
Given that we found an example of this directly in the standard library, I assume that there are plenty more in third-party extension modules (especially once we take C++ extensions into account, not just C ones).
At first I understood that the PEP 538 changed the locale encoding using an environment variable. But no, it's implemented with setlocale(LC_CTYPE, "C.UTF-8"), which only impacts the current process and is not inherited by child processes. So I'm no longer sure that PEP 538 and PEP 540 are really complementary.
It sets the LC_CTYPE environment variable as well: https://www.python.org/dev/peps/pep-0538/#explicitly-setting-lc-ctype-for-utf-8-locale-coercion
The relevant code is in _coerce_default_locale_settings (currently at https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L448)
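The difference between a setlocale() call and an environment variable, as far as child processes are concerned, can be sketched as follows. This is only an illustration, not the CPython implementation: the helper names child_lc_ctype and demo are hypothetical, the "C" locale is used for the setlocale() call because it exists everywhere, and PYTHONCOERCECLOCALE=0 is set so that the child interpreter's own PEP 538 coercion does not alter the result.

```python
import locale
import os
import subprocess
import sys

# Code run in the child: print the LC_CTYPE environment variable it sees.
CHILD = "import os; print(os.environ.get('LC_CTYPE', '<unset>'))"


def child_lc_ctype(env):
    """Hypothetical helper: return LC_CTYPE as seen by a child interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    # Start from the current environment without LC_CTYPE, and disable
    # locale coercion in the child so it does not set LC_CTYPE itself.
    base_env = {k: v for k, v in os.environ.items() if k != "LC_CTYPE"}
    base_env["PYTHONCOERCECLOCALE"] = "0"

    # setlocale() only changes C library state in this process.
    previous = locale.setlocale(locale.LC_CTYPE)
    locale.setlocale(locale.LC_CTYPE, "C")
    try:
        after_setlocale = child_lc_ctype(base_env)
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)

    # An environment variable is inherited by the child process.
    after_env = child_lc_ctype({**base_env, "LC_CTYPE": "C.UTF-8"})
    return after_setlocale, after_env


if __name__ == "__main__":
    print(demo())
```

The first value shows that the parent's setlocale() call leaves no trace in the child's environment; the second shows that an explicitly exported LC_CTYPE does reach it.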
I'm not sure how PyGTK interacts with the PEP 538 for example. Does it use UTF-8 with the POSIX locale?
Desktop environments aim not to get into this situation in the first place by ensuring they're using a more appropriate locale :)
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia