[Python-Dev] PEP 540: Add a new UTF-8 mode (v3)

Victor Stinner victor.stinner at gmail.com
Fri Dec 8 10:18:29 EST 2017


2017-12-08 15:01 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:

In short, locale coercion and UTF-8 mode will both be enabled by the POSIX locale. Hm, it is a bit surprising because I thought UTF-8 mode was a fallback for locale coercion when coercion fails or is disabled.

I rewrote the "differences between the PEP 538 and the PEP 540" as a new section "Relationship with the locale coercion (PEP 538)".

https://www.python.org/dev/peps/pep-0540/#relationship-with-the-locale-coercion-pep-538

""" Relationship with the locale coercion (PEP 538)

The POSIX locale enables the locale coercion (PEP 538) and the UTF-8 mode (PEP 540). When the locale coercion is enabled, enabling the UTF-8 mode has no (additional) effect.

Locale coercion only impacts non-Python code like C libraries, whereas the Python UTF-8 Mode only impacts Python code: the two PEPs are complementary.

On platforms where locale coercion is not supported, like CentOS 7, the POSIX locale only enables the UTF-8 Mode. In this case, Python code uses the UTF-8 encoding and ignores the locale encoding, whereas non-Python code uses the locale encoding, which is usually ASCII for the POSIX locale.

While the UTF-8 Mode is supported on all platforms and can be enabled with any locale, the locale coercion is not supported by all platforms and is restricted to the POSIX locale.

The UTF-8 Mode only has an impact on Python child processes when the PYTHONUTF8 environment variable is set to 1, whereas the locale coercion sets the LC_CTYPE environment variable, which impacts all child processes.

The benefit of the locale coercion approach is that it helps ensure that encoding handling in binary extension modules and child processes is consistent with Python's encoding handling. The upside of the UTF-8 Mode approach is that it allows an embedding application to change the interpreter's behaviour without having to change the process global locale settings. """
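To make the PYTHONUTF8 point concrete, here is a minimal sketch (not part of the PEP text) that starts child Python processes with different environments and reports what each child sees. The helper name probe_child is made up for illustration, and the sketch assumes Python 3.7 or newer, where sys.flags.utf8_mode exists.

```python
import os
import subprocess
import sys

# Variables that influence encoding selection; cleared so each probe starts clean.
_ENCODING_VARS = (
    "PYTHONUTF8", "PYTHONIOENCODING", "PYTHONCOERCECLOCALE",
    "LC_ALL", "LC_CTYPE", "LANG",
)


def probe_child(env_overrides):
    """Hypothetical helper: run a child Python and return
    (utf8_mode flag, filesystem encoding, stdout encoding)."""
    env = dict(os.environ)
    for name in _ENCODING_VARS:
        env.pop(name, None)
    env.update(env_overrides)
    code = (
        "import sys;"
        "print(sys.flags.utf8_mode, sys.getfilesystemencoding(), sys.stdout.encoding)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    flag, fs_encoding, stdout_encoding = result.stdout.decode("ascii").split()
    return int(flag), fs_encoding, stdout_encoding
```

Calling probe_child({"PYTHONUTF8": "1"}) should report the flag as 1 with UTF-8 encodings, and probe_child({"PYTHONUTF8": "0", "LC_ALL": "C"}) should report the flag as 0. What probe_child({"LC_ALL": "C"}) reports depends on the platform and on whether locale coercion is available there, which is exactly the difference the section above describes.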

I hope that it's now better explained.

In short, the two PEPs are really complementary.

As per PEP 538 [1], all coercion target locales use surrogateescape for stdin and stdout. So, do you mean "UTF-8 mode is enabled at the flag level, but it has no real effect"?

Right, and it was a deliberate choice by Nick Coghlan when he designed PEP 538, to make sure that the two PEPs are complementary and "compatible".
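For readers less familiar with the surrogateescape error handler mentioned above, here is a small sketch of what it does with bytes that are not valid UTF-8. The function names are invented for this example; only the str/bytes methods and the "surrogateescape" handler come from the standard library.

```python
def decode_os_bytes(raw):
    """Decode bytes as UTF-8, keeping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_os_text(text):
    """Reverse of decode_os_bytes: lone surrogates become the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9"                  # Latin-1 bytes, not valid UTF-8
text = decode_os_bytes(raw)       # byte 0xE9 is kept as U+DCE9
round_trip = encode_os_text(text)  # identical to the original bytes
```

The point is that data read from stdin can be written back to stdout unchanged even when it is not valid UTF-8, instead of raising UnicodeDecodeError.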

Victor


