[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
Nick Coghlan ncoghlan at gmail.com
Sat May 6 04:33:14 EDT 2017
On 6 May 2017 at 18:00, Nick Coghlan <ncoghlan at gmail.com> wrote:
On 5 March 2017 at 17:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
Hi folks,
Late last year I started working on a change to the CPython CLI (not the shared library) to get it to coerce the legacy C locale to something based on UTF-8 when a suitable locale is available.
After a couple of rounds of iteration on linux-sig and python-ideas, I'm now bringing it to python-dev as a concrete proposal for Python 3.7.
For most folks, reading the Abstract plus the draft docs updates in the reference implementation will tell you everything you need to know (if the C.UTF-8, C.utf8 or UTF-8 locales are available, the CLI will automatically attempt to coerce the legacy C locale to one of those rather than persisting with the latter's default assumption of ASCII as the preferred text encoding).
I've just pushed a significant update to the PEP based on the discussions in this thread: https://github.com/python/peps/commit/2fb53e7c1bbb04e1321bca11cc0112aec69f6398
The main change at the technical level is to modify the handling of the coercion target locales such that they always lead to "surrogateescape" being used by default on the standard streams. That means we don't need to call "Py_SetStandardStreamEncoding" during startup, that subprocesses will behave the same way as their parent processes, and that Python in Linux containers will behave consistently regardless of whether the container locale is set to "C.UTF-8" explicitly, or is set to "C" and then coerced to "C.UTF-8" by CPython.
Working on the revised implementation for this, I've ended up refactoring it so that all the heavy lifting is done by a single function exported from the shared library: "_Py_CoerceLegacyLocale()".
The CLI code then just contains the check that says "Are we running in the legacy C locale? If so, call _Py_CoerceLegacyLocale()", with all the details of how the coercion actually works being hidden away inside pylifecycle.c.
That seems like a potential opportunity to make the 3.7 version of this a public API, using the following pattern:
    if (Py_LegacyLocaleDetected()) {
        Py_CoerceLegacyLocale();
    }
That way applications embedding CPython that wanted to implement the same locale coercion logic would have an easy way to do so.
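As a rough illustration of that detect-then-coerce pattern, here is a minimal standalone sketch in plain C. It is not the code in pylifecycle.c: the lowercase function names are hypothetical stand-ins for the proposed API, the target list is simply the three locales named in the PEP summary above, and it leaves out any additional checks a real implementation would likely need (for example, respecting an explicit LC_ALL setting or verifying the resulting codeset).

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed Py_LegacyLocaleDetected().
 * Reports whether the active LC_CTYPE locale is the legacy "C" locale. */
int legacy_locale_detected(void)
{
    const char *ctype = setlocale(LC_CTYPE, NULL);  /* query only, no change */
    return ctype != NULL && strcmp(ctype, "C") == 0;
}

/* Hypothetical stand-in for the proposed Py_CoerceLegacyLocale().
 * Tries each coercion target in order. On success it also exports
 * LC_CTYPE so that child processes inherit the same setting.
 * Returns 1 if a target locale was applied, 0 otherwise. */
int coerce_legacy_locale(void)
{
    static const char *const targets[] = { "C.UTF-8", "C.utf8", "UTF-8" };
    size_t i;

    for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
        if (setlocale(LC_CTYPE, targets[i]) != NULL) {
            if (setenv("LC_CTYPE", targets[i], 1) != 0) {
                /* Could not export the setting; undo the in-process change. */
                setlocale(LC_CTYPE, "C");
                return 0;
            }
            return 1;
        }
    }
    return 0;  /* no suitable UTF-8 locale available; stay in "C" */
}
```

An embedding application would presumably run the equivalent of "if (legacy_locale_detected()) coerce_legacy_locale();" early in startup, before initializing the interpreter, so that the interpreter and any subprocesses see the coerced locale.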
Thoughts?
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia