[Python-Dev] PEP 538 (review round 2): Coercing the legacy C locale to a UTF-8 based locale
Nick Coghlan ncoghlan at gmail.com
Tue May 9 07:57:04 EDT 2017
Hi folks,
Enough changes have accumulated in PEP 538 since the start of the previous thread that it seems sensible to me to start a new thread specifically covering the current design (which aims to address all the concerns raised in the previous thread).
I haven't requoted the PEP in full since it's so long, but will instead refer readers to the web version: https://www.python.org/dev/peps/pep-0538/
I also generated a diff covering the full changes to the PEP text:
- https://gist.github.com/ncoghlan/1067805fe673b3735ac854e195747493/revisions (this is the diff covering the last few days of changes)
Summarising the key technical changes:
- to make the runtime behaviour independent of whether or not locale coercion took place, stdin and stdout now always have "surrogateescape" as their error handler in the potential coercion target locales. This means Python will behave the same way regardless of whether the locale gets set externally (e.g. by a parent Python process or a container image definition) or implicitly during CLI startup
- for the full locales, the interpreter now sets LC_CTYPE and LANG, not LC_ALL. This means LC_ALL is once again a full locale override, and also means that CPython won't inadvertently interfere with other locale categories like LC_MONETARY, LC_NUMERIC, etc
- the reference implementation has been refactored so the bulk of the new code lives in the shared library and is exposed to the linker via a couple of underscore prefixed API symbols (_Py_LegacyLocaleDetected() and _Py_CoerceLegacyLocale()). While the current PEP still keeps them private, it would be straightforward to make them public for use in embedding applications if we decided we wanted to do so.
- locale coercion and warnings are now enabled by default on all platforms that use the autotools-based build chain - the assumption that some platforms didn't need them turned out to be incorrect
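As a rough illustration of what the "surrogateescape" error handler mentioned above does (this is a standalone sketch using the standard codec machinery, not code from the reference implementation, and the helper name strict_decode_fails is made up for the example): bytes that aren't valid UTF-8 are smuggled through as lone surrogate code points on decoding, and are restored exactly on encoding.

```python
# Bytes that are not valid UTF-8 (a Latin-1 encoded "e with acute accent").
raw = b"caf\xe9"


def strict_decode_fails(data):
    """Return True if a strict UTF-8 decode of `data` raises an error.

    Hypothetical helper, only here to contrast with surrogateescape.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# With "surrogateescape", the undecodable byte 0xE9 is mapped to the lone
# surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the data round-trips unchanged.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strict_decode_fails(raw))   # strict decoding rejects the data
print(round_tripped == raw)       # surrogateescape preserves it
```

This is why the handler is a reasonable fit for streams that may carry arbitrary OS-provided bytes: the data survives a decode/encode cycle even when it isn't valid in the nominal encoding.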
In addition to being updated to cover the above changes, the Rationale section of the PEP has also been updated to explain why it doesn't propose setting PYTHONIOENCODING, and to walk through some examples of the problems with GNU readline compatibility when the current locale isn't set correctly.
The essential related changes to the reference implementation can be seen here:
- Always set "surrogateescape" for coercion target locales, independently of whether or not coercion occurred: https://github.com/ncoghlan/cpython/commit/188e7807b6d9e49377aacbb287c074e5cabf70c5
- Stop setting LC_ALL: https://github.com/python/peps/commit/2f530ce0d1fd24835ac0c6f984f40db70482a18f
(There are also some smaller cleanup commits that can be seen by browsing that branch on GitHub)
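To make the LC_ALL point concrete, here is a simplified model of the usual POSIX lookup order for a locale category (LC_ALL first, then the category's own variable, then LANG, then the default "C" locale). The function names effective_locale and coerce_sketch are hypothetical and exist only for this example; the real logic lives in the C library and the reference implementation, and "C.UTF-8" is just one of the possible coercion targets.

```python
def effective_locale(env, category):
    """Return the locale name a POSIX-style lookup would use for `category`.

    Simplified model: LC_ALL overrides everything, then the category's
    own variable is consulted, then LANG, and finally the default "C".
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"


def coerce_sketch(env, target="C.UTF-8"):
    """Return a copy of `env` with LC_CTYPE and LANG set to `target`.

    Hypothetical stand-in for the coercion step; LC_ALL is left alone.
    """
    coerced = dict(env)
    coerced["LC_CTYPE"] = target
    coerced["LANG"] = target
    return coerced


# An explicitly configured category is not disturbed by the coercion.
env = coerce_sketch({"LC_NUMERIC": "de_DE.UTF-8"})
print(effective_locale(env, "LC_CTYPE"))
print(effective_locale(env, "LC_NUMERIC"))

# LC_ALL remains a full override of every category.
env_with_override = dict(env, LC_ALL="C")
print(effective_locale(env_with_override, "LC_CTYPE"))
```

Under this model, setting LC_ALL would have clobbered the explicit LC_NUMERIC setting as well, which is the interference the change avoids.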
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia