[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Toshio Kuratomi a.badger at gmail.com
Thu May 4 21:24:13 EDT 2017


On Sat, Mar 4, 2017 at 11:50 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

Providing implicit locale coercion only when running standalone
---------------------------------------------------------------

Over the course of Python 3.x development, multiple attempts have been made to improve the handling of incorrect locale settings at the point where the Python interpreter is initialised. The problem that emerged is that this is ultimately too late in the interpreter startup process - data such as command line arguments and the contents of environment variables may have already been retrieved from the operating system and processed under the incorrect ASCII text encoding assumption well before ``Py_Initialize`` is called.

The problems created by those inconsistencies were then even harder to diagnose and debug than those created by believing the operating system's claim that ASCII was a suitable encoding to use for operating system interfaces. This was the case even for the default CPython binary, let alone larger C/C++ applications that embed CPython as a scripting engine.

The approach proposed in this PEP handles that problem by moving the locale coercion as early as possible in the interpreter startup sequence when running standalone: it takes place directly in the C-level ``main()`` function, even before calling in to the ``Py_Main()`` library function that implements the features of the CPython interpreter CLI.

The ``Py_Initialize`` API then only gains an explicit warning (emitted on ``stderr``) when it detects use of the ``C`` locale, and relies on the embedding application to specify something more reasonable.

It feels like a short section on the caveats of this approach would help to introduce this section: something that says this PEP can cause a split between how Python behaves in non-standalone applications (mod_wsgi, IDEs where libpython is compiled in, etc.) and how it behaves standalone, unless the embedders take steps similar to those the standalone python takes. It could then go on to state that this approach was still chosen because coercing in Py_Initialize is too late, causing the inconsistencies and problems listed here.
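To make "similar steps" concrete, here is a rough sketch of what an embedding application might do before it calls Py_Initialize. It is only an illustration under stated assumptions, not the PEP's implementation: the helper names legacy_c_locale_in_use() and coerce_c_locale() are hypothetical, the candidate locale names are assumed to be the kind of targets the PEP has in mind, and setenv() assumes a POSIX platform.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a CPython API): returns 1 if the LC_CTYPE
 * locale selected by the environment is the legacy "C"/"POSIX" locale. */
static int legacy_c_locale_in_use(void)
{
    const char *current = setlocale(LC_CTYPE, "");
    if (current == NULL) {
        return 0;  /* invalid locale settings: leave them alone */
    }
    return strcmp(current, "C") == 0 || strcmp(current, "POSIX") == 0;
}

/* Hypothetical helper: try a few UTF-8 based locales (assumed names).
 * On success, export LC_CTYPE so that later setlocale(LC_ALL, "") calls
 * and child processes see the same choice. Returns the locale name that
 * was applied, or NULL if none is available on this system.
 * Note: an LC_ALL setting in the environment would still override
 * LC_CTYPE; this sketch does not attempt to handle that case. */
static const char *coerce_c_locale(void)
{
    static const char *const candidates[] = {"C.UTF-8", "C.utf8", "UTF-8"};
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL) {
            setenv("LC_CTYPE", candidates[i], 1);
            return candidates[i];
        }
    }
    return NULL;
}
```

An embedder would call legacy_c_locale_in_use() at the very top of its own main(), before reading command line arguments or environment variables, and call coerce_c_locale() only if it returns 1. An application that skips this keeps whatever locale it was started with, which is the behavioural split described above.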

-Toshio


