[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan ncoghlan at gmail.com
Mon Mar 13 07:01:33 EDT 2017


On 13 March 2017 at 18:37, INADA Naoki <songofacandy at gmail.com> wrote:

> But locale coercion works nicely on platforms like Android. So how about a simplified version of PEP 538? Just add a configure option for locale coercion, disabled by default. No envvar options and no warnings.

That doesn't solve my original Linux distro problem, where locale misconfiguration problems show up as "Python 2 works, Python 3 doesn't work" behaviour and bug reports.

The problem is that Python 2 was largely locale-independent by default (it just passed raw bytes through). You'd only get immediate encoding or decoding errors if you had a Unicode literal or a decode() call somewhere in your code; otherwise, any data corruption problems were passed further down the chain. Python 3, by contrast, is locale-aware by default, and eagerly decodes:

* command line parameters
* environment variables
* responses from operating system API calls
* standard stream input
* file contents
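
As a rough sketch of what that eager decoding means in an ASCII-only C locale: the byte string below stands in for a UTF-8 encoded value handed over by the OS, and the helper names (try_strict_ascii, try_strict_utf8) are made up for this illustration. It assumes the interpreter decodes OS data as ASCII with the "surrogateescape" error handler, which is how CPython has treated the C locale when no coercion or UTF-8 mode is in effect.

```python
# Bytes as the OS might hand them to Python: UTF-8 encoded "cafe" with
# an accented final letter. (Illustrative value, not from the thread.)
raw = b"caf\xc3\xa9"


def try_strict_ascii(data):
    """Return the decoded text, or None if strict ASCII decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


def try_strict_utf8(text):
    """Return the encoded bytes, or None if strict UTF-8 encoding fails."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# With "surrogateescape", undecodable bytes become lone surrogates
# instead of raising, so the decode step itself never fails.
escaped = raw.decode("ascii", errors="surrogateescape")

# The original bytes can be recovered by encoding with the same handler.
restored = escaped.encode("ascii", errors="surrogateescape")
```

Here try_strict_ascii(raw) returns None, restored equals raw, and try_strict_utf8(escaped) also returns None: the error is not avoided, only deferred until something tries to encode the text strictly.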

You can still write locale-independent Python 3 applications, but doing so involves sprinkling liberal doses of "b" prefixes and suffixes, mode settings, and "surrogateescape" error handler declarations in various places. You can't just run python-modernize over a pre-existing Python 2 application and expect it to behave the same way in the C locale as it did before.
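
For illustration only, a locale-independent filter in Python 3 might look something like the sketch below. The function name upper_ascii_passthrough is hypothetical; the point is that the streams must be binary ones (for example sys.stdin.buffer and sys.stdout.buffer, or files opened with "rb"/"wb") so that no locale-dependent decoding ever happens.

```python
import io


def upper_ascii_passthrough(src, dst):
    """Copy a binary stream line by line, upper-casing ASCII letters only.

    bytes.upper() leaves every non-ASCII byte untouched, so arbitrary
    (even invalid) encoded data passes through unchanged, much as it
    would have in a Python 2 program working on 8-bit strings.
    """
    for line in src:
        dst.write(line.upper())


# In-memory demo; a real script would pass sys.stdin.buffer and
# sys.stdout.buffer instead of BytesIO objects.
source = io.BytesIO(b"caf\xc3\xa9\n\xff\xfe raw\n")
sink = io.BytesIO()
upper_ascii_passthrough(source, sink)
```

Every I/O boundary in the program needs the same treatment, which is why a purely mechanical port does not get this behaviour automatically.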

Once implemented, PEP 540 will partially solve the problem by introducing a locale-independent UTF-8 mode, but that still leaves Python inconsistent with other locale-aware components, which have to deal with Python 3 API calls that accept or return Unicode objects where Python 2 allowed the use of 8-bit strings.
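
One way to see which encodings are actually in effect is a small report like the following sketch. The function name encoding_report is invented for this example, and sys.flags.utf8_mode only exists on interpreters that implement PEP 540 (CPython 3.7 and later), hence the getattr() fallback.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is currently using at its boundaries."""
    return {
        # What the C library's locale settings say.
        "locale": locale.getpreferredencoding(False),
        # Used for file names, command line arguments and environment variables.
        "filesystem": sys.getfilesystemencoding(),
        # Used for text written to standard output (may be absent if replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
        # 1 when the PEP 540 UTF-8 mode is active; None on older interpreters.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


report = encoding_report()
```

If "locale" reports an ASCII-style encoding while "filesystem" reports UTF-8, Python and any locale-aware C libraries loaded into the same process are working from different assumptions.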

Folks that really want the old behaviour back will be able to set PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build their own CPython from source using --without-c-locale-coercion and --without-c-locale-warning. However, they'll also get the explicit support notification from PEP 11 that any Unicode handling bugs they run into in those configurations are entirely their own problem - we won't fix them, because we consider those configurations unsupportable in the general case.
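
A minimal sketch of the environment variable opt-out, assuming an interpreter that implements PEP 538 (CPython 3.7 and later): the helper names legacy_c_locale_env and child_stdout_encoding are hypothetical, and the encoding the child reports depends on the interpreter version and platform, so no particular output is claimed here.

```python
import os
import subprocess
import sys


def legacy_c_locale_env(base=None):
    """Return a copy of an environment that requests the uncoerced C locale."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"               # overrides LC_CTYPE and LANG
    env["PYTHONCOERCECLOCALE"] = "0"  # ask CPython not to coerce the locale
    return env


def child_stdout_encoding(env):
    """Start a child interpreter and return the stdout encoding it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

Calling child_stdout_encoding(legacy_c_locale_env()) and comparing the result with a run that leaves PYTHONCOERCECLOCALE unset shows whether coercion makes a difference on a given system.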

That puts the additional self-support burden on folks doing something unusual (i.e. insisting on running an ASCII-only environment in 2017), rather than on those with a more conventional use case (i.e. running an up-to-date *nix OS using UTF-8 or another universal encoding for both local and remote interfaces).

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
