[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
Nick Coghlan ncoghlan at gmail.com
Mon Mar 13 07:01:33 EDT 2017
On 13 March 2017 at 18:37, INADA Naoki <songofacandy at gmail.com> wrote:
> But locale coercing works nice on platforms like android. So how about simplified version of PEP 538? Just adding configure option for locale coercing which is disabled by default. No envvar options and no warnings.
That doesn't solve my original Linux distro problem, where locale misconfiguration problems show up as "Python 2 works, Python 3 doesn't work" behaviour and bug reports.
The problem is that Python 2 was largely locale-independent by default: it just passed raw bytes through, so you'd only get immediate encoding or decoding errors if you had a Unicode literal or a decode() call somewhere in your code, and would otherwise pass data corruption problems further down the chain. Python 3, by contrast, is locale-aware by default, and eagerly decodes:
- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents
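To illustrate why eager decoding matters in the C locale, here is a minimal sketch (not taken from the PEP) that decodes non-ASCII bytes as ASCII, first strictly and then with the "surrogateescape" error handler. The sample bytes are an assumption chosen for illustration.

```python
# Illustrative input: UTF-8 bytes for "café", as an OS API might return them.
raw = b"caf\xc3\xa9"

# Strict ASCII decoding (what a C locale implies) rejects the non-ASCII bytes.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape smuggles each undecodable byte through as a lone surrogate,
# so the original bytes can be recovered when encoding again.
text = raw.decode("ascii", errors="surrogateescape")
roundtrip = text.encode("ascii", errors="surrogateescape")

print(strict_failed)
print(ascii(text))
print(roundtrip == raw)
```

The round trip is lossless, but the intermediate string contains lone surrogates, so it cannot be encoded with a strict error handler.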
You can still write locale-independent Python 3 applications, but they involve sprinkling liberal doses of "b" prefixes and suffixes, mode settings, and "surrogateescape" error handler declarations in various places. You can't just run python-modernize over a pre-existing Python 2 application and expect it to behave the same way in the C locale as it did before.
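As a rough sketch of what that sprinkling looks like, the example below processes a payload that is not valid ASCII in two ways: staying in bytes throughout, and using text mode with an explicit encoding plus "surrogateescape". The payload and the key/value format are hypothetical; in-memory streams stand in for real files.

```python
import io

# Hypothetical input: a Latin-1 encoded value, invalid as ASCII or UTF-8.
payload = b"name=caf\xe9\n"

# Option 1: stay in bytes ("b" prefixes, binary streams).
src = io.BytesIO(payload)
dst = io.BytesIO()
for line in src:
    key, sep, value = line.partition(b"=")
    dst.write(key.upper() + sep + value)  # bytes.upper() only touches ASCII
bytes_result = dst.getvalue()

# Option 2: text mode with an explicit encoding and surrogateescape,
# so the result does not depend on the process locale.
src = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii",
                       errors="surrogateescape", newline="")
out_raw = io.BytesIO()
dst = io.TextIOWrapper(out_raw, encoding="ascii",
                       errors="surrogateescape", newline="")
for line in src:
    key, sep, value = line.partition("=")
    dst.write(key.upper() + sep + value)
dst.flush()  # push buffered text down to the underlying byte stream
text_result = out_raw.getvalue()

print(bytes_result == text_result)
```

Both variants pass the undecodable byte through unchanged, which is roughly what Python 2 did implicitly.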
Once implemented, PEP 540 will partially solve the problem by introducing a locale-independent UTF-8 mode, but that still leaves the inconsistency with other locale-aware components that need to deal with Python 3 API calls that accept or return Unicode objects where Python 2 allowed the use of 8-bit strings.
Folks that really want the old behaviour back will be able to set PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build their own CPython from source using --without-c-locale-coercion and --without-c-locale-warning. However, they'll also get the explicit support notification from PEP 11 that any Unicode handling bugs they run into in those configurations are entirely their own problem - we won't fix them, because we consider those configurations unsupportable in the general case.
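One way to observe the effect of the environment variable is to start a child interpreter in the C locale and ask it for its standard stream encoding. This is only a sketch: the helper name is hypothetical, it assumes a POSIX system and a Python version that recognises PYTHONCOERCECLOCALE, and the reported encoding will vary with the Python version and platform (for example, a UTF-8 mode along the lines of PEP 540 may also influence it).

```python
import os
import subprocess
import sys


def stdout_encoding_in_c_locale(coerce):
    """Return sys.stdout.encoding as seen by a child Python in the C locale.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # Request the legacy C locale via LANG, clearing overrides.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = "C"
    # Avoid unrelated settings changing the answer.
    env.pop("PYTHONIOENCODING", None)
    env.pop("PYTHONUTF8", None)
    if coerce:
        env.pop("PYTHONCOERCECLOCALE", None)
    else:
        env["PYTHONCOERCECLOCALE"] = "0"  # opt out of locale coercion
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print("coercion allowed: ", stdout_encoding_in_c_locale(True))
print("coercion disabled:", stdout_encoding_in_c_locale(False))
```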
That puts the additional self-support burden on folks doing something unusual (i.e. insisting on running an ASCII-only environment in 2017), rather than on those with a more conventional use case (i.e. running an up-to-date *nix OS using UTF-8 or another universal encoding for both local and remote interfaces).
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia