bpo-34589: C locale coercion off by default by vstinner · Pull Request #9073 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation9 Commits3 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

vstinner

The C locale coercion is always disabled by default. It can now only
be enabled in main() and wmain() functions: by the Python program
("python3").

Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.

https://bugs.python.org/issue34589

@ncoghlan

This is still doing the coercion far too late, after command line options (et al) have already been decoded.

Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.

@ncoghlan

@vstinner

This is still doing the coercion far too late, after command line options (et al) have already been decoded.

pymain_read_conf() detects if the encoding changed after the configuration has been read. In that case, it forgets everything about the configuration and reads again the configuration. This is how I implemented properly the UTF-8 Mode which is configured by 3 different options:

Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.

Py_DecodeLocale() is also affected by Py_UTF8Mode. The issue is not specific to the C locale coercion.

I understand that you are saying that the current implementation doesn't work. Maybe I made a mistake, that's totally possible, but would you mind to explain how to see this bug? Because I made many manual tests and I'm not aware of any issue with the C locale coercion.

--

This change only makes sure that the C locale coercion cannot be not enabled when Python is embedded. Is that what you wanted to do?

@ncoghlan

The intent of the locale coercion design is to make it work for any embedding application, not just one that has been carefully crafted to decode everything incorrectly, realise it made a mistake, and then decode it back again.

The way you're currently doing it instead tightly couples the locale coercion to CPython specifically, such that it won't work for arbitrary applications that just happen to be embedding CPython. That's not the way locale coercion is supposed to work - it's supposed to be a behaviour of the Python CLI application that any other application embedding Python can emulate, not a property of the CPython runtime itself the way UTF-8 mode is.

Anyway, I've realised that the easiest way to make myself clear is to just submit my own PR that puts the PEP 538 implementation back the way it was intended to be, while still retaining your other improvements (like deferring emitting the warning until after the C level stdio streams have been initialised properly).

@ncoghlan

Note that "broken" above isn't the right word, since you've carefully crafted it to not be broken. "Fragile" would be a better word - just like UTF-8 mode, implementing locale coercion the way you currently have requires paying carefully attention to every single thing that CPython might read from the operating system, so it can be re-encoded and decoded differently if the locale encoding changes later on.

@vstinner

@vstinner

3 tests failed: test_c_locale_coercion test_cmd_line test_sys

@vstinner

Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3).

test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.

@vstinner

Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.

I merged PR #9371, so this PR is now simplified to its limited scope: Py_Initialize() and Py_Main() don't enable C locale coercion anymore.

@vstinner

@vstinner

_coerce_c_locale cannot be tested using Py_Initialize()/Py_Main() anymore.

This was referenced

Sep 17, 2018

vstinner added a commit that referenced this pull request

Sep 18, 2018

@vstinner

… by default (GH-9379)

_PyCoreConfig:

These fields are now private (name prefixed by "_").

(cherry picked from commit 188ebfa)

Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3).

test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.

(cherry picked from commit 7a0791b)

Add a new -X coerce_c_locale command line option to control C locale coercion (PEP 538).

(cherry picked from commit dbdee00)

@ncoghlan

Victor, why did you merge this? We were a long way from having agreement that this is the right thing to do, and now I'm going to have to update #9257 to account for your cavalier changes.

vstinner added a commit that referenced this pull request

Sep 19, 2018

@vstinner