bpo-34589: C locale coercion off by default by vstinner · Pull Request #9073 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation9 Commits3 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
The C locale coercion is always disabled by default. It can now only
be enabled in main() and wmain() functions: by the Python program
("python3").
Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.
https://bugs.python.org/issue34589
This is still doing the coercion far too late, after command line options (et al) have already been decoded.
Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.
This is still doing the coercion far too late, after command line options (et al) have already been decoded.
pymain_read_conf() detects if the encoding changed after the configuration has been read. In that case, it forgets everything about the configuration and reads again the configuration. This is how I implemented properly the UTF-8 Mode which is configured by 3 different options:
- -X utf8 command line option
- LC_CTYPE locale
- PYTHONUTF8 env var
Please just take the fields out of the core config struct entirely, and change it back to doing the coercion before PyDecode_Locale even gets called to read the command line arguments the first time.
Py_DecodeLocale() is also affected by Py_UTF8Mode. The issue is not specific to the C locale coercion.
I understand that you are saying that the current implementation doesn't work. Maybe I made a mistake, that's totally possible, but would you mind to explain how to see this bug? Because I made many manual tests and I'm not aware of any issue with the C locale coercion.
--
This change only makes sure that the C locale coercion cannot be not enabled when Python is embedded. Is that what you wanted to do?
The intent of the locale coercion design is to make it work for any embedding application, not just one that has been carefully crafted to decode everything incorrectly, realise it made a mistake, and then decode it back again.
The way you're currently doing it instead tightly couples the locale coercion to CPython specifically, such that it won't work for arbitrary applications that just happen to be embedding CPython. That's not the way locale coercion is supposed to work - it's supposed to be a behaviour of the Python CLI application that any other application embedding Python can emulate, not a property of the CPython runtime itself the way UTF-8 mode is.
Anyway, I've realised that the easiest way to make myself clear is to just submit my own PR that puts the PEP 538 implementation back the way it was intended to be, while still retaining your other improvements (like deferring emitting the warning until after the C level stdio streams have been initialised properly).
Note that "broken" above isn't the right word, since you've carefully crafted it to not be broken. "Fragile" would be a better word - just like UTF-8 mode, implementing locale coercion the way you currently have requires paying carefully attention to every single thing that CPython might read from the operating system, so it can be re-encoded and decoded differently if the locale encoding changes later on.
3 tests failed: test_c_locale_coercion test_cmd_line test_sys
Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3).
test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.
Rename coerce_c_locale and coerce_c_locale_warn attributes of
_PyCoreConfig to _coerce_c_locale and _coerce_c_locale_warn: fields
are now private.
I merged PR #9371, so this PR is now simplified to its limited scope: Py_Initialize() and Py_Main() don't enable C locale coercion anymore.
_coerce_c_locale cannot be tested using Py_Initialize()/Py_Main() anymore.
This was referenced
Sep 17, 2018
vstinner added a commit that referenced this pull request
… by default (GH-9379)
_PyCoreConfig:
- Rename coerce_c_locale to _coerce_c_locale
- Rename coerce_c_locale_warn to _coerce_c_locale_warn
These fields are now private (name prefixed by "_").
(cherry picked from commit 188ebfa)
Py_Initialize() and Py_Main() cannot enable the C locale coercion (PEP 538) anymore: it is always disabled. It can now only be enabled by the Python program ("python3).
test_embed: get_filesystem_encoding() doesn't have to set PYTHONUTF8 nor PYTHONCOERCECLOCALE, these variables are already set in the parent.
(cherry picked from commit 7a0791b)
Add a new -X coerce_c_locale command line option to control C locale coercion (PEP 538).
(cherry picked from commit dbdee00)
Victor, why did you merge this? We were a long way from having agreement that this is the right thing to do, and now I'm going to have to update #9257 to account for your cavalier changes.
vstinner added a commit that referenced this pull request