I introduced initfsencoding() in #8610 to ensure that Py_FileSystemEncoding is no longer NULL. In the discussion, Marc Lemburg noticed that falling back to UTF-8 on nl_langinfo(CODESET) error is a bad idea: ASCII is better (I agree).

We cannot fall back to ASCII yet because two other problems have to be fixed first:

- Python3 doesn't support surrogates in module filenames: see #8611
- If Py_FileSystemEncoding is NULL, encoding functions fall back to utf-8 (PyUnicode_GetDefaultEncoding()). #8715 proposes a new PyUnicode_EncodeFSDefault() function to fix this problem

Attached patch is a partial fix for this issue.
PyUnicode_AsEncodedString() contains a special path for the file system encoding. I don't think that it is still needed, but I don't know how to check that. => read
I tried the patch on my import_unicode branch and it doesn't work if the locale encoding is not ASCII (just as the current code doesn't work if the locale encoding is not UTF-8, #8611). If Py_FileSystemUnicodeEncoding is NULL: PyUnicode_EncodeFSDefault() should use wcstombs() and PyUnicode_DecodeFSDefault() should use mbstowcs(). They may reuse _Py_wchar2char() and _Py_char2wchar(). "ascii" should be used in initfsencoding().
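To make the idea concrete, here is a rough Python-level sketch of an encode/decode pair that uses the locale encoding when no file system encoding has been set. It is only an illustration, not the C patch: the helper names encode_fs_default() and decode_fs_default() and the fs_encoding parameter are hypothetical, and using locale.getpreferredencoding(False) with the surrogateescape error handler is an assumption standing in for the C-level wcstombs()/mbstowcs() calls.

```python
import locale


def _pick_encoding(fs_encoding=None):
    # Hypothetical stand-in for "the file system encoding is NULL":
    # when no encoding is given, use the current locale encoding.
    return fs_encoding or locale.getpreferredencoding(False)


def encode_fs_default(text, fs_encoding=None):
    # str -> bytes; surrogateescape turns lone surrogates back into
    # the original undecodable bytes.
    return text.encode(_pick_encoding(fs_encoding), "surrogateescape")


def decode_fs_default(data, fs_encoding=None):
    # bytes -> str; undecodable bytes become lone surrogates
    # (U+DC80..U+DCFF) instead of raising an error.
    return data.decode(_pick_encoding(fs_encoding), "surrogateescape")


# With an explicit "ascii" encoding, a non-ASCII byte survives a round trip.
raw = b"caf\xe9.py"
name = decode_fs_default(raw, "ascii")
round_tripped = encode_fs_default(name, "ascii")
```

With "ascii" the byte 0xE9 is kept as the surrogate U+DCE9 and encoded back unchanged, which is why surrogates in module filenames (#8611) matter before ASCII can become the fallback.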
initfsencoding() now raises a fatal error on get_codeset() error. Using an encoding different from the locale encoding when get_codeset() fails only leads to mojibake and encoding issues; it's not a good idea. Closing this issue as invalid.
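A small Python illustration of both points, as a sketch only: the first part shows the kind of mojibake that appears when bytes are decoded with an encoding other than the one used to write them, and the second part shows a fail-fast lookup. The helper name require_codec() is hypothetical, and raising RuntimeError is just a Python-level analogy for the fatal error raised in C.

```python
import codecs

# Mojibake: text written as UTF-8 but read back with a different encoding.
original = "café"
stored = original.encode("utf-8")
misread = stored.decode("latin-1")


def require_codec(name):
    # Hypothetical helper: refuse to guess another encoding when the
    # requested one cannot be found; fail loudly instead.
    try:
        return codecs.lookup(name).name
    except LookupError as exc:
        raise RuntimeError(f"unable to get the locale encoding: {exc}") from exc


known = require_codec("UTF-8")

try:
    require_codec("no-such-codec")
    failed = False
except RuntimeError:
    failed = True
```

Here misread no longer matches original, and nothing signals the problem at decode time, whereas the failed lookup is reported immediately.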
History
Date | User | Action | Args
2022-04-11 14:57:01 | admin | set | github: 52971
2010-10-19 23:55:25 | vstinner | set | status: open -> closed; resolution: not a bug; messages: +