msg201800 - (view) |
Author: Caolán McNamara (Caolán.McNamara) |
Date: 2013-10-31 10:52 |
LANG=ka_GE.georgianps /usr/bin/python3
Fatal Python error: Py_Initialize: Unable to get the locale encoding
LookupError: unknown encoding: GEORGIAN-PS
Aborted (core dumped)

but with python-2.7.5 no crash...

LANG=ka_GE.georgianps /usr/bin/python2
Python 2.7.5 (default, Oct 8 2013, 12:19:40)
[GCC 4.8.1 20130603 (Red Hat 4.8.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

(fedora 19) |
|
|
msg201801 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-31 10:56 |
This bug was initially reported in LibreOffice: https://bugs.freedesktop.org/show_bug.cgi?id=68850 |
|
|
msg201802 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-31 11:24 |
I found three Georgian encodings:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-PS;h=64615ff4344d74ea0c70cfd7a6c6c8019afb884e;hb=HEAD
https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/charmaps/GEORGIAN-ACADEMY;h=9dc1bc9e782e9fe6092a00daf1a75274fd6dd738;hb=HEAD
http://tools.ietf.org/html/draft-giasher-geostd8-00

The first one ("GEORGIAN-PS") is probably the most accurate because it is the one included in the GNU libc.

Could you please try to copy the attached georgian_ps.py file into /usr/lib64/python3.3/encodings/ (or /usr/lib/python3.3/encodings/ for 32-bit Linux)? Then try to print Georgian letters using:

print(bytes(range(0xc0, 0xe6)).decode("GEORGIAN-PS"))

Please also give me your locale encoding:

import locale; print(locale.getpreferredencoding())

@Caolán: Do you know the GEORGIAN-ACADEMY encoding? It doesn't appear to be used by any glibc locale.

On my Fedora 18, I have 3 Georgian locales:

* ka_GE.georgianps: locale encoding GEORGIAN-PS
* ka_GE: locale encoding GEORGIAN-PS
* ka_GE.utf8: locale encoding UTF-8

You can work around this issue by switching your locale from ka_GE.georgianps to ka_GE.utf8. |
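A quick way to confirm whether the interpreter knows a codec for a given encoding name is to ask the codec registry directly. The following is only a minimal sketch: `has_python_codec` is a hypothetical helper name, not something from the attached file or the standard library, and the result for "GEORGIAN-PS" depends on whether a matching codec module has been installed.

```python
import codecs
import locale


def has_python_codec(name):
    """Return True if Python's codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Same exception type as in the startup failure reported above.
        return False
    return True


if __name__ == "__main__":
    # The encoding Python derives from the current locale.
    preferred = locale.getpreferredencoding()
    print(preferred, has_python_codec(preferred))

    # "GEORGIAN-PS" resolves only if a codec for it is installed.
    for name in ("UTF-8", "GEORGIAN-PS"):
        print(name, has_python_codec(name))
```

Run it under a working locale such as ka_GE.utf8, since under ka_GE.georgianps the interpreter may abort before any script code executes.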
|
|
msg404214 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2021-10-18 19:46 |
With recent versions of Python (e.g. 3.9) this no longer causes a crash. Python apparently falls back to UTF-8, at least on my system:

$ LANG=ka_GE.georgianps python3.9
Python 3.9.7 (default, Sep 9 2021, 23:20:13)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale; print(locale.getpreferredencoding())
UTF-8

I'm marking this as fixed. If someone still has issues with this encoding, please open a new issue with up-to-date information. |
|
|
msg404250 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2021-10-18 23:46 |
Python uses UTF-8 if the locale is not supported:

$ LANG=xxx python3.9 -c "import sys; print(sys.flags.utf8_mode)"
1

On Fedora 34, the locale is still supported, and Python 3.11 still fails:

vstinner@apu$ LANG=ka_GE.georgianps locale
LANG=ka_GE.georgianps
LC_CTYPE="ka_GE.georgianps"
LC_NUMERIC="ka_GE.georgianps"
LC_TIME="ka_GE.georgianps"
LC_COLLATE="ka_GE.georgianps"
LC_MONETARY="ka_GE.georgianps"
LC_MESSAGES="ka_GE.georgianps"
LC_PAPER="ka_GE.georgianps"
LC_NAME="ka_GE.georgianps"
LC_ADDRESS="ka_GE.georgianps"
LC_TELEPHONE="ka_GE.georgianps"
LC_MEASUREMENT="ka_GE.georgianps"
LC_IDENTIFICATION="ka_GE.georgianps"
LC_ALL=

vstinner@apu$ LANG=ka_GE.georgianps python3.11 -c "import sys; print(sys.flags.utf8_mode)"
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = './python'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  stdlib dir = '/home/vstinner/python/main/Lib'
  sys._base_executable = '/home/vstinner/python/main/python'
  sys.base_prefix = '/usr/local'
  sys.base_exec_prefix = '/usr/local'
  sys.platlibdir = 'lib'
  sys.executable = '/home/vstinner/python/main/python'
  sys.prefix = '/usr/local'
  sys.exec_prefix = '/usr/local'
  sys.path = [
    '/usr/local/lib/python311.zip',
    '/home/vstinner/python/main/Lib',
    '/home/vstinner/python/main/build/lib.linux-x86_64-3.11-pydebug',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
LookupError: unknown encoding: GEORGIAN-PS

Current thread 0x00007ff89b81d2c0 (most recent call first): |
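For anyone reproducing this on another system, a slightly fuller report than `sys.flags.utf8_mode` alone may help compare results. This is just a sketch using standard `sys` and `locale` calls; `encoding_report` is a hypothetical helper name, and it can only run when the interpreter gets past initialization (i.e. not in the failing case above).

```python
import locale
import sys


def encoding_report():
    """Collect the encoding-related settings Python chose at startup."""
    return {
        # 1 when UTF-8 Mode is enabled, 0 otherwise.
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding used for file names and other OS-level strings.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding derived from the locale, without calling setlocale().
        "preferred_encoding": locale.getpreferredencoding(False),
        # stdout may have been replaced by an object without `.encoding`.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```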
|
|
msg404275 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2021-10-19 08:44 |
Possible solutions (they can be combined):

1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (). The problem is that it is difficult to get the official information about these encodings.

2. Fall back to utf-8 or ascii+surrogateescape in case of an unsupported locale encoding. But typos can slip by unnoticed. |
|
|
msg404290 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2021-10-19 11:20 |
On 19.10.2021 10:44, Serhiy Storchaka wrote:
>
> Possible solutions (they can be combined):
>
> 1. Add support for the GEORGIAN-PS charset and all other encodings used in libc (). The problem is that it is difficult to get the official information about these encodings.

As with all encodings we add: there has to be a real need to support them natively in Python (as opposed to installing codecs via PyPI) and we need a definite source for the encoding, e.g. a standards document from an official body.

IMO, we should not really add more encodings to the stdlib, but instead point people to e.g. the iconv package: https://pypi.org/project/python-iconv/

Perhaps we ought to make it easier for such packages to provide additional codecs even during the startup phase, e.g. via a special env var which points Python to a list of codec packages to load prior to initializing the I/O encoding... not sure whether this is possible, though.

> 2. Fall back to utf-8 or ascii+surrogateescape in case of an unsupported locale encoding. But typos can slip by unnoticed.

I think this would be a more general solution to such cases, provided the startup logic issues a visible warning about the fallback. |
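The fallback-with-warning idea can be illustrated at the Python level. This is only a sketch of the behaviour under discussion, not CPython's startup code (which is written in C and runs before user code): `resolve_encoding` is a hypothetical helper, and the choice of `RuntimeWarning` is an assumption made for the example.

```python
import codecs
import warnings


def resolve_encoding(name, fallback="utf-8"):
    """Return the normalized codec name for `name`, or `fallback` if unknown.

    A visible warning is issued on fallback so that a typo in the locale
    name does not slip by unnoticed.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        warnings.warn(
            f"unsupported locale encoding {name!r}; "
            f"falling back to {fallback!r}",
            RuntimeWarning,
            stacklevel=2,
        )
        return codecs.lookup(fallback).name


if __name__ == "__main__":
    print(resolve_encoding("UTF-8"))
    # Warns and falls back unless a GEORGIAN-PS codec is installed.
    print(resolve_encoding("GEORGIAN-PS"))
```

The warning addresses the concern that a silent fallback would hide misconfigured locales, at the cost of extra output on stderr for every affected process.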
|
|