msg230106 - (view) |
Author: Matt Frank (WanderingLogic) * |
Date: 2014-10-27 21:30 |
On systems where configure is unable to find langinfo.h (or where nl_langinfo() is not defined), configure undefines HAVE_LANGINFO_H in pyconfig.h. In pythonrun.c:get_locale_encoding() the call to nl_langinfo() is wrapped in an #ifdef, but the #else path does a PyErr_SetNone(PyExc_NotImplementedError) and returns NULL. That causes initfsencoding() to fail with the message "Py_Initialize: Unable to get the locale encoding", and the interpreter aborts.

I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8" (I'm not sure which). But maybe that was for a different part of the interpreter?

In any case there are 4 choices here, all of which are preferable to what we are doing now.
1. Fail during configure. If we can't even start the interpreter, then why waste the user's time with the build?
2. Fail during compilation. The #else path could contain #error "Python only works on systems where nl_langinfo() is correctly implemented." Again, this would be far preferable to failing only once the user has finished the install and tries to get the interpreter prompt.
3. Implement our own python_nl_langinfo() that we fall back on when the system one doesn't exist. (It could, for example, return "ASCII" (or "ANSI_X3.4-1968") to start with, and "UTF-8" after we see a call to setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").)
4. Just return the string "ASCII".

The attached patch does the last. I'm willing to try to write the patch for choice (3) if that's what you'd prefer. (I have an implementation that does (3) for systems that also don't have setlocale() implemented, but I don't yet know how to do it if nl_langinfo() doesn't exist but setlocale() does.) |
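For illustration only, here is a rough Python-level analogue of choice (4). It is not the attached patch (which changes the C code in pythonrun.c); the function name and the `fallback` parameter are made up for this sketch. It relies on the fact that `locale.nl_langinfo` and `locale.CODESET` are only present in the `locale` module on platforms that provide them.

```python
import locale


def get_locale_encoding(fallback="ASCII"):
    """Return the locale codeset, or `fallback` when it cannot be queried.

    Hypothetical helper: a Python-level stand-in for the C-level
    get_locale_encoding() discussed above, not the real implementation.
    """
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    codeset = getattr(locale, "CODESET", None)
    if nl_langinfo is None or codeset is None:
        # Counterpart of the "#else" branch: instead of raising
        # NotImplementedError, hand back a fixed name (choice 4).
        return fallback
    # Treat an empty answer from the C library like a missing one.
    return nl_langinfo(codeset) or fallback


if __name__ == "__main__":
    print(get_locale_encoding())
```

The point of the sketch is only that a missing nl_langinfo() becomes a default value rather than a startup failure; which default is right ("ASCII" or "UTF-8") is exactly the open question in this issue.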
|
|
msg230111 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-10-27 23:20 |
> I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8"

It's very important that Py_DecodeLocale and Py_EncodeLocale use the same encoding as sys.getfilesystemencoding(). What is your platform? Which encoding is used by these functions? |
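A small sketch of why the two sides have to agree. It uses plain Python codecs with the surrogateescape error handler as a stand-in for Py_DecodeLocale/Py_EncodeLocale; the helper name `decode_name` is hypothetical, and the choice of "ascii" versus "utf-8" is just an example pair.

```python
def decode_name(raw, encoding):
    """Decode a byte file name, escaping undecodable bytes as lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


name = "caf\u00e9.txt"
raw = name.encode("utf-8", "surrogateescape")

# Same encoding on both sides: the original text comes back.
same = decode_name(raw, "utf-8")

# Mismatched encodings: the bytes survive, but the text is no longer the
# name the user typed (the two UTF-8 bytes become two lone surrogates).
mixed = decode_name(raw, "ascii")

if __name__ == "__main__":
    print(same == name)
    print(mixed == name)
    print(mixed.encode("ascii", "surrogateescape") == raw)
```

With a mismatch the bytes still round-trip, so files can be opened, but string comparisons and display of the name go wrong, which is the kind of inconsistency being warned about here.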
|
|
msg230385 - (view) |
Author: Matt Frank (WanderingLogic) * |
Date: 2014-10-31 20:36 |
My platform is the Android command-line shell. Essentially it is like an embedded Linux platform with a very quirky, partially implemented libc (not glibc). It has no langinfo.h, and while it has locale.h, the implementations of setlocale() and localeconv() do nothing (and return null). The wcstombs() and mbstowcs() functions are both mapped to strncpy().

As was the original intent of UTF-8, since the Linux kernel (and most supported file systems) store filenames as null-terminated byte strings, UTF-8 encoded file names "work" with software that assumes that the encoding is UTF-8 (for example, the xterm program that I'm using to "ssh" into the machine, or the Dalvik JVM that runs user apps).

My intent with this tracker is to make it slightly easier for people who have a libc like Android's, where the locale support is completely broken and really only 8-bit "ascii" is supported, to get something reasonable to compile and run, while simultaneously not breaking the supported platforms. If you look at what Kivy and Py4A have done, they basically have patches all over the main interpreter that, once applied, make the interpreter not work on any supported platform. I'm trying to avoid that approach.

Two possibilities for this particular part of the interpreter are to implement option (3) above, or to implement option (4) above. Option (3) is preferable in the long run, but option (4) is a much smaller change (as long as it is consistent with the decision of tracker 8610.) |
|
|
msg230391 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2014-10-31 21:29 |
Has anyone made an effort to get this fixed in Android? I find it strange that hundreds of projects now work around Android bugs instead of putting (friendly) pressure on the Android maintainers. Minimal langinfo.h and locale.h support should be trivial to implement. |
|
|
msg230393 - (view) |
Author: Matt Frank (WanderingLogic) * |
Date: 2014-10-31 21:57 |
I am working on using my resources at Intel to put some pressure on Google to fix some of the (many) problems in the Bionic libc. I have a sort of "polyfill" library that implements locale.h, langinfo.h, as well as the structure definitions for wchar.h, and it borrows the utf8 mbs*towcs() and wcs*tombs() implementations from FreeBSD. It implements a setlocale() and nl_langinfo() that start in locale "C" and behave as though the user's envvars are set to "C.UTF-8" (so if you call setlocale(LC_ALL, "") the encoding is changed to UTF-8).

But Bionic has been broken for many years, and it will most likely take many more years before I (or somebody) can arrange the right set of things to get it fixed. It is not really in Google's interest to have people writing non-JVM code, so they seem to only grudgingly support it; their JVM APIs are the "walled garden" that keeps apps sticky to their platform, while allowing them to quickly switch to new processor architectures if they need to.

But all of that is not really germane to this bug. The fact is that cpython, when compiled for a system with no langinfo.h, creates an executable that does nothing but crash. What other systems (other than Android) have no langinfo.h? (Alternatively, why has this feature-test been in configure.ac for many years?) If the solution for Android is "it's android's bug and they should fix it" then shouldn't we remove all the #ifdef HAVE_LANGINFO_H tests from the code and just let compilation fail on systems that don't have langinfo.h? That is option (1) or (2) that I suggested above. |
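The behaviour of the polyfill's setlocale()/nl_langinfo() pair can be modelled in a few lines of Python. This is only a toy model of the description above, not the polyfill itself: the class and method names are invented, the locale category is ignored, and accepting only "C" and "C.UTF-8" as explicit names is an assumption of the sketch.

```python
class FakeLocale:
    """Toy model: starts in "C"; an empty name selects the faked "C.UTF-8"."""

    def __init__(self):
        self._name = "C"

    def setlocale(self, name=None):
        """Return the active locale name, or None for an unsupported request."""
        if name is None:
            return self._name          # query only, no change
        if name == "":
            self._name = "C.UTF-8"     # pretend the envvars say C.UTF-8
        elif name in ("C", "C.UTF-8"):
            self._name = name
        else:
            return None                # assumption: other locales are refused
        return self._name

    def codeset(self):
        """Model of nl_langinfo(CODESET) for the active locale."""
        return "UTF-8" if self._name.endswith(".UTF-8") else "ASCII"


if __name__ == "__main__":
    loc = FakeLocale()
    print(loc.codeset())
    loc.setlocale("")
    print(loc.codeset())
```

This is also what option (3) would look like in spirit: the reported codeset depends on whether the program has asked for the environment's locale yet.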
|
|
msg230394 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2014-10-31 21:57 |
To expand a little, here ... https://code.google.com/p/android/issues/list ... I cannot find either a localeconv() or an nl_langinfo() issue. Perhaps the maintainers would be willing to add minimal versions? |
|
|
msg230407 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-10-31 22:39 |
If the platform doesn't provide anything, we can maybe adopt the same approach as Mac OS X: force the encoding to UTF-8 and just don't use the C library. |
|
|
msg264160 - (view) |
Author: Xavier de Gaye (xdegaye) *  |
Date: 2016-04-25 08:03 |
Android's default system encoding is UTF-8, as specified at http://developer.android.com/reference/java/nio/charset/Charset.html: "The platform's default charset is UTF-8. (This is in contrast to some older implementations, where the default charset depended on the user's locale.)"

> If the platform doesn't provide anything, we can maybe adopt the same
> approach as Mac OS X: force the encoding to UTF-8 and just don't use
> the C library.

The attached patch does the same thing as proposed by Victor but emphasizes that Android does not define HAVE_LANGINFO_H and does not have CODESET.

And the fact that HAVE_LANGINFO_H and CODESET are not defined causes other problems (maybe on Mac OS X as well). In that case, PyCursesWindow_New() in _cursesmodule.c falls back nicely to "utf-8", but _Py_device_encoding() in fileutils.c instead does a Py_RETURN_NONE. It seems that this impacts _io_TextIOWrapper___init___impl() in textio.c and os_device_encoding_impl() in posixmodule.c. And indeed, os.device_encoding(0) returns None on Android. |
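For code that has to cope with the None result today, a caller-side fallback in the style of the _cursesmodule.c behaviour might look like the sketch below. The helper name and its `default` parameter are hypothetical; only `os.device_encoding()` is a real API, and it also returns None for any descriptor that is not a terminal.

```python
import os


def device_encoding_or_default(fd, default="utf-8"):
    """Return os.device_encoding(fd), or `default` when it yields nothing."""
    try:
        encoding = os.device_encoding(fd)
    except OSError:
        # Defensive: treat an unusable descriptor like "no encoding known".
        encoding = None
    return encoding or default


if __name__ == "__main__":
    print(device_encoding_or_default(0))
```

This only papers over the problem in user code; it does not change what TextIOWrapper picks internally.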
|
|
msg264202 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-04-25 23:57 |
New changeset ad6be34ce8c9 by Stefan Krah in branch 'default': Issue #22747: Workaround for systems without langinfo.h. https://hg.python.org/cpython/rev/ad6be34ce8c9 |
|
|
msg264203 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2016-04-26 00:00 |
We don't support Android officially yet, but I think until #8610 is resolved something must be done here. |
|
|
msg342542 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-05-15 02:56 |
Python 3 (I don't recall which version exactly) has been fixed to always use UTF-8 on Android for the filesystem encoding and even for the locale encoding in most places. I'm closing the issue. |
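One way to check this on a given build is to print what the interpreter reports. The sketch below is an illustration, not part of the fix: the function name is made up, and detecting Android via the presence of `sys.getandroidapilevel` is an assumption that only holds on versions that provide that function.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings the running interpreter reports."""
    return {
        # Assumption: sys.getandroidapilevel exists only on Android builds.
        "android": hasattr(sys, "getandroidapilevel"),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, value)
```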
|
|