[Python-Dev] please consider changing --enable-unicode default to ucs4
M.-A. Lemburg mal at egenix.com
Mon Sep 28 10:25:45 CEST 2009
Zooko O'Whielacronx wrote:
Folks:
I'm sorry, I think I didn't make my concern clear. My users, and lots of other users, are having a problem with incompatibility between Python binary extension modules. One way to improve the situation would be if the Python devs would use their "bully pulpit" -- their unique position as a source respected by all Linux distributions -- and say "We recommend that Linux distributions use UCS4 for compatibility with one another". This would not abrogate anyone's ability to choose their preferred setting nor, as far as I can tell, would it interfere with the ongoing development of Python.
-1
Please note that we did not choose to ship Python as UCS4 binary on Linux - the Linux distributions did.
The Python default is UCS2 for a good reason: it's a good trade-off between memory consumption, functionality and performance.
As already mentioned, I also don't understand how changing the Python default on Linux would help your users in any way - if you let distutils compile your extensions, it's automatically going to use the right Unicode setting for you (as well as your users).
Unfortunately, this automatic support doesn't help you when shipping e.g. setuptools eggs, but this is a tool problem, not one of Python: setuptools completely ignores the fact that there are two ways to build Python.
I'd suggest you ask the tool maintainers to adjust their tools to support the Python Unicode option.
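As a rough illustration of what such tool support could look like, the sketch below appends the Unicode build width to a platform tag and rejects binaries whose tag does not match the running interpreter. The tag format and the names `ucs_tag`, `platform_tag` and `is_compatible` are hypothetical; no existing packaging tool is claimed to work this way.

```python
import sys
import sysconfig


def ucs_tag(maxunicode=None):
    """Return 'ucs2' or 'ucs4' for the given (or the running) interpreter."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A narrow (UCS2) build reports 0xFFFF; a wide (UCS4) build reports 0x10FFFF.
    return "ucs2" if maxunicode <= 0xFFFF else "ucs4"


def platform_tag(maxunicode=None):
    """Hypothetical tag: the usual platform string plus the Unicode width."""
    return "%s-%s" % (sysconfig.get_platform(), ucs_tag(maxunicode))


def is_compatible(binary_tag, maxunicode=None):
    """True if a binary carrying `binary_tag` matches this interpreter."""
    return binary_tag == platform_tag(maxunicode)
```

A tool could write `platform_tag()` into the file name of a binary distribution at build time and call `is_compatible()` before installing it.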
Here are the details:
I'm the maintainer of several Python packages. I work hard to make it easy for users, even users who don't know anything about Python, to use my software. There have been many pain points in this process and I've spent a lot of time on it for about three years now working on packaging, including the tools such as setuptools and distutils and the new "distribute" tool. Python packaging has been improving during these years -- things are looking up.

One of the remaining pain points is that I can distribute binaries of my Python extension modules for Windows or Mac, but if I distribute a binary Python extension module on Linux, then if the user has a different UCS2/UCS4 setting then they won't be able to use the extension module. The current de facto standard for Linux is UCS4 -- it is used by Debian, Ubuntu, Fedora, RHEL, OpenSuSE, etc. The vast majority of Linux users in practice have UCS4, and most binary Python modules are compiled for UCS4. That means that a few folks will get left out. Those folks, from my experience, are people who built their python executable themselves without specifying an override for the default, and the smaller Linux distributions who insist on doing whatever upstream Python devs recommend instead of doing whatever the other Linux distros are doing.

One of the data points that I reported was a Python interpreter that was built locally on an Ubuntu server. Since the person building it didn't know to override the default setting of --enable-unicode, he ended up with a Python interpreter built for UCS2, even though all the Python extension modules shipped by Ubuntu were built with UCS4.
People building their own Python version will usually also build their own extensions, so I don't really believe that the above scenario is very common.
Also note that Python will complain loudly when you try to load a UCS2 extension in a UCS4 build and vice-versa. We've made sure that any extension using the Python Unicode C API has to be built for the same UCS version of Python. This is done by using different names for the C APIs at the C level.
These are not isolated incidents. The following google searches suggest that a number of people spend time trying to figure out why Python extension modules fail on their linux systems:
http://www.google.com/search?q=PyUnicodeUCS4FromUnicode+undefined+symbol
http://www.google.com/search?q=+PyUnicodeUCS2FromUnicode+undefined+symbol
http://www.google.com/search?q=PyUnicodeUCS2AsDefaultEncodedString+undefined+symbol
Perhaps we should add a FAQ entry for these linker errors (which are caused by the mentioned C API changes to prevent mixing UCS versions)?
Here's a quick way to determine your Python Unicode build type:
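Such a FAQ entry could come with a small helper along the following lines. This is only a sketch: the function name `diagnose` is made up for illustration, and it assumes the error text contains one of the UCS-specific C API names (for example `PyUnicodeUCS4_FromUnicode`). The symbol that cannot be resolved tells you which build the extension was compiled for, so the interpreter loading it must be the other one.

```python
import re

# Matches symbols such as PyUnicodeUCS4_FromUnicode. The underscore is optional
# so that names copied without it are still recognised.
_UCS_SYMBOL = re.compile(r"PyUnicodeUCS([24])_?(\w+)")


def diagnose(error_message):
    """Explain an 'undefined symbol' error caused by a UCS mismatch.

    Returns None if the message does not mention a PyUnicodeUCS* symbol.
    """
    match = _UCS_SYMBOL.search(error_message)
    if match is None:
        return None
    built_for = "UCS" + match.group(1)
    # The interpreter lacks the symbol, so it is the opposite build.
    running = "UCS2" if built_for == "UCS4" else "UCS4"
    return ("extension was built for a %s Python, "
            "but the interpreter loading it is a %s build" % (built_for, running))
```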
python -c "import sys;print((sys.maxunicode<66000)and'UCS2'or'UCS4')"
Perhaps we should include this info as well as a 32/64-bit indicator and the processor type in the Python startup line:
python
Python 2.6 (r26:66714, Feb 3 2009, 20:49:49, UCS4, 64-bit, x86_64)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
This would help users find the right binaries to install as extensions.
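Until something like this appears in the startup line, the same three pieces of information can be collected at runtime. The following is a minimal sketch using only the standard library; the function name `build_info` and the output format are made up for illustration.

```python
import platform
import struct
import sys


def build_info():
    """Return (unicode_build, pointer_width, machine) for this interpreter."""
    unicode_build = "UCS2" if sys.maxunicode <= 0xFFFF else "UCS4"
    # Size of a C pointer in bits: 32 or 64 on common platforms.
    pointer_width = "%d-bit" % (struct.calcsize("P") * 8)
    return unicode_build, pointer_width, platform.machine()


print(", ".join(build_info()))
```

Note that interpreters which no longer have a narrow/wide build distinction (Python 3.3 and later) always report a `sys.maxunicode` of 0x10FFFF, so this check is only informative on older versions.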
Another data point is the Mandriva Linux distribution. It is probably much smaller than Debian, Ubuntu, or RedHat, but it is still one of the major, well-known distributions. I requested of the Python maintainer for Mandriva, Michael Scherer, that they switch from UCS2 to UCS4 in order to reduce compatibility problems like these. His answer as I understood it was that it is best to follow the recommendations of the upstream Python devs by using the default setting instead of choosing a setting for himself.
Which is IMHO what all Linux distributions should have done.
Distributions should really not be put in charge of upstream coding design decisions.
Regards,
Marc-Andre Lemburg eGenix.com