[Python-Dev] please consider changing --enable-unicode default to ucs4
Zooko O'Whielacronx zookog at gmail.com
Tue Sep 29 19:03:25 CEST 2009
Dear MAL and python-dev:
I failed to explain the problem that users are having. I will try again, and this time I will omit my ideas about how to improve things and just focus on describing the problem.
Some users are having trouble using Python packages containing binary extensions on Linux. I want to provide such binary Python packages for Linux for the pycryptopp project (http://allmydata.org/trac/pycryptopp ) and the zfec project (http://allmydata.org/trac/zfec ). I also want to make it possible for users to install the Tahoe-LAFS project (http://allmydata.org ) without having a compiler or Python header files. (You'd be surprised at how often Tahoe-LAFS users try to do this on Linux. Linux is no longer only for people who have the knowledge and patience to compile software themselves.) Tahoe-LAFS also depends on many packages that are maintained by other people and are not packaged or distributed by me -- pyOpenSSL, simplejson, etc.
We have already overcome several hurdles, and no doubt there will be more, but the current hurdle is that there are two "formats" for Python extension modules that are used on Linux -- UCS2 and UCS4. If a user gets a Python package containing a compiled extension module which was built for the wrong UCS2/4 setting, he will get mysterious (to him) "undefined symbol" errors at import time.
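To make the mismatch concrete, here is a minimal sketch of how one might tell the two settings apart. It relies on two facts about Python 2: sys.maxunicode is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build, and the C-level Unicode functions are exported under names beginning with PyUnicodeUCS2_ or PyUnicodeUCS4_. The helper names and the sample error text in the usage note are illustrative assumptions, not output copied from a real session.

```python
import re
import sys


def unicode_build(maxunicode=None):
    """Return "ucs2" for a narrow build or "ucs4" for a wide build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    if maxunicode > 0xFFFF:
        return "ucs4"
    return "ucs2"


def extension_build_from_error(message):
    """Guess which setting an extension was built for from an error message.

    Looks for a symbol such as PyUnicodeUCS2_FromUnicode and returns
    "ucs2", "ucs4", or None when no such symbol is mentioned.
    """
    match = re.search(r"PyUnicodeUCS([24])_\w+", message)
    if match is None:
        return None
    return "ucs" + match.group(1)


if __name__ == "__main__":
    print("This interpreter was built with: " + unicode_build())
```

For example, passing a message like "undefined symbol: PyUnicodeUCS2_FromUnicode" to extension_build_from_error() suggests the extension was compiled against a UCS2 interpreter; if unicode_build() reports "ucs4", the two do not match. (From Python 3.3 onward sys.maxunicode is always 0x10FFFF, so this distinction only applies to older interpreters.)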
On Mon, Sep 28, 2009 at 2:25 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> The Python default is UCS2 for a good reason: it's a good trade-off between memory consumption, functionality and performance.
I'm sure you are right about this. At some point I will try to measure the performance implications in the context of our application. I don't think it will be an issue for us, as so far no users have complained about any performance or functionality problems that were traceable to the choice of UCS2/4.
> As already mentioned, I also don't understand how changing the Python default on Linux would help your users in any way - if you let distutils compile your extensions, it's automatically going to use the right Unicode setting for you (as well as your users).
My users are using some Python packages built by me and some built by others. The binary packages they get from others could have the incompatible UCS2/4 setting. Also, some of my users might be using a Python interpreter configured with the opposite setting from the one that I use to build packages.
> Unfortunately, this automatic support doesn't help you when shipping e.g. setuptools eggs, but this is a tool problem, not one of Python: setuptools completely ignores the fact that there are two ways to build Python.
This is the setuptools/distribute issue that I mentioned: http://bugs.python.org/setuptools/issue78 . If that issue were solved then if a user tried to install a specific package, for example with a command-line like "easy_install http://allmydata.org/source/tahoe/deps/tahoe-dep-eggs/pyOpenSSL-0.8-py2.5-linux-i686.egg", then instead of getting an undefined symbol error at import time, they would get an error message to the effect of "This package is not compatible with your Python interpreter." at install time. That would be good because it would be less confusing to the users.
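As a rough illustration of the install-time check described above, here is a hypothetical sketch. It assumes the installer somehow knows which Unicode setting a binary package was built with; eggs do not currently record this, which is exactly the gap in the issue linked above. The names IncompatiblePackageError and check_unicode_compat are invented for this example and are not part of setuptools or distribute.

```python
import sys


class IncompatiblePackageError(Exception):
    """Hypothetical error an installer could raise at install time."""


def check_unicode_compat(package_build, maxunicode=None):
    """Raise if a binary package's Unicode setting differs from the interpreter's.

    package_build is "ucs2" or "ucs4". How the installer would learn this
    value (for example, from package metadata) is an assumption.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF means a narrow (UCS2) interpreter; anything larger means UCS4.
    interpreter_build = "ucs4" if maxunicode > 0xFFFF else "ucs2"
    if package_build != interpreter_build:
        raise IncompatiblePackageError(
            "This package is not compatible with your Python interpreter "
            "(package: %s, interpreter: %s)." % (package_build, interpreter_build)
        )
```

Calling check_unicode_compat("ucs2") on a UCS4 interpreter would stop the installation with a readable message, rather than leaving the user to discover the problem as an undefined symbol at import time.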
However, if they were using the default setuptools/distribute dependency-satisfaction feature, e.g. because they are installing a package and that package is marked as "install_requires=['pyOpenSSL']", then setuptools/distribute would fall back to attempting to compile the package from source, since it could not find a compatible binary package. This would probably confuse the users at least as much as the undefined symbol error currently does.
In any case, improving the tools to handle incompatible packages nicely would not make more packages compatible. Let's do both! Improve tools to handle incompatible packages nicely, and encourage everyone who compiles python on Linux to use the same UCS2/4 setting.
Thank you for your attention.
Regards,
Zooko