[Python-Dev] please consider changing --enable-unicode default to ucs4
M.-A. Lemburg mal at egenix.com
Wed Oct 7 20:05:27 CEST 2009
- Previous message: [Python-Dev] Python 2.6.4rc1
- Next message: [Python-Dev] please consider changing --enable-unicode default to ucs4
Zooko O'Whielacronx wrote:
Dear MAL and python-dev:
I failed to explain the problem that users are having. I will try again, and this time I will omit my ideas about how to improve things and just focus on describing the problem.

Some users are having trouble using Python packages containing binary extensions on Linux. I want to provide such binary Python packages for Linux for the pycryptopp project (http://allmydata.org/trac/pycryptopp ) and the zfec project (http://allmydata.org/trac/zfec ). I also want to make it possible for users to install the Tahoe-LAFS project (http://allmydata.org ) without having a compiler or Python header files. (You'd be surprised at how often Tahoe-LAFS users try to do this on Linux. Linux is no longer only for people who have the knowledge and patience to compile software themselves.) Tahoe-LAFS also depends on many packages that are maintained by other people and are not packaged or distributed by me -- pyOpenSSL, simplejson, etc.

There have been several hurdles in the way that we've overcome, and no doubt there will be more, but the current hurdle is that there are two "formats" for Python extension modules that are used on Linux -- UCS2 and UCS4. If a user gets a Python package containing a compiled extension module which was built for the wrong UCS2/4 setting, he will get mysterious (to him) "undefined symbol" errors at import time.
Zooko, I really fail to see the reasoning here:
Why would people who know how to build their own Python interpreter on Linux, and who expect it to work like the distribution-provided one, have a problem looking up the configuration settings the distribution uses?
This is like compiling your own Linux kernel without using the same configuration as the distribution kernel and still expecting the distribution kernel modules to load without problems.
Note that this has nothing to do with compiling your own Python extensions. Python's distutils will automatically use the right settings for compiling those, based on the configuration of the Python interpreter used for running the compilation - which will usually be the distribution interpreter.
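One way to look up which setting a given interpreter was built with is to ask the interpreter itself. The sketch below is illustrative only: the helper name `unicode_build` is made up for this example, and it relies on `sys.maxunicode` being 0xFFFF on UCS2 builds and 0x10FFFF on UCS4 builds. (Interpreters from Python 3.3 onward no longer come in two variants and always report the larger value.)

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Return 'ucs2' or 'ucs4' for the given sys.maxunicode value.

    Hypothetical helper: narrow (UCS2) builds cap code points at 0xFFFF,
    wide (UCS4) builds at 0x10FFFF.
    """
    return "ucs2" if maxunicode <= 0xFFFF else "ucs4"


if __name__ == "__main__":
    # Report the setting of the interpreter running this script.
    print("this interpreter:", unicode_build())
```

Running this with the distribution's interpreter and with a self-built one should show whether the two were configured the same way.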
Your argument doesn't really justify the consequences of switching to UCS4.
Just as a data point: eGenix has been shipping binaries for Python packages for several years, and while we do occasionally get reports about UCS2/UCS4 mismatches, those are really in the minority.
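Such mismatch reports are usually recognizable from the error text alone, because the Python 2.x headers rename the Unicode C API to `PyUnicodeUCS2_*` or `PyUnicodeUCS4_*` depending on the build. A minimal sketch of a check on that basis follows; the function name and the sample message are assumptions made for illustration, not output copied from a real report.

```python
def extension_unicode_build(error_message):
    """Guess which Unicode setting an extension was compiled for.

    Hypothetical helper: looks for the build-specific symbol prefix in an
    "undefined symbol" import error and returns 'ucs2', 'ucs4', or None
    when the message gives no hint.
    """
    if "PyUnicodeUCS2_" in error_message:
        return "ucs2"
    if "PyUnicodeUCS4_" in error_message:
        return "ucs4"
    return None


if __name__ == "__main__":
    # Illustrative message of the kind a user would see at import time.
    sample = "ImportError: ./_example.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8"
    print(extension_unicode_build(sample))
```

If the result differs from the setting of the interpreter doing the import, the binary package was built for the other variant.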
I'd also question using the UCS4 default only on Linux.
If we do go for a change, we should use sizeof(wchar_t) as the basis for the new default - on all platforms that provide a wchar_t type.
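To see what a sizeof(wchar_t)-based default would give on a particular platform, the size can be queried from Python through ctypes. This is only a sketch of the idea: the mapping function `default_from_wchar` is hypothetical and is not how configure currently picks the default.

```python
import ctypes


def default_from_wchar(wchar_size):
    """Map sizeof(wchar_t) in bytes to a Unicode build setting.

    Hypothetical rule: a 2-byte wchar_t suggests ucs2, a 4-byte one ucs4.
    Other sizes are left undecided (None).
    """
    return {2: "ucs2", 4: "ucs4"}.get(wchar_size)


if __name__ == "__main__":
    # ctypes.c_wchar corresponds to the platform's C wchar_t type.
    size = ctypes.sizeof(ctypes.c_wchar)
    print("sizeof(wchar_t) =", size, "->", default_from_wchar(size))
```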
However, before we can make such a decision, we need more data about the consequences, that is:

 * memory footprint changes
 * performance changes

for both Python 2.x and 3.x. After all, UCS4 uses twice as much memory for all Unicode objects as UCS2.
Since Python 3.x uses Unicode for all strings, I'd expect such a change to have more impact there.
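The size of the effect on character data can be estimated with simple arithmetic: two bytes per code unit in a UCS2 build versus four in a UCS4 build. The sketch below approximates this with the UTF-16 and UTF-32 encodings of a string; the helper name is made up, and it counts only the character payload, not the per-object header, so it is a rough guide rather than a measurement.

```python
def payload_bytes(text):
    """Return (ucs2_bytes, ucs4_bytes) needed for the characters of text.

    Hypothetical helper: UTF-16 stands in for a UCS2 build's storage and
    UTF-32 for a UCS4 build's storage; object overhead is ignored.
    """
    ucs2 = len(text.encode("utf-16-le"))
    ucs4 = len(text.encode("utf-32-le"))
    return ucs2, ucs4


if __name__ == "__main__":
    narrow, wide = payload_bytes("hello")
    print(narrow, wide)  # 10 20
```

For text within the Basic Multilingual Plane the second number is exactly double the first, which is where the factor of two comes from.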
We'd also need to look into possible problems with different compilers using different wchar_t sizes on the same platform (I doubt that there are any).
On Windows, the default is fixed since Windows uses UTF-16 for everything Unicode, so UCS2 will be the only option on that platform for a long time.
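For reference, UTF-16 represents code points above 0xFFFF as a pair of 16-bit surrogate code units, which is how such characters appear in a 16-bit representation. A small sketch of the standard arithmetic (the function name is an assumption for illustration):

```python
def to_surrogates(code_point):
    """Split a code point above 0xFFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(hex(high), hex(low))  # 0xd83d 0xde00
```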
That said, it'll take a while for distributions to upgrade, so you're always better off getting the tools you're using to deal with the problem for you and your users, since those are easier to upgrade.
-- Marc-Andre Lemburg eGenix.com