[Python-Dev] please consider changing --enable-unicode default to ucs4
M.-A. Lemburg mal at egenix.com
Wed Oct 7 23:21:09 CEST 2009
Ronald Oussoren wrote:
> On 7 Oct, 2009, at 22:13, M.-A. Lemburg wrote:
>> Ronald Oussoren wrote:
>>> On 7 Oct, 2009, at 20:05, M.-A. Lemburg wrote:
>>>> If we do go for a change, we should use sizeof(wchar_t)
>>>> as basis for the new default - on all platforms that
>>>> provide a wchar_t type.
>>>
>>> I'd be -1 on that. Sizeof(wchar_t) is 4 on OSX, but all non-Unix
>>> API's that deal with Unicode text use ucs16.
>>
>> Is that true for non-Carbon APIs as well ?
>>
>> This is what I found on the web (in summary):
>>
>> Apple chose to go with UTF-16 at about the same time as
>> Microsoft did and used sizeof(wchar_t) == 2 for Mac OS.
>> When they moved to Mac OS X, they switched wchar_t to
>> sizeof(wchar_t) == 4.
>
> Both Carbon and the modern APIs use UTF-16.
Thanks for that data point. So UTF-16 would be the more natural choice on Mac OS X, despite the choice of sizeof(wchar_t).
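For reference, a minimal sketch of how the two sizes discussed here could be checked on a given machine, assuming a Python with ctypes available. The helper name describe_unicode_widths is made up for illustration and is not part of any proposal in this thread. sys.maxunicode is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build (and always 0x10FFFF on Python 3.3 and later).

```python
import ctypes
import sys


def describe_unicode_widths():
    """Return (sizeof(wchar_t), largest code point a Python string can hold).

    Hypothetical helper, named here only for illustration.
    """
    # ctypes.c_wchar maps to the platform's C wchar_t type.
    wchar_size = ctypes.sizeof(ctypes.c_wchar)
    return wchar_size, sys.maxunicode


wchar_size, max_cp = describe_unicode_widths()
print("sizeof(wchar_t) = %d, sys.maxunicode = %#x" % (wchar_size, max_cp))
```

The two values are independent of each other, which is the point of the exchange above: the C library's wchar_t width says nothing about which encoding the platform's own text APIs prefer.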
> What I don't quite get in the UTF-16 vs. UTF-32 discussion is why UTF-32 would be useful, because if you want to do generic Unicode processing you have to look at sequences of composed characters (base characters + composing marks) anyway instead of separate code points.
>
> Not that I'm a unicode expert in any way...
Very true.
It's one of the reasons why I'm not much of a UCS4-fan - it only helps with surrogates and that's about it.
Combining characters, various types of control code points (e.g. joiners, bidirectional marks, breaks, non-breaks, annotations), context-sensitive casing and other such features found in scripts cause very similar problems - often much harder to solve, since they are not as easily identifiable as surrogate high and low code points.
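To make this concrete, here is a small sketch (Python 3 syntax, standard library only; the sample strings and helper names are chosen for illustration and do not come from the thread). A UCS4 string gives one element per code point, which settles the surrogate case, yet a user-perceived character can still span several code points.

```python
import unicodedata


def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s):
    """Number of code points in s, i.e. its length on a UCS4 build."""
    return len(s.encode("utf-32-le")) // 4


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
x_dot = "x\u0323"     # 'x' + COMBINING DOT BELOW, no precomposed form

# Surrogates: two UTF-16 units, one code point. UCS4 fixes this case.
print(utf16_units(clef), code_points(clef))

# Combining marks: two code points in either representation.
print(utf16_units(e_acute), code_points(e_acute))

# Normalization helps only where a precomposed character exists.
print(code_points(unicodedata.normalize("NFC", e_acute)))
print(code_points(unicodedata.normalize("NFC", x_dot)))
```

On a current CPython this prints "2 1", "2 2", "1" and "2". The last line is the case that neither storage format nor NFC normalization removes: the text still has to be processed as a sequence of code points per character.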
-- Marc-Andre Lemburg eGenix.com