[Python-3000] string module trimming
Jason Orendorff jason.orendorff at gmail.com
Fri Apr 20 17:10:54 CEST 2007
On 4/19/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 4/19/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> > Collation can be done right: provide a function text.sortkey()
> > that converts a str into an opaque thing that has the desired
> > ordering.
>
> If this function is context-free (depending only on the input string), it will violate the Unicode standard (and, apparently, do the wrong thing for some languages, usually including French).
I meant this to be a function of the string and the locale,[*] implemented as a thin wrapper around wcsxfrm() on POSIX, LCMapString() on Win32, Collator.getCollationKey() in Java, CompareInfo.GetSortKey() in .NET. Whether these are Unicode-compliant is out of our hands.
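For the POSIX case, a rough picture of such a wrapper already exists in the standard library: CPython's `locale.strxfrm()` defers to the C library's collation (`wcsxfrm()` where available). The sketch below is a hypothetical `sortkey()` (the name is an assumption standing in for the proposed `text.sortkey()`, not an existing API) that simply delegates to it, so ordering follows whatever `LC_COLLATE` locale the process has set. A program would normally call `locale.setlocale(locale.LC_COLLATE, "")` once at startup to pick up the user's locale; until then Python leaves collation in the `"C"` locale, which in practice orders by code point.

```python
import locale


def sortkey(s: str) -> str:
    """Hypothetical stand-in for the proposed text.sortkey().

    Returns a key, to be treated as opaque, whose ordering follows the
    process's current LC_COLLATE setting; locale.strxfrm() defers to the
    C library's collation (wcsxfrm() on POSIX systems).
    """
    return locale.strxfrm(s)


# Keys compare the way the current locale says the strings should sort,
# so they can be passed straight to sorted() or list.sort().
print(sorted(["cherry", "apple", "banana"], key=sortkey))
# prints ['apple', 'banana', 'cherry']
```

Because the key is computed once per string, `sorted()` does O(n) transformations rather than one locale-aware comparison per pair, which is the usual reason to prefer a sort key over a `strcoll()`-style comparator.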
We're not Sun or IBM. I don't think we're going to implement and maintain this ourselves. So I see two options: (1) swallow a hard dependency on a particular implementation, maybe ICU; (2) use whatever the system happens to provide. Either one is fine with me.
> I'm not saying that relying strictly on Unicode properties can't be done right; I'm just saying that it would be very difficult and very inefficient, so it probably won't happen soon, which is an argument for keeping the half-measures around in the meantime.
This would be true if the only option were to implement it all ourselves.
-j
[*] I would prefer a context-free function that really takes both the string and the locale as arguments... but the POSIX API doesn't support that. :-P
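The footnote's complaint is that POSIX keeps the collation locale in process-wide state set by `setlocale()`, so `wcsxfrm()` has no locale argument. With only Python's `locale` module, the closest approximation is to swap `LC_COLLATE` around each call and restore it afterwards. The sketch below, a hypothetical `sortkey_in(s, loc)` (the function name and the module-level lock are assumptions for illustration, not an existing API), does exactly that. It is only safe against other callers of this same function, because `setlocale()` affects the whole process.

```python
import locale
import threading

# Hypothetical lock: serializes callers of sortkey_in() only. Any other code
# that calls locale.setlocale() concurrently is NOT protected by it.
_collate_lock = threading.Lock()


def sortkey_in(s: str, loc: str) -> str:
    """Hypothetical sort key that takes both the string and the locale name.

    POSIX collation state is global, so this temporarily switches
    LC_COLLATE to `loc`, transforms `s`, then restores the old setting.
    """
    with _collate_lock:
        saved = locale.setlocale(locale.LC_COLLATE)  # query, don't change
        try:
            locale.setlocale(locale.LC_COLLATE, loc)  # raises locale.Error if unavailable
            return locale.strxfrm(s)
        finally:
            locale.setlocale(locale.LC_COLLATE, saved)
```

For example, `sorted(words, key=lambda w: sortkey_in(w, "fr_FR.UTF-8"))` would sort with French rules if that locale is installed; `setlocale()` raises `locale.Error` otherwise. Switching the global locale on every call is slow and fragile, which is the practical cost of the missing string-plus-locale API.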