[Python-3000] string module trimming
Jim Jewett jimjjewett at gmail.com
Thu Apr 19 21:34:25 CEST 2007
On 4/19/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> > On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:

Seriously, a table of alphabets that's saner than string.letters is pretty trivial to write:

    alphabets = {
        'en': list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
        'es': ("A B C Ch D E F G H I J K L Ll M " +
               "N \u00d1 O P Q R S T U V W X Y Z").split(),
        ...
    }
I suspect you could do a bit better with properties already in the Unicode database -- but offhand, I'm not sure how. If setting the locale put the correct alphabet in string.letters for me, that would be great.
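For what it's worth, here is a rough sketch of what the Unicode database can tell you on its own. It assumes a modern Python 3 with the standard unicodedata module; the helper name letters_in_range is made up for illustration. It only filters code points by general category, so it is not a per-language alphabet:

```python
import unicodedata


def letters_in_range(start, stop):
    """Return the characters in [start, stop) whose general category is a letter (L*)."""
    return [
        chr(cp)
        for cp in range(start, stop)
        if unicodedata.category(chr(cp)).startswith("L")
    ]


# Every code point in the Latin-1 range that Unicode classifies as a letter.
latin1_letters = letters_in_range(0x00, 0x100)

print("\u00d1" in latin1_letters)  # N with tilde is a single code point
print("Ch" in latin1_letters)      # a digraph is not a code point at all
```

Because this works one code point at a time, it can never produce entries like "Ch" or "Ll", and it says nothing about which letters belong to which language. That is the part a hand-written table still has to supply.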
Two: Collation.
Collation can be done right: provide a function text.sortkey() that converts a str into an opaque thing that has the desired ordering.
If this function is context-free (depending only on the input string), it will violate the Unicode standard (and, apparently, do the wrong thing for some languages -- usually including French).
Also, that key is probably larger than the original string, and they warn against trying to create it in a single pass.
I'm not saying that relying strictly on Unicode properties can't be done right -- I'm just saying that it would be very difficult and very inefficient, so it probably won't happen soon -- which is an argument for keeping the half-measures around meanwhile.
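To make the context dependence concrete, here is a minimal sketch of a sort key built on the existing half-measure, locale.strxfrm() from the standard library, written for a modern Python 3. The names sortkey and sorted_in_locale are hypothetical; this is not the proposed text.sortkey(), just an illustration that the key depends on the active collation locale as well as on the input string:

```python
import locale


def sortkey(text):
    """Return a key that orders strings under the current LC_COLLATE setting."""
    return locale.strxfrm(text)


def sorted_in_locale(words, locale_name):
    """Sort words under locale_name, then restore the previous collation locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, changes nothing
    try:
        # Raises locale.Error if the platform does not provide this locale.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=sortkey)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale always exists; it gives plain code point order.
print(sorted_in_locale(["b", "a", "C"], "C"))
```

Which other locale names are available (something like "es_ES.UTF-8", say) depends on the platform, so any call that names one should be prepared to handle locale.Error. The same input can sort differently under each one, which is exactly why a key computed from the string alone falls short.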
-jJ