[Python-3000] string module trimming
Jim Jewett jimjjewett at gmail.com
Thu Apr 19 01:08:59 CEST 2007
On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > Today, string.letters works most easily with ASCII supersets, and is effectively limited to 8-bit encodings. Once everything is unicode, I don't think that 8-bit restriction should apply any more.
> But we already went over this. There are over 40K letters in Unicode. It simply makes no sense to have a string.letters approaching that size.
Agreed. But there aren't 40K (alphabetic) letters in any particular locale. Most individual languages will have fewer than 100.
As a proxy for measuring "local" characters, I'll note that during some optimization drives for Pango (e.g., http://primates.ximian.com/~federico/news-2005-11.html#04 ) it turned out that there were only two non C-J-K languages that needed more than 256 cache positions in their character glyph tables.
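To put rough numbers on the scale argument, here is a minimal sketch, assuming Python 3 and the standard unicodedata module. The helper name is_letter and the German sample alphabet are hypothetical illustrations, and the exact letter count depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def is_letter(ch):
    """Return True if ch is in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


# Count every code point that this interpreter's Unicode database
# classifies as a letter. The result varies with the Unicode version.
letter_count = sum(1 for cp in range(sys.maxunicode + 1) if is_letter(chr(cp)))

# A single language's alphabet is tiny by comparison.
# (Hypothetical sample: lowercase German letters.)
german_letters = set("abcdefghijklmnopqrstuvwxyzäöüß")

print("letters in Unicode:", letter_count)
print("letters in sample alphabet:", len(german_letters))
print("all sample characters are letters:", all(is_letter(ch) for ch in german_letters))
```

A membership test such as is_letter(ch) answers "is this a letter?" without ever building a string.letters-style constant, while a per-language set stays small enough to enumerate.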
> > Unless I missed it (and I may have), unicode itself sort of ducks the question about how to sort strings. Python really needs to provide an answer, but I'm not sure it is possible to provide the (single) correct answer.
> The Unicode standard certainly has a solution, but it is complicated and I don't believe it is currently implemented in core Python.
I guess you're right; I saw too many alternatives the last time I looked, and must have stopped reading http://unicode.org/reports/tr10/ after section 1, where it becomes obvious that there is no context-free right answer.
> > string.letters is one workaround, and I don't think we should remove it until a better solution (or workaround) is available.
> I disagree. The correct solution is to implement the Unicode support for locale-specific sorting.
And set-inclusion.
I'm not convinced that waiting for such a heavyweight solution is really the best choice, particularly since the spec itself warns against using the strictest forms (too inefficient).
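To illustrate why plain code-point order is not a usable sort for text, here is a minimal sketch, assuming Python 3. It is not the algorithm from tr10 and is not locale-aware: the hypothetical helper rough_collation_key only strips combining marks and case as a crude stand-in for a lightweight workaround.

```python
import unicodedata


def rough_collation_key(text):
    """Crude sort key: ignore accents and case, then break ties by the original text.

    This is only an approximation for illustration, not the Unicode
    Collation Algorithm, and it knows nothing about locale-specific rules.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


words = ["zebra", "Égalité", "apple", "eclair", "Zulu"]

by_code_point = sorted(words)
by_rough_key = sorted(words, key=rough_collation_key)

print(by_code_point)  # ['Zulu', 'apple', 'eclair', 'zebra', 'Égalité']
print(by_rough_key)   # ['apple', 'eclair', 'Égalité', 'zebra', 'Zulu']
```

The second ordering is closer to what a reader expects, but it still gets language-specific cases wrong, which is the gap a full locale-specific implementation would have to close.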
> Remember that the locale module supports only a single, global locale at a time. This renders it totally useless in many apps requiring locale support (such as web servers).
Fair enough.
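The single-global-locale limitation can be observed directly. The following is a minimal sketch, assuming Python 3 and using the "C" locale only because it is available everywhere; the function name read_collate_locale is hypothetical.

```python
import locale
import threading


def read_collate_locale(result):
    """Record the collation locale as seen from the calling thread."""
    # Calling setlocale() with no locale argument only queries the setting.
    result.append(locale.setlocale(locale.LC_COLLATE))


# Set the collation locale once, in the main thread.
locale.setlocale(locale.LC_COLLATE, "C")

seen_in_worker = []
worker = threading.Thread(target=read_collate_locale, args=(seen_in_worker,))
worker.start()
worker.join()

# The worker thread sees the same setting: the locale is process-wide,
# so two requests handled concurrently cannot each have their own.
print(seen_in_worker)
```

Because the setting is shared by every thread, a server that switched locale per request would change it for all other requests in flight at the same time.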
-jJ