[Python-Dev] Re: Re: Alternative Implementation for PEP 292: SimpleString Substitutions

Fredrik Lundh fredrik at pythonware.com
Mon Sep 13 08:53:53 CEST 2004


Stephen J. Turnbull wrote:

> But I worry that it's an exceptional example, when you use assumptions like "real-life text uses characters drawn from a small number of short contiguous regions in the alphabet."

The problem is that I cannot tell if you've studied search issues, or if you're just applying general "but wait, it's different for Asian languages" arguments here.

There are many issues here, all pulling in different directions:

The only way to know for sure is if anyone has the time and energy to carry out tests on real-life datasets. (or at least prepare some datasets; I can run the tests if someone provides me with a file with search terms and a number of files containing texts to apply them to, preferably using UTF-8 encoding).
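
For anyone preparing such a dataset, here is a minimal sketch of what a test run could look like. It is not the harness actually used for these measurements. It assumes one search term per line in a UTF-8 terms file and a list of UTF-8 text files; the names load_terms, count_hits and run_benchmark are made up for illustration, and the built-in str.find stands in for whichever search implementation is under test.

```python
import time
from pathlib import Path


def load_terms(path):
    """Read one search term per line from a UTF-8 file, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_hits(terms, texts):
    """Count, for each term, how many texts contain it at least once."""
    return {
        term: sum(1 for text in texts if text.find(term) != -1)
        for term in terms
    }


def run_benchmark(terms_path, text_paths, repeat=5):
    """Return (best wall-clock seconds over `repeat` runs, hit counts).

    Files are decoded before timing starts, so only the searches are measured.
    """
    terms = load_terms(terms_path)
    texts = [Path(p).read_text(encoding="utf-8") for p in text_paths]
    best = float("inf")
    hits = {}
    for _ in range(repeat):
        start = time.perf_counter()
        hits = count_hits(terms, texts)
        best = min(best, time.perf_counter() - start)
    return best, hits
```

Taking the best of several runs reduces noise from other processes. The hit counts are returned alongside the timing so that two search implementations can be checked for agreeing results before their speeds are compared.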


