[Python-Dev] Re: Re: Alternative Implementation for PEP 292: Simple String Substitutions

James Y Knight foom at fuhm.net
Tue Sep 14 20:12:35 CEST 2004


On Sep 14, 2004, at 2:54 AM, Terry Reedy wrote:

> This is why I am not especially enamored of Unicode and the prospect of Python becoming married to it. It is heavily weighted in favor of efficiently representing Chinese and inefficiently representing English. To give English equivalent treatment, the 20,000 or so most common words, roots, prefixes, and suffixes would each get its own codepoint.

Of course, it is perfectly possible for the Python Unicode implementation to represent some Unicode strings, such as those whose characters all fall in the range U+0000 through U+00FF, with only 8 bits per character. There is no (conceptual) reason it could not represent (u'a' * 8) with 8 bytes plus object header overhead. That is simply an implementation detail and really has nothing to do with Unicode itself.
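A minimal sketch of that idea, assuming a hypothetical `CompactStr` class that is not part of any Python release: each instance picks one fixed width per character from its largest code point, so text made only of U+0000 through U+00FF costs one byte per character. Because every character within a given string has the same width, indexing stays constant-time.

```python
from array import array

# Typecode for the wide case: "I" is 4 bytes on mainstream platforms;
# otherwise fall back to "L", which the array module guarantees is >= 4 bytes.
_WIDE = "I" if array("I").itemsize >= 4 else "L"


class CompactStr:
    """Hypothetical string type that chooses its storage width per instance."""

    def __init__(self, text: str) -> None:
        codes = [ord(ch) for ch in text]
        if all(c <= 0xFF for c in codes):
            # Narrow case: every code point fits in one byte (U+0000..U+00FF).
            self._data = array("B", codes)
        else:
            # Wide case: one fixed-size unsigned int per code point.
            self._data = array(_WIDE, codes)

    @property
    def bytes_per_char(self) -> int:
        return self._data.itemsize

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> str:
        # Constant time: the element sits at byte offset index * itemsize.
        return chr(self._data[index])


s = CompactStr("a" * 8)
print(len(s), s.bytes_per_char)  # 8 1
```

A fuller version would probably add a two-byte tier as well; CPython 3.3 later adopted a 1-, 2-, or 4-bytes-per-character scheme along these lines (PEP 393).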

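It would also be possible to store strings as UTF-8, although this has the trade-off that indexing a character takes time linear in its position rather than constant time: UTF-8 characters vary from one to four bytes, so locating the nth one means scanning from the start.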
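A minimal sketch of UTF-8 storage, assuming a hypothetical `Utf8Str` class (not a real Python type). ASCII text costs one byte per character, and the `_offset_of` method is where the linear indexing cost shows up: it counts lead bytes (anything that is not a `10xxxxxx` continuation byte) until it reaches the requested character.

```python
class Utf8Str:
    """Hypothetical string type that keeps its text as UTF-8 bytes."""

    def __init__(self, text: str) -> None:
        self._buf = text.encode("utf-8")

    @staticmethod
    def _is_continuation(byte: int) -> bool:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (byte & 0xC0) == 0x80

    @property
    def nbytes(self) -> int:
        return len(self._buf)

    def __len__(self) -> int:
        # Each character contributes exactly one lead (non-continuation) byte.
        return sum(1 for b in self._buf if not self._is_continuation(b))

    def _offset_of(self, index: int) -> int:
        # Linear scan from the start: count lead bytes until the index-th one.
        count = 0
        for pos, b in enumerate(self._buf):
            if not self._is_continuation(b):
                if count == index:
                    return pos
                count += 1
        raise IndexError("Utf8Str index out of range")

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += len(self)  # also O(n), since length is found by scanning
        start = self._offset_of(index)
        end = start + 1
        # Extend over the continuation bytes belonging to this character.
        while end < len(self._buf) and self._is_continuation(self._buf[end]):
            end += 1
        return self._buf[start:end].decode("utf-8")


s = Utf8Str("naïve 中文")
print(len(s), s.nbytes, s[2], s[-1])  # 8 13 ï 文
```

Because each `s[i]` rescans from the beginning, a loop over `range(len(s))` is quadratic overall; a practical design would add a sequential iterator or cache byte offsets.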

James


