[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Antoine Pitrou solipsis at pitrou.net
Mon Oct 3 15:26:55 CEST 2005


On Monday, 03 October 2005 at 14:59 +0200, Fredrik Lundh wrote:

> Antoine Pitrou wrote:

> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual and isn't pure ASCII.
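
In Python 2 terms, that rule of thumb would look something like the sketch below (the to_text helper and its utf-8 default are mine, purely illustrative, not anything from the stdlib):

    # -*- coding: utf-8 -*-
    # Illustrative helper: keep pure-ASCII byte strings as str,
    # promote everything else that is semantically text to unicode.

    def to_text(s, encoding='utf-8'):
        if isinstance(s, unicode):
            return s                   # already text
        try:
            s.decode('ascii')          # pure ASCII: the rule keeps it a str
            return s
        except UnicodeDecodeError:
            return s.decode(encoding)  # non-ASCII text: promote

    print type(to_text('hello'))        # <type 'str'>
    print type(to_text('caf\xc3\xa9'))  # <type 'unicode'> ("café" as UTF-8)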

How can you be sure that something that is /semantically textual/ will always remain "pure ASCII"? That's contradictory, unless your software never leaves the Anglo-Saxon world (and even then...).

> (anyone who is tempted to argue otherwise should benchmark their
> applications, both speed- and memory-wise, and be prepared to come up
> with very strong arguments for why Python programs shouldn't be allowed
> to be fast and memory-efficient whenever they can...)
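
For what it's worth, that claim is easy to measure; here is a rough sketch (assuming Python 2.6+, where sys.getsizeof and timeit.timeit exist; exact numbers vary by build and version):

    import sys
    import timeit

    s = 'x' * 1000     # 8-bit str
    u = u'x' * 1000    # unicode: 2 or 4 bytes per char, build-dependent

    # memory: the unicode object is several times larger
    print sys.getsizeof(s), sys.getsizeof(u)

    # speed: time the same operation on both representations
    print timeit.timeit("s.upper()", "from __main__ import s", number=100000)
    print timeit.timeit("u.upper()", "from __main__ import u", number=100000)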

I think most applications don't critically depend on text processing performance. OTOH, international adaptability is the kind of thing that /will/ bite you one day if you don't prepare for it at the beginning.

Also, if necessary, the distinction could be an implementation detail and the conversion transparent (like int vs. long): the text would be stored in an 8-bit charset as long as possible and converted to a wide encoding only when necessary. The important thing is that these optimisations, if they are necessary, should be handled transparently by the Python runtime.
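
To make the idea concrete, here is a toy sketch of such a dual representation (the Text class and its methods are entirely hypothetical):

    class Text(object):
        """Toy text type that stays 8-bit as long as possible.

        Pure-ASCII data is kept as a byte string; the first time
        non-ASCII data is mixed in, everything is promoted to
        unicode, much as ints overflow transparently into longs.
        """

        def __init__(self, data):
            self._data = data  # str while pure ASCII, else unicode

        def __add__(self, other):
            a, b = self._data, other._data
            if isinstance(a, unicode) or isinstance(b, unicode):
                # promote the narrow side before concatenating
                if isinstance(a, str):
                    a = a.decode('ascii')
                if isinstance(b, str):
                    b = b.decode('ascii')
            return Text(a + b)

        def is_wide(self):
            return isinstance(self._data, unicode)

    t = Text('hello, ') + Text('world')  # stays 8-bit
    w = t + Text(u'caf\xe9')             # transparently promoted
    print t.is_wide(), w.is_wide()       # False True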

(it seems to me - I may be mistaken - that modern Windows versions treat every string as 16-bit Unicode internally. Why would they do that if it were so inefficient?)

Regards

Antoine.


