[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Neil Hodgson nyamatongwe at gmail.com
Mon Oct 24 05:41:50 CEST 2005


Guido van Rossum:

Folks, please focus on what Python 3000 should do.

I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective-C) and introducing a separate mutable bytes array data type. But I could use some validation or feedback on this idea from actual practitioners.

I'd like to define Unicode strings more tightly for Python 3000. Currently, Unicode strings may be implemented with either 2-byte (UCS-2) or 4-byte (UTF-32) elements, depending on how the interpreter was built. Python should allow strings to contain any Unicode character, and indexing should yield whole characters rather than half characters (the individual surrogates of a pair). Therefore Python strings should appear to be UTF-32. There could still be multiple implementations (using UTF-16 or UTF-8) to save space, but all implementations should behave identically apart from speed and memory use.
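To make the "half characters" point concrete, here is a minimal sketch. It assumes a modern interpreter (Python 3.3 or later), which postdates this message and already indexes strings by code point. The sample string and the helper `utf16_units` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample: 'a', U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), 'b'.
text = "a\U0001D11Eb"


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


units = utf16_units(text)

# Indexing by code point (the "appears to be UTF-32" view): three characters.
code_point_count = len(text)
second_char = text[1]          # the whole G clef character

# Indexing by 2-byte element: four elements, and index 1 is only a high surrogate.
unit_count = len(units)
second_unit = units[1]
is_half_character = 0xD800 <= second_unit <= 0xDBFF

# Storage cost of the same string under different internal encodings.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(code_point_count, unit_count, hex(second_unit), sizes)
```

With 2-byte elements the string has four elements and index 1 is only the first half of a surrogate pair. With code point indexing it has three elements and index 1 is the whole character, whichever encoding is used for storage.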

Neil


