[Python-Dev] Divorcing str and unicode (no more implicit conversions).

"Martin v. Löwis" martin at v.loewis.de
Tue Oct 25 00:21:06 CEST 2005


Guido van Rossum wrote:

Changing the APIs would be much work, although perhaps not impossible for Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index().
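To illustrate the point about partition(), here is a minimal sketch in present-day Python, where str.partition() exists as a string method. The sample header line and the variable names are assumptions made up for this example.

```python
line = "Content-Type: text/plain"  # hypothetical input, for illustration only

# Index-based approach: find() returns a position that the caller must
# then use for slicing.
pos = line.find(":")
if pos >= 0:
    name_by_index, value_by_index = line[:pos], line[pos + 1:]
else:
    name_by_index, value_by_index = line, ""

# Index-free approach: partition() returns the three pieces directly,
# so the caller never sees a position in the string.
name, sep, value = line.partition(":")
```

Both approaches yield the same two pieces here. Only the first one depends on positions in the string being cheap to compute and to slice on.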

I think Neil's proposal is not to make them go away, but to implement them less efficiently. For example, if the internal representation is UTF-8, indexing requires linear time, as opposed to constant time. If the internal representation is UTF-16, and you have a flag to indicate whether there are any surrogates in the string, indexing is constant time if the flag is false, else linear.
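The UTF-16 case with a surrogate flag could look roughly like the sketch below. It is written in present-day Python purely as an illustration. The class Utf16Text and its attribute names are hypothetical and do not describe how any Python implementation actually stores strings. Negative indices and slices are left out to keep it short.

```python
import struct


class Utf16Text:
    """Hypothetical string type: UTF-16 code units plus a surrogate flag."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # One 16-bit code unit per tuple entry.
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Computed once at construction time.
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not handled in this sketch")
        if not self.has_surrogates:
            # Fast path: code unit i is code point i, so this is constant time.
            return chr(self.units[index])
        # Slow path: walk from the start, stepping over surrogate pairs.
        pos = 0
        for _ in range(index):
            pos += 2 if 0xD800 <= self.units[pos] <= 0xDBFF else 1
        unit = self.units[pos]
        if 0xD800 <= unit <= 0xDBFF:
            # Combine the high and low surrogate into one code point.
            low = self.units[pos + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


plain = Utf16Text("abc")
mixed = Utf16Text("a\U0001F600b")
```

Indexing into plain takes the constant-time path, while indexing into mixed has to scan from the beginning. A UTF-8 representation would need a similar scan for every string that is not pure ASCII.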

Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing?

There are different design goals conflicting here:

It's not so much a matter of API as a matter of internal representation. The API doesn't have to change (except for the very low-level C API that directly exposes Py_UNICODE*, perhaps).

Regards, Martin


