[Python-Dev] Divorcing str and unicode (no more implicit conversions).

"Martin v. Löwis" martin at v.loewis.de
Tue Oct 25 00:21:06 CEST 2005

Guido van Rossum wrote:

Changing the APIs would be much work, although perhaps not impossible for Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index().
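To illustrate the point about partition(), here is a minimal sketch that splits a "name: value" line twice: once with find() and explicit indices, once with str.partition(). The function names and the sample input are hypothetical, chosen only for this example.

```python
def split_with_find(line):
    """Split at the first colon using find() and explicit indices."""
    i = line.find(":")
    if i == -1:
        return line, ""
    return line[:i], line[i + 1:]


def split_with_partition(line):
    """Same result via partition(); no index ever reaches the caller."""
    name, _sep, value = line.partition(":")
    return name, value


# Hypothetical sample input for the sketch.
sample = "Host: example.org"
print(split_with_find(sample))
print(split_with_partition(sample))
```

Since the second version never handles an index, it makes no assumption about how cheaply the underlying representation can be indexed.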

I think Neil's proposal is not to make them go away, but to implement them less efficiently. For example, if the internal representation is UTF-8, indexing requires linear time, as opposed to constant time. If the internal representation is UTF-16, and you have a flag to indicate whether there are any surrogates in the string, indexing is constant time if the flag is false, and linear otherwise.
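The cost difference can be sketched in pure Python. This is only an illustration of the idea, not how any Python implementation stores strings; utf8_index and Utf16Text are hypothetical names invented for the example, and the UTF-16 class assumes its input contains no lone surrogates.

```python
def utf8_index(data, i):
    """Return code point i of UTF-8 bytes by scanning from the start: O(n)."""
    if i < 0:
        raise IndexError("negative indices not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # not a continuation byte: new code point
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is not None:                # requested code point is the last one
        return data[start:].decode("utf-8")
    raise IndexError("string index out of range")


class Utf16Text:
    """UTF-16 storage plus a flag recording whether any surrogate pair occurs."""

    def __init__(self, text):
        self.units = text.encode("utf-16-le")
        self.has_surrogates = any(ord(c) > 0xFFFF for c in text)

    def __getitem__(self, i):
        if i < 0:
            raise IndexError("negative indices not supported in this sketch")
        if not self.has_surrogates:
            # Fast path: every code point is exactly one 2-byte unit, so O(1).
            unit = self.units[2 * i:2 * i + 2]
            if len(unit) < 2:
                raise IndexError("string index out of range")
            return unit.decode("utf-16-le")
        # Slow path: walk the units, stepping over surrogate pairs, so O(n).
        pos = 0
        index = 0
        while pos < len(self.units):
            unit = int.from_bytes(self.units[pos:pos + 2], "little")
            width = 4 if 0xD800 <= unit <= 0xDBFF else 2
            if index == i:
                return self.units[pos:pos + width].decode("utf-16-le")
            pos += width
            index += 1
        raise IndexError("string index out of range")
```

In both cases the caller still writes an ordinary index expression; only the running time depends on the representation.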

Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing?

There are different design goals conflicting here:

Some think: "All my data is ASCII, so I want to use only one byte per character."
Others think: "All my data goes to the Windows API, so I want to use two bytes per character."
Yet others think: "I want all of Unicode, with proper, efficient indexing, so I want four bytes per character."
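The storage trade-off behind these three positions can be made concrete by measuring encoded sizes. This is a rough sketch that uses the standard codecs as stand-ins for the three internal representations; storage_sizes is a hypothetical helper, not an existing API.

```python
def storage_sizes(text):
    """Bytes needed for text under each candidate representation."""
    return {
        "utf-8": len(text.encode("utf-8")),       # 1 byte per ASCII character
        "utf-16": len(text.encode("utf-16-le")),  # 2 bytes, 4 outside the BMP
        "utf-32": len(text.encode("utf-32-le")),  # always 4 bytes per character
    }


print(storage_sizes("hello"))        # pure ASCII
print(storage_sizes("a\U0001D11E"))  # one ASCII character plus one non-BMP character
```

For pure ASCII data the three sizes differ by factors of two and four, which is why no single choice satisfies all three goals.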

It's not so much a matter of API as a matter of internal representation. The API doesn't have to change (except for the very low-level C API that directly exposes Py_UNICODE*, perhaps).

Regards, Martin
