[Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido van Rossum guido at python.org
Mon Oct 24 23:31:18 CEST 2005
On 10/24/05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Indeed. My guess is that indexing is more common than you think, especially when iterating over the string. Of course, iteration could also operate on UTF-8, if you introduced string iterator objects.
Python's slice-and-dice model pretty much ensures that indexing is common. Almost everything is ultimately represented as indices: regex search results have the index in the API, find()/index() return indices, many operations take a start and/or end index. As long as that's the case, indexing had better be fast.
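To make the point about indices concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only standard str and re calls are used.

```python
import re

text = "key = value  # comment"  # illustrative input, not from the thread

# find() returns an index (or -1 when the substring is absent).
eq = text.find("=")

# Regex match objects expose their position as start()/end() indices.
m = re.search(r"#.*", text)
comment_start = m.start()

# Many string methods accept a start (and end) index.
has_value = text.startswith("value", eq + 2)

# Slicing is what ties all of these indices back together.
key = text[:eq].strip()
value = text[eq + 1:comment_start].strip()
```

Every step hands an integer position to the next one, which is why slow indexing would hurt nearly all string-processing code written in this style.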
Changing the APIs would be much work, although perhaps not impossible for Python 3000. For example, Raymond Hettinger's partition() API doesn't refer to indices at all, and can replace many uses of find() or index().
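As a rough illustration of that last point, the sketch below writes the same split twice: once with find() and slicing, once with partition() as it later shipped on str (Python 2.5 and up). The function names and the header-line example are hypothetical.

```python
def split_header_find(line):
    """Split 'name:value' using an index from find() plus slicing."""
    i = line.find(":")
    if i == -1:
        return line, ""
    return line[:i], line[i + 1:]


def split_header_partition(line):
    """Same split using partition(); no index appears anywhere."""
    name, _sep, value = line.partition(":")
    return name, value
```

When the separator is missing, partition() returns the whole string followed by two empty strings, so the explicit -1 check disappears as well.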
Still, the mere existence of __getitem__ and __getslice__ on strings makes it necessary to implement them efficiently. How realistic would it be to drop them? What should replace them? Some kind of abstract pointers-into-strings perhaps, but that seems much more complex.
The trick seems to be to support both simple programs manipulating short strings (where indexing is probably the easiest API to understand, and the additional copying is unlikely to cause performance problems), as well as programs manipulating very large buffers containing text and doing sophisticated string processing on them. Perhaps we could provide a different kind of API to support the latter, perhaps based on a mutable character buffer data type without direct indexing?
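Purely as a thought experiment, such a buffer type might look something like the sketch below. The class name TextBuffer and its methods are entirely hypothetical; nothing like this is being proposed as an actual API here, and a real implementation would want a smarter internal representation than a list of chunks.

```python
class TextBuffer:
    """Hypothetical mutable text buffer that offers no direct indexing."""

    def __init__(self, text=""):
        # Store text as chunks so that appends do not copy earlier content.
        self._chunks = [text] if text else []

    def append(self, text):
        self._chunks.append(text)

    def replace(self, old, new):
        # Naive approach: join once, replace, and keep a single chunk.
        self._chunks = ["".join(self._chunks).replace(old, new)]

    def __iter__(self):
        # Iteration over characters is allowed; buf[i] is not defined.
        for chunk in self._chunks:
            yield from chunk

    def __str__(self):
        return "".join(self._chunks)


buf = TextBuffer("hello ")
buf.append("world")
buf.replace("world", "there")
result = str(buf)
```

Because the class defines neither __getitem__ nor slicing, callers are steered toward whole-buffer operations and iteration, which leaves the internal encoding free to vary.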
--
--Guido van Rossum (home page: http://www.python.org/~guido/)