[Python-Dev] Internal representation of strings and Micropython
Greg Ewing greg.ewing at canterbury.ac.nz
Thu Jun 5 02:03:17 CEST 2014
Serhiy Storchaka wrote:
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. A common use case is to quickly find a substring starting from the current position using str.find or re.search, process the found token, advance the position, and repeat.
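As a rough illustration of the find/process/advance pattern described in the quoted text, here is a minimal sketch using an ordinary int position. The function name `split_fields` and the separator-based tokenizing are made up for the example; they are not taken from any of the modules mentioned.

```python
def split_fields(text, sep=","):
    """Scan `text` with str.find, tracking the current position as an int."""
    fields = []
    pos = 0
    while True:
        end = text.find(sep, pos)      # search from the current position
        if end == -1:
            fields.append(text[pos:])  # last token runs to the end
            break
        fields.append(text[pos:end])   # process the found token
        pos = end + len(sep)           # advance past the separator and repeat
    return fields
```

Note that `pos` is only ever obtained from `str.find` or by stepping a small distance from an earlier position; the loop never needs to know how far it is from the start of the string.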
For that kind of thing, you don't need an actual character index, just some way of referring to a place in a string.
Instead of an integer, str.find() etc. could return a StringPosition, which would be an opaque reference to a particular point in a particular string. You would be able to pass StringPositions to indexing and slicing operations to get fast indexing into the string that they were derived from.
StringPositions could support the following operations:
StringPosition + int --> StringPosition
StringPosition - int --> StringPosition
StringPosition - StringPosition --> int
These would be computed by counting characters forwards or backwards in the string, which would be slower than int arithmetic but still faster than counting from the beginning of the string every time.
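A minimal sketch of how these operations might behave, assuming the string is stored as UTF-8 bytes (that storage choice, and the names `EncodedString`, `substring` and `offset`, are assumptions made for this illustration and are not part of the proposal above):

```python
class EncodedString:
    """Hypothetical string type stored as UTF-8 bytes."""

    def __init__(self, text):
        self.data = text.encode("utf-8")

    def find(self, sub, start=None):
        """Like str.find, but returns a StringPosition, or None if not found."""
        offset = 0 if start is None else start.offset
        found = self.data.find(sub.encode("utf-8"), offset)
        return None if found == -1 else StringPosition(self, found)

    def substring(self, start, stop):
        """Slice between two positions without counting characters."""
        return self.data[start.offset:stop.offset].decode("utf-8")


class StringPosition:
    """Opaque reference to a point in one particular EncodedString."""

    def __init__(self, owner, offset):
        self.owner = owner
        self.offset = offset  # byte offset, always on a character boundary

    def __add__(self, n):
        if not isinstance(n, int):
            return NotImplemented
        if n < 0:
            return self - (-n)
        data, off = self.owner.data, self.offset
        for _ in range(n):
            if off >= len(data):
                raise IndexError("position past end of string")
            off += 1
            # Skip UTF-8 continuation bytes (bit pattern 10xxxxxx).
            while off < len(data) and data[off] & 0xC0 == 0x80:
                off += 1
        return StringPosition(self.owner, off)

    def __sub__(self, other):
        data = self.owner.data
        if isinstance(other, StringPosition):
            if other.owner is not self.owner:
                raise ValueError("positions refer to different strings")
            lo, hi = sorted((self.offset, other.offset))
            # Each character contributes exactly one non-continuation byte.
            count = sum(1 for b in data[lo:hi] if b & 0xC0 != 0x80)
            return count if self.offset >= other.offset else -count
        if isinstance(other, int):
            if other < 0:
                return self + (-other)
            off = self.offset
            for _ in range(other):
                if off == 0:
                    raise IndexError("position before start of string")
                off -= 1
                # Walk back to the lead byte of the previous character.
                while off > 0 and data[off] & 0xC0 == 0x80:
                    off -= 1
            return StringPosition(self.owner, off)
        return NotImplemented
```

In this sketch, adding or subtracting an int costs time proportional to the distance moved, and slicing between two positions costs nothing beyond copying the bytes, which is the trade-off described above.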
In other contexts, StringPositions would coerce to ints (maybe being an int subclass?) allowing them to be used in any existing algorithm that slices strings using ints.
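One way the int-subclass idea could look, purely as a sketch: the int value is the ordinary character index, and a byte offset rides along as an extra attribute. The names `IntStringPosition`, `byte_offset` and the wrapper `find` are hypothetical, and UTF-8 storage is again an assumption.

```python
class IntStringPosition(int):
    """Hypothetical int subclass: the int value is the character index,
    and the byte offset is carried along as an extra attribute."""

    def __new__(cls, char_index, byte_offset):
        self = super().__new__(cls, char_index)
        self.byte_offset = byte_offset
        return self


def find(text, sub, start=0):
    """Stand-in for a str.find that returns a position object."""
    index = text.find(sub, start)
    if index == -1:
        return -1
    # Encoding the prefix is O(n) and is done here for illustration only;
    # a real implementation would already know the byte offset.
    return IntStringPosition(index, len(text[:index].encode("utf-8")))
```

Existing code such as `text[pos:]` or `text.find(sub, pos)` accepts the result unchanged because it is an int. In this sketch `pos + 1` falls back to a plain int and loses the byte offset, so an actual implementation would presumably override the arithmetic operators as well.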
-- Greg