[Python-3000] How will unicode get used?

Adam Olsen rhamph at gmail.com
Wed Sep 20 14:55:56 CEST 2006

Before we can decide on the internal representation of our unicode objects, we need to decide on their external interface. My thoughts so far:

* Most transformation and testing methods (.lower(), .islower(), etc.) can be copied directly from 2.x. They need no special implementation to perform reasonably.
* Indexing and slicing are the big issue. Do we need constant-time integer slicing? .find() could be changed to return a token that could be used as a constant-time offset. Incrementing the token would cost linear time, but that's no big deal if the increments are always small.
* Grapheme clusters, words, lines, and other groupings: do we need or want ways to slice on them too?
* Cheap slicing and concatenation (between O(1) and O(log(n))): do we want to support them? Now would be the time.
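
To make the token idea concrete, here is a rough sketch of what a .find() that returns a position token might look like. It assumes a UTF-8 internal representation, which is only one of the options still open. The names Utf8Text, Pos, advance() and slice_from() are made up for illustration and are not a proposed API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    """Hypothetical opaque token: a byte offset into one string's UTF-8 buffer."""
    byte_offset: int


class Utf8Text:
    """Hypothetical text type that stores UTF-8 bytes (an assumption, not a decision)."""

    def __init__(self, text: str) -> None:
        self._buf = text.encode("utf-8")

    def find(self, needle: str) -> Optional[Pos]:
        # A byte-level search is safe here: UTF-8 is self-synchronizing, so a
        # validly encoded, non-empty needle can only match at a code point boundary.
        i = self._buf.find(needle.encode("utf-8"))
        return None if i < 0 else Pos(i)

    def advance(self, pos: Pos, count: int = 1) -> Pos:
        # Linear in `count`: step one code point at a time, skipping
        # continuation bytes (those of the form 0b10xxxxxx).
        i = pos.byte_offset
        for _ in range(count):
            if i >= len(self._buf):
                raise IndexError("advanced past end of text")
            i += 1
            while i < len(self._buf) and (self._buf[i] & 0xC0) == 0x80:
                i += 1
        return Pos(i)

    def slice_from(self, pos: Pos) -> str:
        # Locating the start is constant-time (no scan from the beginning);
        # copying out the result is still linear in its length.
        return self._buf[pos.byte_offset:].decode("utf-8")


text = Utf8Text("naïve café: ok")
tok = text.find("café")
tail = text.slice_from(tok)
after_word = text.slice_from(text.advance(tok, 4))
```

The token is just a byte offset in disguise, so using it costs nothing extra, while advance() pays only for the code points it actually steps over. Whether tokens should be tied to the string that produced them is left open here.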

-- Adam Olsen, aka Rhamphoryncus