[Python-Dev] PEP 393 Summer of Code Project

Antoine Pitrou solipsis at pitrou.net
Sat Aug 27 02:23:31 CEST 2011


On Sat, 27 Aug 2011 12:17:18 +1200 Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

Paul Moore wrote:

> IronPython and Jython can retain UTF-16 as their native form if that
> makes interop cleaner, but in doing so they need to ensure that basic
> operations like indexing and len work in terms of code points, not
> code units, if they are to conform. ... They lose the O(1)
> guarantee, but that's easily defensible as a tradeoff to conform to
> underlying runtime semantics.

I would only agree as long as it wasn't too much worse than O(1). O(log n) might be all right, but O(n) would be unacceptable, I think.

It also depends a lot on actual measured performance. As someone mentioned in the tracker, the index you use on a string usually comes from a previous string operation (like a search), perhaps with a small offset. So a caching scheme may actually give very good results with rather small overhead (you could cache, say, the 4 most recent indices and choose the nearest one when an indexing operation is done; with UTF-8, scanning backward and forward is equally simple).
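
To make the caching idea concrete, here is a rough sketch in Python. It is only an illustration, not how any existing implementation works: the class name CachedUtf8Str, its attributes, and the eviction policy (drop the oldest entry) are all made up for the example. It stores the text as UTF-8 bytes, remembers the 4 most recent (code point index, byte offset) pairs, and walks forward or backward from the nearest one.

```python
class CachedUtf8Str:
    """Hypothetical UTF-8 backed string indexed by code point."""

    CACHE_SIZE = 4  # "say, the 4 most recent indices"

    def __init__(self, text):
        self._data = text.encode("utf-8")
        self._length = len(text)  # number of code points
        # (code point index, byte offset) pairs, most recent last.
        self._cache = [(0, 0)]

    def __len__(self):
        return self._length

    def _is_continuation(self, pos):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (self._data[pos] & 0xC0) == 0x80

    def _byte_offset(self, index):
        # Start from the cached entry nearest to the requested index.
        cp, off = min(self._cache, key=lambda e: abs(e[0] - index))
        while cp < index:  # scan forward to the next lead byte
            off += 1
            while off < len(self._data) and self._is_continuation(off):
                off += 1
            cp += 1
        while cp > index:  # scan backward to the previous lead byte
            off -= 1
            while self._is_continuation(off):
                off -= 1
            cp -= 1
        # Remember this position and keep only the most recent entries.
        entry = (index, off)
        if entry in self._cache:
            self._cache.remove(entry)
        self._cache.append(entry)
        del self._cache[:-self.CACHE_SIZE]
        return off

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = start + 1
        while end < len(self._data) and self._is_continuation(end):
            end += 1
        return self._data[start:end].decode("utf-8")


s = CachedUtf8Str("a\u00e9\u20ac\U0001F600z")
print(len(s), s[3] == "\U0001F600", s[2] == "\u20ac")
```

With this scheme, an access at index i followed by one at i + 1 or i - 1 costs a single step from the cached position; only an access far from every cached index pays for a longer scan.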

Regards

Antoine.


