[Python-Dev] PEP 393 Summer of Code Project
Antoine Pitrou solipsis at pitrou.net
Sat Aug 27 02:23:31 CEST 2011
On Sat, 27 Aug 2011 12:17:18 +1200 Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
>
> I would only agree as long as it wasn't too much worse than O(1).
> O(log n) might be all right, but O(n) would be unacceptable, I think.
It also depends a lot on actual measured performance. As someone mentioned in the tracker, the index you use on a string usually comes from a previous string operation (like a search), perhaps with a small offset. So a caching scheme may actually give very good results with a rather small overhead (you could cache, say, the 4 most recent indices and choose the nearest when an indexing operation is done; with utf-8, scanning backward and forward is equally simple).
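To make the caching idea concrete, here is a minimal sketch in Python, not taken from any actual implementation. The class name `CachedUTF8String`, the `steps` counter and the eviction policy (drop the oldest entry) are all assumptions made for illustration. It stores the text as UTF-8 bytes, remembers the last 4 (code point index, byte offset) pairs, and starts each lookup from whichever known position is nearest, scanning forward or backward over continuation bytes.

```python
class CachedUTF8String:
    """Hypothetical UTF-8 backed string with code-point indexing.

    Illustration only: keeps a small cache of recently used
    (code point index, byte offset) pairs to shorten later scans.
    """

    CACHE_SIZE = 4  # "say, the 4 most recent indices"

    def __init__(self, text):
        self._data = text.encode("utf-8")
        self._length = len(text)   # length in code points
        self._cache = []           # (index, byte offset), most recent last
        self.steps = 0             # instrumentation: code points walked

    def __len__(self):
        return self._length

    @staticmethod
    def _is_continuation(byte):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (byte & 0xC0) == 0x80

    def _byte_offset(self, index):
        data = self._data
        # Both ends of the string are always known positions.
        anchors = [(0, 0), (self._length, len(data))] + self._cache
        cp, pos = min(anchors, key=lambda a: abs(a[0] - index))
        while cp < index:          # scan forward one code point at a time
            pos += 1
            while pos < len(data) and self._is_continuation(data[pos]):
                pos += 1
            cp += 1
            self.steps += 1
        while cp > index:          # scan backward, equally simple in UTF-8
            pos -= 1
            while self._is_continuation(data[pos]):
                pos -= 1
            cp -= 1
            self.steps += 1
        # Remember this position and keep only the most recent entries.
        self._cache = [a for a in self._cache if a[0] != index]
        self._cache.append((index, pos))
        del self._cache[:-self.CACHE_SIZE]
        return pos

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = start + 1
        while end < len(self._data) and self._is_continuation(self._data[end]):
            end += 1
        return self._data[start:end].decode("utf-8")


s = CachedUTF8String("aé€😀" * 1000)
s[2000]              # cold lookup: walks in from the nearest end
before = s.steps
s[2003]              # small offset from the previous index
print(s.steps - before)
```

In this sketch a cold lookup still costs O(n), but an index that sits a small offset away from a recent one only walks that offset. Whether that pays off in practice would have to be measured, as noted above.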
Regards
Antoine.