[Python-Dev] PEP 393 Summer of Code Project

Sat Aug 27 02:23:31 CEST 2011

On Sat, 27 Aug 2011 12:17:18 +1200
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
> 
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
> 
> I would only agree as long as it wasn't too much worse
> than O(1). O(log n) might be all right, but O(n) would be
> unacceptable, I think.

It also depends a lot on *actual* measured performance. As someone
mentioned in the tracker, the index you use on a string usually comes
from a previous string operation (like a search), perhaps with a small
offset. So a caching scheme may actually give very good results with a
rather small overhead (you could cache, say, the 4 most recent indices
and choose the nearest when an indexing operation is done; with UTF-8,
scanning backward and forward is equally simple).
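
To make the idea concrete, here is a rough sketch of such a cache in Python.
It is only an illustration under my own assumptions, not code from the
tracker or from any implementation: the class name CachedUtf8, the
cache_size parameter and the steps counter are all hypothetical, and
negative indices and slices are left out.

```python
from collections import deque


class CachedUtf8:
    """Hypothetical UTF-8-backed string indexed by code point."""

    def __init__(self, text, cache_size=4):
        self._buf = text.encode("utf-8")
        # Each entry pairs a code point index with its byte offset.
        # The deque keeps only the most recently added entries.
        self._cache = deque([(0, 0)], maxlen=cache_size)
        self.steps = 0  # code points scanned so far, for measurement only

    def _offset(self, index):
        if index < 0:
            raise IndexError("negative indices are not handled in this sketch")
        buf = self._buf
        # Start from the cached position nearest to the requested index.
        cp, off = min(self._cache, key=lambda entry: abs(entry[0] - index))
        # Forward scan: step past one lead byte, then its continuation bytes.
        while cp < index:
            if off >= len(buf):
                raise IndexError("string index out of range")
            off += 1
            while off < len(buf) and (buf[off] & 0xC0) == 0x80:
                off += 1
            cp += 1
            self.steps += 1
        # Backward scan: step back until a non-continuation byte is reached.
        while cp > index:
            off -= 1
            while (buf[off] & 0xC0) == 0x80:
                off -= 1
            cp -= 1
            self.steps += 1
        if (index, off) not in self._cache:
            self._cache.append((index, off))
        return off

    def __getitem__(self, index):
        buf = self._buf
        start = self._offset(index)
        if start >= len(buf):
            raise IndexError("string index out of range")
        end = start + 1
        while end < len(buf) and (buf[end] & 0xC0) == 0x80:
            end += 1
        return buf[start:end].decode("utf-8")
```

The first lookup far into the string still pays for a linear scan, but a
lookup at a small offset from a recent one only walks the few code points in
between. Whether that is good enough in practice is exactly what would need
measuring.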

Regards

Antoine.