[Python-3000] Making more effective use of slice objects in Py3k

Fri Sep 1 08:46:23 CEST 2006

Guido van Rossum wrote:

> A way to handle UTF-8 strings and other variable-length encodings
> would be to maintain a small cache of index positions with the string
> object.

I think just delaying decoding would take us most of the way.  the big 
advantage of storage polymorphism is that you can avoid decoding and 
encoding (and having to pay for the cycles and bytes needed for that) if 
you don't have to.  the XML case you mentioned is a typical example; 
just compare the behaviour of a library that does some extra work to 
keep things small under the hood with more straightforward implementations:

     http://effbot.org/zone/celementtree.htm#benchmarks

(cElementTree uses the "8-bit ascii mixes well with unicode" approach)
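
to make the "delay decoding" idea concrete, here is a minimal sketch in Python 3 terms.  the LazyText class, its decode_count attribute, and the plain string comparison of encoding names are all hypothetical, made up for illustration; this is not how cElementTree or any proposed Py3k string type is implemented.

```python
class LazyText:
    """Hypothetical text object that keeps the encoded bytes it was
    given and only decodes them when a character-level operation
    actually needs the decoded form."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # bytes exactly as received
        self._encoding = encoding  # how those bytes are encoded
        self._text = None          # decoded str, filled in on first use
        self.decode_count = 0      # illustration only: counts real decodes

    def _decoded(self):
        # Decode at most once, and only when somebody asks for it.
        if self._text is None:
            self._text = self._raw.decode(self._encoding)
            self.decode_count += 1
        return self._text

    def encode(self, encoding="utf-8"):
        # Same encoding in and out: hand back the original bytes,
        # paying for neither a decode nor an encode.
        # (Assumption: encoding names are compared as plain strings.)
        if encoding == self._encoding:
            return self._raw
        return self._decoded().encode(encoding)

    def __len__(self):
        # Length in characters needs the decoded form.
        return len(self._decoded())

    def __getitem__(self, index):
        # Indexing and slicing by character need it too.
        return self._decoded()[index]

    def __str__(self):
        return self._decoded()
```

data that is only passed through (read, stored, written back out in the same encoding) never gets decoded at all; the cost shows up only for code that indexes or measures the text.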

there are plenty of optimizations you can do when accessing the 
beginning and end of a string (startswith, endswith, comparisons, 
slicing, etc), but I think we can deal with that when we get there.
I think the NFS sprint showed that you get better results by working 
with real use cases, rather than spending that time theorizing.  it also 
showed that the bottlenecks aren't always where you think they are.
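
as a small illustration of the startswith/endswith point above: for valid UTF-8 data, prefix and suffix tests can be done directly on the encoded bytes, since a lead byte never appears in the middle of an encoded character.  the helper names below are hypothetical, and the sketch assumes both sides are well-formed UTF-8 with no normalization differences.

```python
def utf8_startswith(raw, prefix):
    """Prefix test on UTF-8 encoded bytes without decoding them.

    raw is bytes holding valid UTF-8; prefix is an ordinary str.
    Only the (usually short) prefix is encoded, never the whole string.
    """
    return raw.startswith(prefix.encode("utf-8"))


def utf8_endswith(raw, suffix):
    """Suffix test on UTF-8 encoded bytes without decoding them."""
    return raw.endswith(suffix.encode("utf-8"))
```

whether this kind of shortcut is worth having is exactly the sort of question that is better answered by measuring a real use case.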

</F>