[Python-3000] characters data type

Michael Chermside mcherm at mcherm.com
Wed May 3 14:46:32 CEST 2006


Michael Chermside wrote:
    [on implementing str slices as views]
> > Read the Java "String" class to see
> > just how easy. The slice returns a subclass of str that stores a
> > start and stop position but redirects data access to the buffer used
> > by the original str. The only tricky part is to manage garbage
> > collection, solved by having the slice object contain a reference to
> > the original str.

Fredrik Lundh replies:
> you missed the part about slicing slices

Yes. Sorry. Seemed obvious. Slicing a StringSlice should create a new
StringSlice instance with different start and stop values and another
reference to the original str.
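
A minimal sketch of the idea in Python, for illustration only. The
StringSlice class below and its attribute names (base, start, stop) are
hypothetical; it is a plain wrapper rather than a real str subclass, and
it does not reflect any actual CPython implementation.

```python
class StringSlice:
    """Hypothetical read-only view onto part of a str (illustration only)."""

    def __init__(self, base, start, stop):
        self.base = base      # this reference keeps the original str alive
        self.start = start
        self.stop = stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, index):
        if isinstance(index, slice):
            if index.step is not None:
                # Any slice written with a stride copies instead of sharing.
                return str(self)[index]
            start, stop, _ = index.indices(len(self))
            stop = max(stop, start)
            # Slicing a slice: new offsets, same original str.
            return StringSlice(self.base, self.start + start, self.start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StringSlice index out of range")
        # A single character comes back as a truly separate str.
        return self.base[self.start + index]

    def __str__(self):
        # Materialize the viewed range as an ordinary str (this copies).
        return self.base[self.start:self.stop]


whole = StringSlice("hello world", 0, 11)
word = whole[6:]      # view onto "world"
inner = word[1:4]     # view onto "orl", still backed by the original str
```

Note that inner refers directly to the original str, not to word, so
chains of slices never grow deeper than one level.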

> and the bit about what heuristics
> to use (if any) to use slicing under the hood also for non-explicit
> slicing operations (e.g. should s[:-1] really make a copy? [...] etc.

Um... is there a reason for heuristics? What's wrong with always
returning a StringSlice? (see below for one answer) Having s[i]
return a slice instead of a separate one-character

> when can s[i] return a slice?

Why should it ever? Single-character strings tend to be re-used, and
sharing a buffer will save only a few bytes, so there's little reason
not to return a truly separate string.

>   some kind of "temporary" indicator provided by the compiler would be
>   excellent for under-the-hood optimizations like this...).

Why?


Any slice with a stride should NOT try to share a buffer, primarily
because we want a simple and efficient implementation. This approach
does suffer from one major problem: it tends to keep alive large
strings that could otherwise be garbage collected, because some small
substring still holds a reference to them. Clever programmers could
avoid this (e.g., use bigString[x:y:1] to avoid keeping the reference),
but it would be a new possible minefield, and a (so-called) "memory
leak" for Python programs. I see that as a valid criticism. But I
don't understand why it needs to be complex, and I point to Java's
String class[1] as an existence proof.
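
The retention problem can be seen by analogy with Python's existing
memoryview type over bytes, which shares its buffer in a way similar to
the proposed str slices. This is only an analogy, not the proposed str
behavior, and the variable names are made up for the example.

```python
# A large buffer and a tiny view onto it.
big = bytes(10_000_000)
view = memoryview(big)
small = view[10:20]          # shares big's buffer; no copy is made

# The 10-byte view still references the 10 MB object, so dropping the
# name "big" alone would not let that memory be reclaimed.
holder = small.obj

# An explicit copy breaks the link, much as bigString[x:y:1] would.
independent = bytes(small)
```

After this, independent is an ordinary 10-byte object with no tie to the
large buffer, while small keeps the whole buffer alive for as long as it
exists.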

-- Michael Chermside


[1] http://www.javaresearch.org/source/jdk142/java/lang/String.java.html
   particularly around line 436.



