[Python-3000] characters data type

Guido van Rossum guido at python.org
Wed May 3 18:43:55 CEST 2006


On 5/3/06, Michael Chermside <mcherm at mcherm.com> wrote:
> Guido writes:
> > (I had a bad
> > experience in my youth with strings implemented as trees, so I'm
> > biased against complicated string implementations. This also explains
> > why I'm no fan of the oft-proposed idea that slices should avoid
> > making physical copies even if they make logical copies -- the
> > complexity of that approach horrifies me.)
>
> No argument here with regard to strings implemented as trees, but I
> think you may be needlessly worried about physical vs logical copies
> for slices. Since strings (and slices of strings) are immutable, the
> implementation is quite simple. Read the Java "String" class to see
> just how easy. The slice returns a subclass of str that stores a
> start and stop position but redirects data access to the buffer used
> by the original str. The only tricky part is to manage garbage
> collection, solved by having the slice object contain a reference to
> the original str. That's it.
>
> Of course, you knew that, but the fact that I can describe it fully
> in 2 sentences should help show it's not overly complex.

The problem is that it's suboptimal if you read a 10 MB file into
memory (that's small here at Google :-) and split it into smaller
strings. If you keep only a small fraction of the substrings, the 10
MB string is just clogging up memory. So there will be pressure to
complicate the scheme by adding heuristics about when to copy and when
to share, etc., etc.
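
To make the trade-off concrete, here is a minimal sketch of the sharing
scheme described above and of the retention problem it causes. The
SharedSlice class and its retained_bytes() method are hypothetical names
invented for this illustration; they are not part of CPython or of any
proposal in this thread, and the 10 MB string is only a stand-in for a file
read into memory.

```python
import sys


class SharedSlice:
    """Hypothetical immutable view onto part of a str (illustration only)."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start, stop):
        # Clamp the bounds the same way ordinary slicing does.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base              # reference keeps the original alive
        self._start = start
        self._stop = max(start, stop)

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # Materialise a physical copy only when one is asked for.
        return self._base[self._start:self._stop]

    def retained_bytes(self):
        # Size of the buffer this view pins in memory.
        return sys.getsizeof(self._base)


big = "x" * (10 * 1024 * 1024)   # stand-in for a 10 MB file
view = SharedSlice(big, 0, 16)   # logical copy, shares the buffer
copy = big[0:16]                 # physical copy, independent of big
del big                          # the view still references the 10 MB string

print(len(view), view.retained_bytes())   # 16 characters pin the whole buffer
print(len(copy), sys.getsizeof(copy))     # the copy costs only a few dozen bytes
```

The view itself is cheap to create, but as long as it is reachable the whole
original buffer stays allocated. A heuristic such as "copy when the slice is
much smaller than its base" would avoid that, and that is exactly the kind of
extra complexity the paragraph above anticipates.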

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

