Why don't strings share data in Python?

Skip Montanaro skip at pobox.com
Wed Apr 24 15:10:18 EDT 2002


    CG> If you mean, why don't substrings share space with parent strings,
    CG> hmm, that's an interesting question.  I can see that a string object
    CG> implemented something like this struct:

    CG>   struct ExampleString {
    CG>     int len;
    CG>     char *buf;
    CG>   };

    CG> would allow that, and it sounds like an interesting optimization.

It probably wouldn't turn out to be an optimization at all.  It would slow
down string creation in many situations and make passing strings between
Python's C implementation and external C functions much slower.  Suppose I
had

    s = "abcdefg"
    t = s[2:5]

In theory, with a data structure like the one above, t->buf could just
refer to &(s->buf[2]).  Unfortunately, t->buf would not be NUL-terminated.
To pass it to most C library functions you'd first have to copy it into a
fresh buffer and add the terminating NUL yourself (strncpy() alone won't
add one when it stops at the length limit).  The current implementation,
while it wastes some storage, exchanges string data with external functions
without copying.
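
A rough way to see the termination problem from Python is with ctypes.
This is only a sketch: the names parent, shared_addr and copy are made up
for illustration, and the "shared substring" is simulated with a raw
address into a ctypes buffer, not with real Python string objects.

```python
import ctypes

# A C-style buffer holding "abcdefg" plus the terminating NUL byte.
parent = ctypes.create_string_buffer(b"abcdefg")

# Hypothetical shared substring for s[2:5]: a pointer 2 bytes into parent.
shared_addr = ctypes.addressof(parent) + 2

# Read it the way a C library function would, up to the first NUL.
# The read runs past the intended end of the substring.
seen_by_c = ctypes.string_at(shared_addr)

# Copying the three bytes into their own buffer supplies the terminator.
copy = ctypes.create_string_buffer(parent.raw[2:5])
seen_after_copy = ctypes.string_at(ctypes.addressof(copy))

print(seen_by_c)        # b'cdefg'
print(seen_after_copy)  # b'cde'
```

The shared pointer yields "cdefg" rather than "cde", so the copy is needed
before the data can be handed to code that expects a C string.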

String objects are created as variable-length objects.  This works because
strings are immutable, so the size is fixed at creation.  The bytes for the
raw string are allocated
along with the object header itself.  This saves a malloc() call for each
string allocation, which is quite noticeable because Python allocates so
many string objects.
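
One way to observe this from Python, assuming CPython (sys.getsizeof() is
implementation-specific) and using bytes objects, which are the closest
modern equivalent of the 8-bit strings discussed here: the reported object
size grows by exactly one byte per character, consistent with the data
living in the same allocation as the header.

```python
import sys

# CPython-specific: for bytes, getsizeof() covers the header plus the
# inline data (including the trailing NUL, already counted for b"").
header = sys.getsizeof(b"")

# Extra bytes reported for payloads of 1, 10 and 100 characters.
extra = [sys.getsizeof(b"x" * n) - header for n in (1, 10, 100)]

print(extra)  # [1, 10, 100] on CPython
```

The absolute value of header depends on the platform and Python version;
only the differences are meaningful here.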

-- 
Skip Montanaro (skip at pobox.com - http://www.mojam.com/)




