[Python-Dev] Alternative implementation of string interning

Tim Peters tim.one@comcast.net
Mon, 01 Jul 2002 17:12:31 -0400


[Oren Tirosh, on <http://python.org/sf/576101>]
> ...
> Interned strings are no longer immortal.  They die when their refcnt
> reaches 0 just like any other object.

This may be a problem.  Code can currently rely on id(some_interned_string)
staying the same for the life of a run.
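
To make the concern concrete, here is a minimal sketch of that kind of
reliance.  It is not taken from any real module.  It uses sys.intern, the
Python 3 spelling of the builtin intern() that was current at the time of
this thread, and make_name is a hypothetical helper that builds the string
at run time.

```python
import sys

def make_name():
    # Hypothetical helper: build "spam" at run time so the result is a
    # fresh string object rather than a literal from the code's constants.
    return "".join(["sp", "am"])

first = sys.intern(make_name())
saved_id = id(first)      # some code stashes this and compares against it later

second = sys.intern(make_name())
# While `first` is alive, interning an equal string yields the same object.
same_while_alive = (id(second) == saved_id)
```

With immortal interned strings, saved_id keeps identifying "spam" even after
every reference is dropped.  If interned strings can die, then in CPython
(where id() is the object's address) that address could later be handed to
an unrelated object, and a comparison against saved_id would be fooled.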

> ...
> Can anyone explain why they were implemented with a pointer in the first
> place? Barry?

It will have to be Guido.  He made a plausible case to me once about why the
indirection is there, but it may be an optimization that's no longer
important.  At the time interned strings were introduced, extension modules
had mountains of code of the form:

    static PyObject *spam_str;   /* file scope, in one or more modules */
    /* at module init time (C can't initialize a static with a call) */
    spam_str = PyString_FromString("spam");

    /* in various module routines */
    PyObject_SetAttr(someobject, spam_str, user_supplied_value);

and PyObject_SetAttr() was changed to make spam_str what you called an
"indirectly interned" string by magic.  This was (or at least Guido thought
it was <wink>) an important optimization at the time.
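
A toy model of what "indirectly interned" could mean, written in Python for
brevity.  This is a sketch under assumptions, not CPython's code: it assumes
the indirection is a pointer cached on the non-interned string that leads to
its interned twin.  Str, canonical, intern_in_place, set_attr and the lookups
counter are all hypothetical names.

```python
class Str:
    """Toy stand-in for a string object; not CPython's real layout."""
    def __init__(self, value):
        self.value = value
        self.canonical = None    # hypothetical cached pointer to the interned twin

interned = {}    # value -> canonical Str
lookups = 0      # counts table probes, to show what the cached pointer saves

def intern_in_place(s):
    """Return the canonical object for s, caching the answer on s itself."""
    global lookups
    if s.canonical is not None:        # fast path: resolved on an earlier call
        return s.canonical
    lookups += 1
    canon = interned.setdefault(s.value, s)
    s.canonical = canon                # "indirectly interned" when canon is not s
    return canon

def set_attr(obj, name, value):
    """Sketch of an attribute store that interns the name as a side effect."""
    key = intern_in_place(name)
    obj[key.value] = value

spam_interned = intern_in_place(Str("spam"))   # someone interned "spam" first
spam_str = Str("spam")                         # a module's own, separate copy
target = {}
set_attr(target, spam_str, 1)    # first call probes the table
set_attr(target, spam_str, 2)    # second call just follows the cached pointer
```

After the first set_attr call, spam_str points at the interned "spam" without
being that object itself, so later calls skip the table probe.  That is the
speed boost old-style extension code would get without being edited.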

Extension modules written after interned strings were introduced can exploit
interning directly, a la

    static PyObject *spam_str;   /* file scope, in one or more modules */
    /* at module init time (C can't initialize a static with a call) */
    spam_str = PyString_InternFromString("spam");

and the core was reworked to do that too (note that indirect interning wasn't
directed at the core -- it could well be that core code never creates an
indirectly interned string).  I don't know how many extension modules still
implicitly rely on indirect interning for a speed boost.  Zope doesn't, and
that's all that really matters <wink>.