Internals of interning strings

Michael Hudson mwh21 at cam.ac.uk
Fri Mar 24 03:43:42 EST 2000


"Jason Stokes" <jstok at bluedog.apana.org.au> writes:
[schnipp]
> 
> On the next call to intern, the string referred to by the name "d" has the
> same value as that referred to by "a".  When we go to intern it, we find the
> value of the string referred to by d is already present in the dictionary.
> The string referred to by a is returned as the result of the function.
> Also, the interpreter sets the internal field "ob_sinterned" of the object
> referred to by d to *also* point to a.  Now, anywhere the object referred to
> by d is used certain operations can be slightly optimized.  If you invoke
> intern on the object referred to by d again, the PyString_InternInPlace
> routine sees that its "ob_sinterned" field already points to an object, and
> returns that, instead of looking it up in the dictionary again.  And if "d"
> is hashed, the hash function returns the cached hash value of the object
> currently pointed to by "a".

Yup, I think that's right.

> I don't know if that's clear, but I didn't want to include the whole source
> listing.  Anyway, the question is: is this the only reason for the extra
> entry "ob_sinterned" in the PyString struct?  That is, a couple of
> optimisations, costing an extra 4 bytes per string object?

What I think you're missing is that the `intern' builtin is an
interface to what is essentially an *internal* optimisation strategy.
It's there mainly to optimise the lookup of strings in dictionaries -
because if two strings are both interned, then testing them for
equality is just a pointer comparison.  The compiler automatically
interns likely-looking strings, so when executing

    self.foobar = self.foobar + 1

the string "foobar" is already interned and so the lookups of it in
self.__dict__ are quicker than otherwise.
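
A small sketch of the pointer-comparison point. It assumes a current
Python 3, where the builtin now lives at sys.intern rather than being
called `intern'; the strings are built at run time so that the compiler
does not get a chance to share them itself.

```python
import sys

# Build two equal strings at run time.
a = "".join(["foo", "bar", "!"])
d = "".join(["foo", "bar", "!"])

# Equal by value; a full comparison may have to look at the characters.
equal_by_value = (a == d)

# Interning maps both onto one canonical object ...
ia = sys.intern(a)
id_ = sys.intern(d)

# ... so identity (a pointer comparison in C) is enough to tell that
# they are equal.
same_object = (ia is id_)
```

Both `equal_by_value` and `same_object` come out true; the second check
never needs to inspect the string contents.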

At least, that's my understanding of the situation.

You can build Python without INTERN_STRINGS to see how the space/time
behaviour changes, but I doubt you'll like the results.

Cheers,
M.

-- 
very few people approach me in real life and insist on proving they are
drooling idiots.                         -- Erik Naggum, comp.lang.lisp


