Internals of interning strings

Emile van Sebille emile at fenx.com
Fri Mar 24 09:08:56 EST 2000


I hadn't done anything with 'intern' before, so I tried your
example.  Of course, to save some typing, I used a shorter string:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> a = 'testing'
>>> b = a
>>> c = intern(a)
>>> d = 'testing'
>>> e = intern(d)
>>> a is b
1
>>> b is c
1
>>> a is e
1
>>> e is d
1
>>>

This is inconsistent with your results.  In light of Mike Hudson's
response, might optimizations be happening differently on long vs
short strings?  Or is something else going on here?
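
One way to probe this is to build the strings at run time, so the
compiler never sees them as literals it might share or intern on its
own.  The sketch below is written for a current CPython 3, where
intern() lives at sys.intern.  It assumes that "".join() over several
pieces returns a fresh, non-interned object, which is an implementation
detail and not a language guarantee.  The helper name fresh() is made up
for illustration.

```python
import sys

def fresh(*parts):
    # Hypothetical helper: build the string at run time so that it is
    # not a compile-time constant (assumption: CPython returns a new,
    # non-interned object here).
    return "".join(parts)

a = fresh("test", "ing")
d = fresh("test", "ing")

equal_before = (a == d)     # same value
same_before = (a is d)      # separate objects in CPython

c = sys.intern(a)           # returns the interned object for this value
e = sys.intern(d)           # returns that same interned object

same_interned = (c is e)    # both calls hand back one shared object
```

If same_before comes out false for a short string built this way, the
length of the string is probably not what matters; how the string
object was created is.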


Emile van Sebille
emile at fenx.com
-------------------


----- Original Message -----
From: Jason Stokes <jstok at bluedog.apana.org.au>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Thursday, March 23, 2000 8:36 PM
Subject: Internals of interning strings


> If I do the following:
>
> >>> a = "A completely new string that I haven't used before"
> >>> b = a
> >>> c = intern(a)
> >>> d = "A completely new string that I haven't used before"
> >>> e = intern(d)
> >>> a is b
> 1
> >>> b is c
> 1
> >>> a is e
> 1
> >>> e is d
> 0
>
> From reading the sources, I know the interpreter does the following:
>
> Internally, calls PyString_InternInPlace(PyObject** p).  PyObject** p
> is an out parameter that is set to the pointer of an interned string.
> In the first call to "intern", the string referred to by the name "a"
> hasn't been interned before, so it's placed in the "interned"
> dictionary and itself returned as the result of "intern".  The string
> object referred to by "a" has, internally, an ob_sinterned field.
> This is set to point to itself, indicating that it is an interned
> value.
>
> On the next call to intern, the string referred to by the name "d"
> has the same value as that referred to by "a".  When we go to intern
> it, we find the value of the string referred to by d is already
> present in the dictionary.  The string referred to by a is returned
> as the result of the function.  Also, the interpreter sets the
> internal field "ob_sinterned" of the object referred to by d to
> *also* point to a.  Now, anywhere the object referred to by d is used
> certain operations can be slightly optimized.  If you invoke intern
> on the object referred to by d again, the PyString_InternInPlace
> routine sees that its "ob_sinterned" field already points to an
> object, and returns that, instead of looking it up in the dictionary
> again.  And if "d" is hashed, the hash function returns the cached
> hash value of the object currently pointed to by "a".
>
> I don't know if that's clear, but I didn't want to include the whole
> source listing.  Anyway, the question is: is this the only reason for
> the extra entry "ob_sinterned" in the PyString struct?  That is, a
> couple of optimisations, costing an extra 4 bytes per string object?
>
>
> --
> http://www.python.org/mailman/listinfo/python-list
>
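
The mechanism described in the quoted message can be modelled in a few
lines of Python.  This is only a toy sketch of that description, not the
interpreter's C code: the class ToyString, the function
intern_in_place(), and the lookups counter are all invented for
illustration, and the sinterned attribute merely stands in for the
ob_sinterned field.

```python
class ToyString:
    """Toy stand-in for a string object (not CPython's real struct)."""
    def __init__(self, value):
        self.value = value
        self.sinterned = None    # models the ob_sinterned field

interned = {}                    # models the "interned" dictionary
lookups = 0                      # counts dictionary lookups, for illustration

def intern_in_place(s):
    """Return the interned object for s, following the quoted description."""
    global lookups
    if s.sinterned is not None:
        # Shortcut: the field already points at the interned object,
        # so no dictionary lookup is needed.
        return s.sinterned
    lookups += 1
    t = interned.get(s.value)
    if t is None:
        # First string seen with this value: it becomes the interned copy.
        interned[s.value] = s
        t = s
    s.sinterned = t              # remember the result on the object itself
    return t

a = ToyString("A completely new string")
d = ToyString("A completely new string")
c = intern_in_place(a)           # a is stored and returned
e = intern_in_place(d)           # a is returned; d now points at a
again = intern_in_place(d)       # answered from d.sinterned, no lookup
```

In this model c and e are both the object bound to a, while d stays a
separate object, which matches the "a is e" and "e is d" results in the
quoted session.  The third call does not touch the dictionary, which is
the small optimisation the extra field buys.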
