intern'ed strings and deepcopy()

Alexander Schmolck a.schmolck at gmx.net
Sun Apr 13 09:06:54 EDT 2003


Peter Hansen <peter at engcorp.com> writes:
> What is this "symbol" that you keep talking about?  If you

I was referring to interned strings as symbols, because the only reason I can
think of why you'd care about (address) identity of strings is because you'd
like to use them as symbols.


> are simply using that as an alias for "interned string", then
> I believe you are making up reasons for those which they are
> *not* intended to have.  You said "the idea of symbols is their
> uniqueness".  Well, I don't know what a "symbol" is to you,

I was using (primitive) symbols loosely as in:

stuff you use to name (designate) other stuff. You can use as symbols any
stuff of which you can produce a sufficient number of unique instances and
that allows you to establish (some form of) identity relationship. But
obviously it's nice if

a) you have some convenient notation for writing and displaying symbols
   (strings do have that property)
b) creation and establishing identity are efficient (string equality would
   also do but is of course much slower than pointer comparison, i.e.
   identity via "is")
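
To make the loose definition above concrete, here is a minimal sketch of a
symbol type built without interned strings at all. The `Symbol` class and its
`_table` registry are hypothetical names made up for illustration, not
anything Python provides; the sketch assumes a current Python 3.

```python
class Symbol:
    """Hypothetical symbol type: at most one instance per name."""

    _table = {}  # name -> the unique Symbol instance for that name

    def __new__(cls, name):
        try:
            # Reuse the existing instance so identity can stand in for equality.
            return cls._table[name]
        except KeyError:
            sym = super().__new__(cls)
            sym.name = name
            cls._table[name] = sym
            return sym

    def __repr__(self):
        # Convenient notation for display, in the spirit of point a).
        return ":" + self.name


foo1 = Symbol("foo")
foo2 = Symbol("".join(["f", "oo"]))  # name built at runtime
bar = Symbol("bar")

print(foo1 is foo2)  # same name, so the very same object
print(foo1 is bar)   # different names, different objects
```

Comparing two such symbols is a single pointer comparison via "is", which is
point b) above; interned strings give you the same thing with the string
literal as the notation.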

> but I believe that "interned strings" are *not* intended to 
> guarantee some "uniqueness" property which you may count on 
> in your code.  Any such property is a side effect of the

Well, unless python has a radically different idea from anyone else on what
interning a string (and "returning *the* interned string") means, I'd
really think that you can rely on:

>>> type(a) == type(b) == str and b == a and intern(b) is intern(a)
1

(but maybe one of the cognoscenti would like to affirm that?)
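
For anyone who wants to try the property themselves, here is a small sketch.
It assumes Python 3, where the builtin `intern` used above is spelled
`sys.intern`; the helper name `same_symbol` is made up for this example.

```python
import sys


def same_symbol(a, b):
    """Return True if a and b are equal strs that intern to one object."""
    return (
        type(a) is str
        and type(b) is str
        and a == b
        and sys.intern(a) is sys.intern(b)
    )


# Build two equal strings at runtime, so they need not start out as the
# same object (whether they do is an implementation detail).
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

print(a == b)                          # equality of contents
print(sys.intern(a) is sys.intern(b))  # identity after interning
print(same_symbol(a, b))
```

This only demonstrates the behaviour on one interpreter; it is not a
substitute for a statement of what the language guarantees.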

> implementation or something.  I believe the sole purpose of

Hmm, what implementation alternative do you have in mind? Unless intern is not
guaranteed to have any effect (i.e. it is a mere suggestion of some vague
optimization intent to the interpreter) I can't really think of what else it
could reasonably do.

> an interned string ("symbol"?) is to *optimize* access to those
> strings, i.e. for performance reasons.  

Sure, but the reason you want SYMBOL to be identical and not merely equal
(only) to itself is largely a performance reason (apart from maybe that using
``X == SYMBOL`` might not be safe (depending on what X can be), so you'd have
to write your own predicate).
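
A sketch of the safety problem with ``X == SYMBOL``, assuming Python 3
(`sys.intern`). The class `AlwaysEqual` and the predicate `is_symbol` are
hypothetical, made up only to show how an arbitrary X can hijack "==".

```python
import sys

SYMBOL = sys.intern("stop")


class AlwaysEqual:
    """Hypothetical object whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


def is_symbol(x, symbol):
    """Predicate that avoids calling x's __eq__: exact str, then compare."""
    return type(x) is str and x == symbol


x = AlwaysEqual()

print(x == SYMBOL)           # True: x's own __eq__ decides the answer
print(x is SYMBOL)           # False: identity cannot be overridden
print(is_symbol(x, SYMBOL))  # False: the type check rejects x first
print(is_symbol("".join(["st", "op"]), SYMBOL))  # True: an equal str
```

With interned symbols, plain "is" does the job of `is_symbol` without the
extra predicate, provided both sides have been interned.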

> 
> See http://www.python.org/doc/current/lib/built-in-funcs.html#built-in-funcs
> under "intern()" for what I believe is the only official purpose
> of interning strings: "to gain a little performance on dictionary lookup".

I agree the wording there is not particularly clear (and python doesn't much
encourage "symbolic computation" -- so the intent is not to have people write
intern("foo") all over the place where a lisper would write e.g. :foo), but
I'd be really surprised if the intern semantics were intended to be unspecified
w.r.t. the identity property above.

> 
> -Peter

'as



