is implemented with id ?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Nov 3 18:18:15 EDT 2012


On Sat, 03 Nov 2012 22:49:07 +0100, Hans Mulder wrote:

> On 3/11/12 20:41:28, Aahz wrote:
>> [got some free time, catching up to threads two months old]
>> 
>> In article <50475822$0$6867$e4fe514c at news2.news.xs4all.nl>, Hans Mulder
>>  <hansmu at xs4all.nl> wrote:
>>> On 5/09/12 15:19:47, Franck Ditter wrote:
>>>>
>>>> - I should have said that I work with Python 3. Does that matter ? -
>>>> May I reformulate the question : "a is b" and "id(a) == id(b)"
>>>>   both mean : "a and b share the same physical address". Is that True
>>>>   ?
>>>
>>> Yes.
>>>
>>> Keep in mind, though, that in some implementations (e.g. Jython), the
>>> physical address may change during the lifetime of an object.
>>>
>>> It's usually phrased as "a and b are the same object".  If the object
>>> is mutable, then changing a will also change b.  If a and b aren't
>>> mutable, then it doesn't really matter whether they share a physical
>>> address.
>> 
>> That last sentence is not quite true.  intern() is used to ensure that
>> strings share a physical address to save memory.
> 
> That's a matter of perspective: in my book, the primary advantage of
> working with interned strings is that I can use 'is' rather than '==' to
> test for equality if I know my strings are interned.  The space savings
> are minor; the time savings may be significant.

For many applications, the space "savings" may actually be 
*costs*, since interning forces Python to hold onto strings even after 
they would normally be garbage collected. CPython interns strings that 
look like identifiers. It really wouldn't be a good idea for it to 
automatically intern every string.
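
As a minimal sketch of the difference between automatic and explicit 
interning (assuming Python 3, where intern() lives in the sys module as 
sys.intern; the variable names are only illustrative):

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them
# into a single constant. They contain spaces, so they do not look like
# identifiers.
parts = ["not", "an", "identifier"]
a = " ".join(parts)
b = " ".join(parts)

print(a == b)   # the values are equal
print(a is b)   # implementation-dependent; do not rely on the result

# Explicit interning maps equal strings onto one shared object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)
```

Only the last comparison is guaranteed by the language; whether "a is b" 
holds for the un-interned strings is up to the implementation.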

You can make your own intern system with a simple dict:

interned_strings = {}

Then, for every string you care about, do:

s = interned_strings.setdefault(s, s)

to ensure you are always working with a single string object for each 
unique value. In some applications that will save time at the expense of 
space.
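
Put together, the dict approach might look like the sketch below. The 
helper name intern_string is hypothetical, chosen just for this example:

```python
interned_strings = {}

def intern_string(s):
    """Return the one stored object whose value equals s (hypothetical helper)."""
    # setdefault stores s the first time its value is seen, and returns
    # the previously stored object on every later call.
    return interned_strings.setdefault(s, s)

# Two equal strings built separately at run time.
first = intern_string("".join(["spam", " ", "eggs"]))
second = intern_string("".join(["spam", " ", "eggs"]))

print(first is second)        # both names refer to the stored object
print(len(interned_strings))  # one entry per unique value
```

Note that the dict holds a reference to every string it has seen, so 
nothing stored in it is freed until you remove it or drop the dict. That 
is the space cost.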

And there is no need to write "is" instead of "==", because string 
equality already optimizes the "strings are identical" case. By using ==, 
you don't get into bad habits, you defend against the odd un-interned 
string sneaking in, and you still have high speed equality tests.
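
If you want to see the effect for yourself, a rough timing sketch along 
these lines should do. The string size and repeat count are arbitrary, and 
the actual numbers will vary by machine and implementation:

```python
import timeit

big = "x" * 10**6
same_object = big                                  # another name for the same object
equal_copy = "".join(["x" * 500000, "x" * 500000])  # equal value, built separately

# Compare against the very same object, then against an equal copy.
t_same = timeit.timeit(lambda: big == same_object, number=1000)
t_copy = timeit.timeit(lambda: big == equal_copy, number=1000)

print("same object:", t_same)
print("equal copy: ", t_copy)
```

Both comparisons give the right answer; the only thing that can differ is 
how long they take.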


-- 
Steven


