Comparing strings from the back?

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Sep 10 10:43:34 EDT 2012


On 2012-09-10, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 2012-09-10, Chris Angelico <rosuav at gmail.com> wrote:
>> On Tue, Sep 11, 2012 at 12:06 AM, Oscar Benjamin
>><oscar.j.benjamin at gmail.com> wrote:
>>> On 2012-09-10, Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>>>> What interning buys you is that "s == t" is an O(1) pointer compare if
>>>> they are equal. But if s and t differ in the last character, __eq__ will
>>>> still inspect every character. There is no way to tell Python "all
>>>> strings are interned, if s is not t then s != t as well".
>>>>
>>>
>>> I thought that if *both* strings were interned then a pointer comparison
>>> could decide if they were unequal without needing to check the characters.
>>>
>>> Have I misunderstood how intern() works?
>>
>> In a language where _all_ strings are guaranteed to be interned (such
>> as Lua, I think), you do indeed gain this. Pointer inequality implies
>> string inequality. But when interning is optional (as in Python), you
>> cannot depend on that, unless there's some way of recognizing interned
>> strings. Of course, that may indeed be the case; a simple bit flag
>> "this string has been interned" would suffice, and if both strings are
>> interned AND their pointers differ, THEN you can be sure the strings
>> differ.
>>
>> I have no idea whether or not CPython version X.Y.Z does this. The
>> value of such an optimization really depends on how likely strings are
>> to be interned; for instance, if the compiler automatically interns
>> all the names of builtins, this could be quite beneficial. Otherwise,
>> probably not; most Python scripts don't bother interning anything.
>>
>
> I haven't looked at the source but my understanding was precisely that there
> is an intern() bit and that not only the builtins module but all the literals
> in any byte-compiled module are interned.
>

s/literals/identifiers/

You can see the interned flag in the PyUnicodeObject struct here:
http://hg.python.org/cpython/file/3ffd6ad93fe4/Include/unicodeobject.h#l303

Oscar




More information about the Python-list mailing list