Comparing strings from the back?

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Sep 10 10:32:22 EDT 2012


On 2012-09-10, Chris Angelico <rosuav at gmail.com> wrote:
> On Tue, Sep 11, 2012 at 12:06 AM, Oscar Benjamin
><oscar.j.benjamin at gmail.com> wrote:
>> On 2012-09-10, Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>>> What interning buys you is that "s == t" is an O(1) pointer compare if
>>> they are equal. But if s and t differ in the last character, __eq__ will
>>> still inspect every character. There is no way to tell Python "all
>>> strings are interned, if s is not t then s != t as well".
>>>
>>
>> I thought that if *both* strings were interned then a pointer comparison
>> could decide if they were unequal without needing to check the characters.
>>
>> Have I misunderstood how intern() works?
>
> In a language where _all_ strings are guaranteed to be interned (such
> as Lua, I think), you do indeed gain this. Pointer inequality implies
> string inequality. But when interning is optional (as in Python), you
> cannot depend on that, unless there's some way of recognizing interned
> strings. Of course, that may indeed be the case; a simple bit flag
> "this string has been interned" would suffice, and if both strings are
> interned AND their pointers differ, THEN you can be sure the strings
> differ.
>
> I have no idea whether or not CPython version X.Y.Z does this. The
> value of such an optimization really depends on how likely strings are
> to be interned; for instance, if the compiler automatically interns
> all the names of builtins, this could be quite beneficial. Otherwise,
> probably not; most Python scripts don't bother interning anything.
>

I haven't looked at the source but my understanding was precisely that there
is an intern() bit and that not only the builtins module but all the literals
in any byte-compiled module are interned.

Oscar




More information about the Python-list mailing list