python string comparison oddity

Wed Jun 18 19:05:08 EDT 2008

Faheem Mitha wrote:
> On Wed, 18 Jun 2008 12:57:44 -0700 (PDT), Lie <Lie.1296 at gmail.com> wrote:
>> On Jun 19, 2:26 am, Faheem Mitha <fah... at email.unc.edu> wrote:
>>> Hi everybody,
>>>
>>> I was wondering if anyone can explain this. My understanding is that 'is'
>>> checks whether two names refer to the same object. If so, why the
>>> inconsistency for short strings? I would expect 'False' for all three
>>> comparisons. This is reproducible across two different machines, so it is
>>> not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
>>> default).
>>>                                                             Thanks, Faheem.
>>>
>>> In [1]: a = '--'
>>>
>>> In [2]: a is '--'
>>> Out[2]: False
>>>
>>> In [4]: a = '-'
>>>
>>> In [5]: a is '-'
>>> Out[5]: True
>>>
>>> In [6]: a = 'foo'
>>>
>>> In [7]: a is 'foo'
>>> Out[7]: True
>> Yes, this happens because of small-object caching. When small
>> integers or short strings are created, they may turn out to refer to
>> the same objects behind the scenes. Don't rely on this
>> behavior.
> 
> Yes, but why are '-' and 'foo' cached, and not '--'? Do you know what
> the basis of the choice is?

Short strings that look like Python identifiers, plus single-character strings
(which covers operators like '-' and a handful of others like '\x00'), I think.
The source would know for sure, but alas, I am lazy.
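For anyone finding this thread later: as far as I understand CPython 2.x, the pattern above comes from two mechanisms. Empty and single-character strings are handed out from a cache (so '-' and '\x00' are always shared), and string constants made only of identifier characters (letters, digits, underscore) are interned when code is compiled (so 'foo' is shared). '--' is neither, so each statement gets its own object. The sketch below models that rule with a hypothetical helper, likely_shared_literal, written for Python 3. It describes CPython implementation details, not language guarantees, so other versions or implementations may differ. It also shows the comparison you actually want.

```python
import string
import sys

# Assumption: mirrors the "name character" check CPython uses when deciding
# which string constants to intern at compile time.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def likely_shared_literal(s: str) -> bool:
    """Hypothetical helper: a rough model of which string literals CPython
    reuses across separate statements. Not a guarantee of anything."""
    if len(s) <= 1:
        # Empty and single-character strings come from an interpreter cache.
        return True
    # Identifier-like constants are interned when the code is compiled.
    return all(c in NAME_CHARS for c in s)


for literal in ["-", "foo", "--"]:
    print(repr(literal), likely_shared_literal(literal))

# Build two equal strings at runtime, so no compile-time constant sharing applies.
chunk = "-"
a = chunk * 2
b = "".join([chunk, chunk])

print(a == b)   # compare contents with ==: this is the reliable test
print(a is b)   # identity here is an implementation detail; don't depend on it

# If shared identity is genuinely needed, intern explicitly
# (sys.intern in Python 3; the builtin intern() in Python 2).
print(sys.intern(a) is sys.intern(b))
```

The first loop prints True for '-' and 'foo' and False for '--', matching the session above; a == b and the interned comparison both print True.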

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco