"/a" is not "/a" ?

Fri Mar 6 16:08:41 EST 2009

Steven D'Aprano wrote:
> Gary Herron wrote:
>
>   
>> Emanuele D'Arrigo wrote:
>>     
>>> Hi everybody,
>>>
>>> while testing a module today I stumbled on something that I can work
>>> around but I don't quite understand.
>>>
>> *Do NOT use "is" to compare immutable types.*  **Ever!**
>
> Huh? How am I supposed to compare immutable types for identity then? Your
> bizarre instruction would prohibit:
>
> if something is None
>   

Just use:

  if something == None

For None it gives *exactly* the same answer, unless the object's class
defines an unusual __eq__.

But...  I'm not (repeat NOT) saying *you* should do it this way.

I am saying that since newbies continually trip over incorrect uses of 
"is", they should be warned against using "is" in any situation until 
they understand the subtle nature of "is". 

If they use a couple of "something == None" tests instead of
"something is None" in their code while learning Python, it won't hurt,
and they can change their style when they understand the difference.
And meanwhile they will avoid the traps newbies fall into when they
don't understand these things yet.
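
A minimal sketch of the difference, using lists (where the two tests
reliably disagree) rather than strings. The names first, second, alias
and describe are made up for illustration:

```python
# Two lists with equal contents are still two separate objects.
first = [1, 2, 3]
second = [1, 2, 3]
alias = first                    # a second name for the same object

values_equal = first == second   # compares contents
same_object = first is second    # compares identity
alias_same = alias is first      # both names refer to one object

print(values_equal, same_object, alias_same)


def describe(something=None):
    # None is a singleton, so an identity test against it is reliable.
    if something is None:
        return "nothing supplied"
    return "got %r" % (something,)


print(describe())
print(describe(0))
```

With strings the identity result depends on what the interpreter happens
to cache, which is the trap being described here.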

Gary Herron

> which is the recommended way to compare to None, which is immutable. The
> standard library has *many* identity tests to None.
>
> I would say, *always* use "is" to compare any type whenever you intend to
> compare by *identity* instead of equality. That's what it's for. If you use
> it to test for equality, you're doing it wrong. But in the very rare cases
> where you care about identity (and you almost never do), "is" is the
> correct tool to use.
>
>
>   
>> It is an implementation choice (usually driven by efficiency
>> considerations) to choose when two strings with the same value are stored
>> in memory once or twice.  In order for Python to recognize when a newly
>> created string has the same value as an already existing string, and so
>> use the already existing value, it would need to search *every* existing
>> string whenever a new string is created.
>>     
>
> Not at all. It's quite easy, and efficient. Here's a pure Python string
> constructor that caches strings.
>
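> class CachedString(str):
>     _cache = {}  # maps each string value to its single cached copy
>     def __new__(cls, value):
>         # Return the cached copy, storing value first if it is new.
>         return cls._cache.setdefault(value, value)
>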
> Python even includes a built-in function to do this: intern(), although I
> believe it has been moved to sys.intern() in Python 3.0.
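
A short sketch of interning, assuming Python 3, where the function lives
in the sys module as sys.intern(). The variable names are illustrative
only, and the strings are built at run time so that the compiler has no
chance to merge two identical literals:

```python
import sys

# Build two equal strings at run time.
parts = ["/", "a"]
left = "".join(parts)
right = "".join(parts)

equal_before = left == right     # same characters, so equal

# sys.intern() returns the one canonical copy for a given value, so two
# interned strings with equal values are the same object.
left_i = sys.intern(left)
right_i = sys.intern(right)
same_after = left_i is right_i

print(equal_before, same_after)
```

Whether "left is right" holds before interning is up to the
implementation, so the sketch makes no claim about it.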
>
>
>   
>> Clearly that's not going to be efficient. 
>>     
>
> Only if you do it the inefficient way.
>
>   
>> However, the C implementation of Python does a limited version 
>> of such a thing -- at least with strings of length 1.
>>     
>
> No, that's not right. The identity test fails for some strings of length
> one.
>
>
> >>> a = '\n'
> >>> b = '\n'
> >>> len(a) == len(b) == 1
> True
> >>> a is b
> False
>
>
> Clearly, Python doesn't intern all strings of length one. What Python
> actually interns are strings that look like, or could be, identifiers:
>
>
> >>> a = 'heresareallylongstringthatisjustmade' \
> ... 'upofalphanumericcharacterssuitableforidentifiers123_'
> >>> b = 'heresareallylongstringthatisjustmade' \
> ... 'upofalphanumericcharacterssuitableforidentifiers123_'
> >>> a is b
> True
>
> It also does a similar thing for small integers, currently -5 through
> to 256 in CPython I believe, although this is an implementation
> detail subject to change.
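
Since the cached range is an implementation detail, code should not
depend on it. A small sketch (the helper name compare is hypothetical)
that reports both tests without assuming what the identity test returns:

```python
def compare(text):
    """Parse the same digits twice and report (equal, identical)."""
    a = int(text)
    b = int(text)
    return (a == b, a is b)


for text in ("7", "256", "100000"):
    equal, identical = compare(text)
    # equal is always True; identical depends on the interpreter's cache.
    print(text, equal, identical)
```

Only the first column is something a program may rely on.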