why () is () and [] is [] work in other way?

Wed Apr 25 23:50:21 EDT 2012

On Apr 25, 8:01 pm, Steven D'Aprano <steve+comp.lang.pyt... at pearwood.info> wrote:
> On Wed, 25 Apr 2012 13:49:24 -0700, Adam Skutt wrote:
> > Though, maybe it's better to use a different keyword than 'is' though,
> > due to the plain English connotations of the term; I like 'sameobj'
> > personally, for whatever little it matters.  Really, I think taking
> > away the 'is' operator altogether is better, so the only way to test
> > identity is:
> >     id(x) == id(y)
>
> Four reasons why that's a bad idea:
>
> 1) The "is" operator is fast, because it can be implemented directly by
> the interpreter as a simple pointer comparison (or equivalent). The id()
> idiom is slow, because it involves two global lookups and an equality
> comparison. Inside a tight loop, that can make a big difference in speed.

A runtime can optimize the two forms into the same code, since they
are logically equivalent operations.  If 'is' were removed, there's
little reason to believe an implementation wouldn't do exactly that.
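
Anyone who wants to check the cost on a particular interpreter can
measure it rather than guess.  A minimal sketch, assuming Python 3
syntax (the iteration count and the variable names are my own arbitrary
choices, and the numbers will differ between implementations and
versions):

```python
import timeit

# Two distinct, long-lived objects to compare.
a = object()
b = object()
namespace = {"a": a, "b": b}

# Time the operator form and the id() form over the same number of runs.
is_seconds = timeit.timeit("a is b", globals=namespace, number=100000)
id_seconds = timeit.timeit("id(a) == id(b)", globals=namespace, number=100000)

print("a is b:         %.4f s" % is_seconds)
print("id(a) == id(b): %.4f s" % id_seconds)
```

Whatever the figures turn out to be, they describe one implementation,
not the language.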

>
> 2) The "is" operator always has the exact same semantics and cannot be
> overridden. The id() function can be monkey-patched.
>

I can't see how that's useful at all.  Identity is a fundamental
property of an object; hence retrieval of it must be a language
operation.  The fact Python chooses to do otherwise is unfortunate,
but also irrelevant to my position.
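
For reference, the monkey-patching in question looks something like
this.  A sketch only, in Python 3 syntax (the module is called builtins
there and __builtin__ in Python 2; fake_id is a made-up name for
illustration):

```python
import builtins

def fake_id(obj):
    # Hypothetical replacement that claims every object has the same identity.
    return 0

a = object()
b = object()

original_id = builtins.id
builtins.id = fake_id
try:
    patched_id_says_same = id(a) == id(b)   # resolved to fake_id
    is_says_same = a is b                   # unaffected by the patch
finally:
    builtins.id = original_id               # always restore the real id()

print(patched_id_says_same, is_says_same)
```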

> 3) The "is" idiom semantics is direct: "a is b" directly tests the thing
> you want to test, namely whether a is b. The id() idiom is indirect:
> "id(a) == id(b)" only indirectly tests whether a is b.

The two expressions are logically equivalent, so I don't see how this
matters, nor how it is true.
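
To be concrete about what I mean by logically equivalent: while both
operands are alive, the two forms have to agree.  A small sketch in
Python 3 syntax (same_by_is and same_by_id are names I made up for
illustration):

```python
def same_by_is(x, y):
    return x is y

def same_by_id(x, y):
    # Both arguments are alive for the whole call, so their ids are comparable.
    return id(x) == id(y)

first = [1, 2]
second = [1, 2]      # equal value, distinct object
alias = first        # same object under another name

# The two tests agree for every pair of live objects.
for left, right in [(first, second), (first, alias), (second, alias)]:
    assert same_by_is(left, right) == same_by_id(left, right)
```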

>
> 4) The id() idiom already breaks if you replace names a, b with
> expressions:
>
> >>> id([1,2]) == id([3,4])
> True

It's not broken at all.  The lifetime of temporary objects is
intentionally undefined, and that's a /good/ thing.  What's
unfortunate is that CPython optimizes temporaries differently between
the two logically equivalent expressions.

As long as this holds:
>>> class A(object):
...     def __del__(self):
...        print "Farewell to: %d" % id(self)
...
>>> A() is A()
Farewell to: 4146953292
Farewell to: 4146953260
False
>>> id(A()) == id(A())
Farewell to: 4146953420
Farewell to: 4146953420
True

then there's nothing "broken" about the behavior of either expression.
I personally think logically equivalent expressions should give the
same results, but since both operations follow the rules of object
identity correctly, it's not the end of the world.  It's only
surprising to the programmer if:
    1) They don't understand identity.
    2) They don't understand what objects are and are not temporaries.

Code that relies on the identity of a temporary object is generally
incorrect.  This is why C++ explicitly forbids taking the address
(identity) of temporaries.  As such, the language behavior in your
case is inconsequential.  Making demons fly out of the programmer's
nose would be equally appropriate.
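
The safe pattern is to keep a reference to anything whose identity you
intend to compare.  A sketch, assuming Python 3 syntax (the class is a
stripped-down stand-in for A above, and the other names are mine):
objects whose lifetimes overlap are guaranteed distinct ids, which is
all id() promises.

```python
class A(object):
    pass

# Holding references keeps every instance alive while the ids are compared.
instances = [A() for _ in range(5)]
ids = [id(obj) for obj in instances]

# Objects with overlapping lifetimes must have distinct ids.
all_distinct = len(set(ids)) == len(instances)
print(all_distinct)
```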

The other solution is to do what Java and C# do: banish id() entirely
and only provide 'is' (== in Java, Object.ReferenceEquals() in C#).
That seems just as fine, really.  Practically, it's also probably the
better solution for CPython, which is fine by me.  My preference for
keeping id() and removing 'is' probably comes from my background as a
C++ programmer, and I already said it matters very little.

> But that's absolutely wrong. id(x) returns an ID, not an address. It just
> happens that, as an accident of implementation, the CPython interpreter
> uses the object address as an ID, because objects can't move. That's not
> the case for all implementations. In Jython, objects can move and the
> address is not static, and so IDs are assigned on demand starting with 1:
>
> steve at runes:~$ jython
> Jython 2.5.1+ (Release_2_5_1, Aug 4 2010, 07:18:19)
> [OpenJDK Client VM (Sun Microsystems Inc.)] on java1.6.0_18
> Type "help", "copyright", "credits" or "license" for more information.
> >>> id(42)
> 1
> >>> id("Hello World!")
> 2
> >>> id(None)
> 3
>

An address is an identifier: a number that I can use to access a
value[1].  I never said that id() must return an address the host CPU
understands (virtual, physical, or otherwise).  Most languages use
addresses that the host CPU cannot understand without assistance at
least sometimes, including C on some platforms.
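
To illustrate that an identifier need not be a machine address, here is
a toy sketch of handing out IDs on demand, in the spirit of the Jython
session quoted above.  It is not how Jython actually implements id();
IdentityRegistry and get_id are hypothetical names, and the linear scan
is only there for brevity (Python 3 syntax):

```python
import itertools

class IdentityRegistry(object):
    """Toy model: assign sequential IDs to objects the first time they are seen."""

    def __init__(self):
        self._counter = itertools.count(1)
        # (object, assigned_id) pairs; the strong references keep objects alive.
        self._entries = []

    def get_id(self, obj):
        # Look the object up by identity, never by equality.
        for known, assigned in self._entries:
            if known is obj:
                return assigned
        assigned = next(self._counter)
        self._entries.append((obj, assigned))
        return assigned

registry = IdentityRegistry()
first = []
second = []          # equal to first, but a different object
first_id = registry.get_id(first)
second_id = registry.get_id(second)
first_again = registry.get_id(first)
print(first_id, second_id, first_again)
```

The number handed back still lets the runtime find the object, which is
the only property I am asking of an address.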

> Other implementations may make other choices. I don't believe that the
> language even defines the id as a number, although I could be wrong about
> that.

http://docs.python.org/library/functions.html#id says it must be an
integer of some sort.  Even if it didn't say that, requiring an
integer hardly seems like much of an imposition in practice.
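
That much is easy to check on whatever implementation is at hand.  A
sketch assuming Python 3, where int and long are a single type (the
variable names are mine):

```python
value = object()
identity = id(value)

# The documentation promises an integer that stays constant
# for as long as the object is alive.
is_integer = isinstance(identity, int)
is_stable = identity == id(value)
print(is_integer, is_stable)
```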

>
> Personally, I prefer the Jython approach, because it avoids those
> annoying questions like "How do I dereference the address of an
> object?" (answer: Python is not C, you can't do that),

The right way to address that question isn't to change the runtime,
but to teach people what pointer semantics actually mean, much like
the identity problem we're discussing now.

Adam

[1] I'd be more willing to accept a more general definition that
allows for non-numeric addresses, but such things are rare.