[Python-Dev] Identity implies equality

Thu Apr 28 21:51:29 CEST 2011

ISTM there is no right or wrong answer.
There is just a question of what is most useful.

AFAICT, the code for dictionaries (and therefore the code for sets)
has always had identity-implies-equality logic.  It makes dicts
blindingly fast for common cases.  It also confers some nice
properties like making it possible to retrieve a NaN that has
been stored as a key; otherwise, you could store it but not
look it up, pop it, or delete it (because the equality test would
always fail).  The logic also confers other nice-to-have
properties such as:  

*  d[k] = v; assert k in d   # assignment-implies-contains
*  assert all(k in d for k in d)  # all-members-are-members

These aren't essential invariants but they do provide
a pleasant programming environment and make it easier
to reason about programs.

Another place where identity-implies-equality logic
is explicit is in Py_RichCompareBool().  That lets
methods in many other functions and methods work like
dicts and sets.  It speeds them up and confers
some nice-to-haves like:

*  mylist.append(obj) implies mylist.count(obj) > 0 
*  x = obj implies x == obj   # assignment really works

There may be lots of other code that implicitly
makes similar assumptions.  I don't know how you
could reliably find those and rip them out.

If identity-implies-equality does get ripped out,
I don't know what we would win.  It would make it
possible to do some cute NaN tricks, but I don't
think you can defend against the general problem
of funky objects being able to muck-up code that
looks correct.  You get oddities when an object
lies about its length.  You get oddities when an
object has a hash that doesn't match its equality
function.  The situation with NaNs and sorts is
a prime example:

   >>> sorted([1.2, 3.4, float('Nan'), -1.2, 
              float('Inf'), float('Nan')]) 
   [1.2, 3.4, nan, -1.2, inf, nan]

Personally, I think the status quo is fine
and that practicality is beating purity.
High quality programs are written every day.
Numeric programmers seem to have no problem
using NaNs as-is.  AFAICT, the only actual
problem in front us is the OP's post where
he was able to surprise himself with some
NaN experiments at the interactive prompt.

Raymond