[Python-Dev] Identity implies equality

Raymond Hettinger raymond.hettinger at gmail.com
Thu Apr 28 21:51:29 CEST 2011


ISTM there is no right or wrong answer.
There is just a question of what is most useful.

AFAICT, the code for dictionaries (and therefore the code for sets)
has always had identity-implies-equality logic.  It makes dicts
blindingly fast for common cases.  It also confers some nice
properties like making it possible to retrieve a NaN that has
been stored as a key; otherwise, you could store it but not
look it up, pop it, or delete it (because the equality test would
always fail).  The logic also confers other nice-to-have
properties such as:  

*  d[k] = v; assert k in d   # assignment-implies-contains
*  assert all(k in d for k in d)  # all-members-are-members

These aren't essential invariants but they do provide
a pleasant programming environment and make it easier
to reason about programs.
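Here is a minimal sketch of the dict behavior described above, run at the Python level with a NaN key. The variable names are only illustrative. It relies on two things: NaN != NaN, and dicts and sets checking identity before they call ==.

```python
# Sketch: dict/set lookups check identity before equality, so a NaN
# key can be found again even though NaN != NaN.
nan = float('nan')
assert nan != nan              # IEEE 754: NaN never compares equal

d = {}
d[nan] = 'stored'
assert nan in d                # assignment-implies-contains
assert d[nan] == 'stored'      # lookup succeeds via the identity check
assert all(k in d for k in d)  # all-members-are-members

# A freshly created NaN is a different object and is not ==, so it misses
assert float('nan') not in d

# The original object can still be popped (or deleted)
assert d.pop(nan) == 'stored'
assert not d

# Sets share the same lookup logic
s = {nan}
assert nan in s
assert float('nan') not in s
```

The hashing differs by version. In Python 3.10 and later, a NaN hashes by object identity. Older versions hash every NaN to 0. Either way the fresh NaN misses, because it is neither the same object nor equal.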

Another place where identity-implies-equality logic
is explicit is in PyObject_RichCompareBool().  That lets
many other functions and methods work like
dicts and sets.  It speeds them up and confers
some nice-to-haves like:

*  mylist.append(obj) implies mylist.count(obj) > 0 
*  x = obj implies x == obj   # assignment really works
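The next sketch illustrates the list methods mentioned above. In CPython, list.count(), list.index() and the `in` test compare elements via PyObject_RichCompareBool() in the C API, which treats identical objects as equal. The bare == operator does not take that shortcut. So for NaN, the identity rule shows up in element-wise container comparisons rather than in `x == obj` itself. The names are only for illustration.

```python
nan = float('nan')
mylist = []
mylist.append(nan)

# count, index and `in` go through the identity-then-equality check
assert mylist.count(nan) == 1
assert nan in mylist
assert mylist.index(nan) == 0

# A different NaN object is neither identical nor equal
assert float('nan') not in mylist

x = nan
# The plain == operator has no identity shortcut: NaN != NaN
assert not (x == nan)
# ...but list and tuple equality compare elements with the shortcut
assert [x] == [nan]
assert (x,) == (nan,)
```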

There may be lots of other code that implicitly
makes similar assumptions.  I don't know how you
could reliably find those and rip them out.

If identity-implies-equality does get ripped out,
I don't know what we would win.  It would make it
possible to do some cute NaN tricks, but I don't
think you can defend against the general problem
of funky objects being able to muck up code that
looks correct.  You get oddities when an object
lies about its length.  You get oddities when an
object has a hash that doesn't match its equality
function.  The situation with NaNs and sorts is
a prime example:

   >>> sorted([1.2, 3.4, float('Nan'), -1.2,
   ...         float('Inf'), float('Nan')])
   [1.2, 3.4, nan, -1.2, inf, nan]
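To make the hash/equality mismatch concrete, here is a small sketch using a hypothetical AlwaysEqual class. It is not from any library. Its __eq__ says yes to everything, while its __hash__ is identity-based. A dict compares stored hashes before it ever calls __eq__, so two objects that claim to be equal still end up as separate keys:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ and __hash__ disagree."""

    def __eq__(self, other):
        # Claims to equal everything
        return True

    def __hash__(self):
        # ...but hashes by identity, so "equal" objects get different hashes
        return id(self)


a, b = AlwaysEqual(), AlwaysEqual()
assert a == b                  # equal according to __eq__

d = {a: 'value'}
# The hashes differ, so the dict never calls __eq__ and b is not found
assert b not in d

# Likewise a set keeps both "equal" objects
assert len({a, b}) == 2
```

The data model docs require that objects which compare equal have the same hash value. AlwaysEqual deliberately breaks that rule.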

Personally, I think the status quo is fine
and that practicality is beating purity.
High-quality programs are written every day.
Numeric programmers seem to have no problem
using NaNs as-is.  AFAICT, the only actual
problem in front of us is the OP's post where
he was able to surprise himself with some
NaN experiments at the interactive prompt.


Raymond

