Rich Comparisons Gotcha

Tue Jan 6 20:23:19 EST 2009

Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:

> To prove my claim, all you need is two domains with a mutually 
> incompatible definition of equality. That's not so difficult, surely? How 
> about equality of integers, versus equality of integers modulo some N?

No, that's not an example.  The integers modulo N form a ring Z/NZ of
residue classes.  Such residue classes are distinct from the integers --
e.g., an integer 3 (say) is not the same as the set 3 + NZ { ..., 3 - 2N,
3 - N, 3, 3 + N, 3 + 2N, ... } -- but there is a homomorphism from Z
to Z/NZ under which 3 + NZ is the image of 3.

If we decide to define the == operator such that 3 == 3 + NZ and 3 + N
== 3 + NZ then == is not an equivalence relation (in particular,
transitivity fails).  But that's just an artifact of the definition.  If
we distinguish 3 from 3 + NZ then everything is fine.  3 + NZ == (3 + N)
+ NZ correctly, but 3 != 3 + N, and all is well.

Here, at least, the problem is not that == as an equivalence relation
fails in some particular domain -- because in both Z and Z/NZ it can be
a perfectly fine equivalence relation -- but that it can potentially
fail on the boundaries between domains.  Easy answer: don't mess it up
at the boundaries.

Proposition.  Let U, U' be disjoint sets, and let E, E' be equivalence
relations on U, U' respectively.  Define E^ on U union U' as E^ = E
union E', i.e.,

  E^(x, y) iff x in U and y in U and E(x, y) or
               x in U' and y in U' and E'(x, y)

Then E^ is an equivalence relation.

Proof.  Reflexivity and symmetry are trivial; transitivity follows from
disjointness of U and U'.

> It *can* be a problem, if you insist on using == on arbitrary types
> while still expecting it to be an equivalence relation.

Unfortunately, from the surrounding discussion, it seems that container
types particularly want to be able to contain arbitrary objects, and the
failure of == to be a equivalence relation makes this fail.  The problem
is that objects with wacky == operators are still more or less quacking
like the more usual kinds of ducks; but they turn out to taste very
different.

> Let's denote regular, case-sensitive strings using "abc", and special, 
> case-insensitive strings using i"abc". So for regular strings, equality 
> is an e-r; for case-insensitive strings, equality is also an e-r  (I 
> trust that the truth of this is obvious). But if you try to use equality 
> on *both* regular and case-insensitive strings, it fails to be an e-r:
> 
> i"abc" =~ "ABC" returns True if you use the case-insensitive definition 
> of equality, but returns False if you use the case-sensitive definition. 
> There is no single definition of equality that is *simultaneously* case-
> sensitive and case-insensitive.

A case-sensitive string is /not the same/ as a case-insensitive string.
One's a duck, the other's a goose.  I'd claim here that i"abc" =~ "ABC"
must be False, because i"abc" =~ "abc" must be false also!  To define it
otherwise leads to the incoherence you describe.  But the above
proposition provides an easy answer.

> > A valuable property might be that x =~ y if x and y are
> > indistinguishable without using `is'.
> 
> That's a little strong, because it implies that equality must look at 
> *everything* about a particular object, not just whatever bits of data 
> are relevant for the problem domain.

Yes.  That's one of the reasons that =~ isn't the same as ==.

I've been thinking on my feet in this thread, so I haven't thought
everything through.  And as I mention below, there are /many/ useful
equality predicates on values.  As I didn't mention (but hope is
obvious) having a massively-parametrized equality predicate is daft, and
providing enough to suit every possible application equally so.  But we
might be able to do well enough with just one or two -- or maybe by just
leaving things as they are.

> For example, consider storing data in a dict.
> 
> >>> D1 = {-1: 0, -2: 0}
> >>> D2 = {-2: 0}
> >>> D2[-1] = 0
> >>> D1 == D2
> True
> 
> 
> We certainly want D1 and D2 to be equal.

Do we?  If we're using my `indistinguishable without using ``is'''
criterion from above, then D1 and D2 are certainly different!  To detect
the difference, mutate one and see if the other changes:

def distinct_dictionaries_p(D1, D2):
  """
  Decide whether D1 and D2 are the same dictionary or not.
  Not threadsafe.
  """
  magic = []
  more_magic = [magic]
  old = D1.get('mumble', more_magic)
  D1['mumble'] = magic
  result = D2.get('mumble', more_magic) is magic
  if old is more_magic:
    del D1['mumble']
  else:
    D1['mumble'] = old
  return result

But that criterion was a suggestion -- a way of defining a coherent
equivalence relation on the whole of the Python value space which is
coarser than `is' and maybe more useful.  My primary purpose in
proposing it was to stimulate discussion: what /do/ we want from
equality predicates?  We already have `is', which is too fine-grained to
be widely useful: it distinguishes between different instances of the
number 500000, for example, and I can't for the life of me see why
that's a useful behaviour.  (The `is' operator is a fine thing, and I
wouldn't want it any other way: it trades away some useful semantics for
the sake of speed, and that was the /right/ decision.)

My criterion succeeds in distinguishing 1 from 1.0 (they have different
types), which may be considered good.  It doesn't distinguish a quiet
NaN from another quiet NaN: that's definitely good.  (It'd be bogus for
a numeric equality operator, but we've already got one of those, so we
don't need to define another.)  But you're probably right: it's still
too fine-grained for some purposes.

> But their history is different, and that makes their internal details
> different, which has detectable consequences:
> 
> >>> D1
> {-2: 0, -1: 0}
> >>> D2
> {-1: 0, -2: 0}

So in this case, `str' also works as a distinguisher.  Fine.

> There may be problem domains where the order of elements in a list (or 
> tree structure) *is* important, and other problem domains where order is 
> irrelevant. One single relation can't cover all such conflicting 
> requirements.

Absolutely.  This is why Common Lisp provides four(!) out of the box and
it still isn't enough.  Python provides one (`is') and a half (`==' when
it's behaving) is actually coping remarkably well considering.  But this
/is/ causing problems, and so thinking about solutions seems reasonable.

I'm not trying to change the language.  I don't have a pet feature I
want added.  I do think the discussion is interesting and worthwhile,
though.

-- [mdw]