Rich Comparisons Gotcha

Mon Jan 5 19:55:06 EST 2009

Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> wrote:

> There is nothing to blame them for. This is the correct behaviour. NaNs 
> should *not* compare equal to themselves, that's mathematically 
> incoherent.

Indeed.  The problem is a paucity of equality predicates.  This is
hardly surprising: Common Lisp has four general-purpose equality
predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones
(=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still
doesn't really have enough.  For example, EQUAL compares strings
case-sensitively, but other arrays are compared by address; EQUALP will
recurse into arbitrary arrays, but compares strings
case-insensitively...

For the purposes of this discussion, however, it has enough to be able
to distinguish between

  * numerical comparisons, which (as you explain later) should /not/
    claim that two NaNs are equal, and

  * object comparisons, which clearly must declare an object equal to
    itself.

For example, I had the following edifying conversation with SBCL.

CL-USER> ;; Return NaNs rather than signalling errors.
         (sb-int:set-floating-point-modes :traps nil)
; No value
CL-USER> (defconstant nan (/ 0.0 0.0))
NAN
CL-USER> (loop for func in '(eql equal equalp =)
	       collect (list func (funcall func nan nan)))
((EQL T) (EQUAL T) (EQUALP T) (= NIL))
CL-USER>

That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself.
(Due to the vagaries of EQ, a NaN might or might not be EQ to itself or
other NaNs.)

Python has a much more limited selection of equality predicates -- in
fact, just == and is.  The is operator is Python's equivalent of Lisp's
EQ predicate: it compares objects by address.  I can have a similar chat
with Python.

In [12]: nan = float('nan')

In [13]: nan is nan
Out[13]: True

In [14]: nan == nan
Out[14]: False

In [16]: nan is float('nan')
Out[16]: False

A Python number is reliably the same object as itself under is, which
is more than Lisp's EQ promises.  But there's no sensible way of asking
whether something is `basically the same as' nan, in the way that Lisp's
EQL or EQUAL would.  I agree that the primary equality predicate for
numbers must be the numerical comparison, and NaNs can't (sensibly) be
numerically equal to themselves.
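
As a rough sketch of what such a predicate might look like, here is a
hypothetical helper, eql_like, loosely modelled on Lisp's EQL.  The name
and the exact rules are assumptions made for illustration; nothing like
it is built into Python, and it only handles ints and floats.

```python
import math


def eql_like(a, b):
    """Hypothetical predicate, loosely modelled on Lisp's EQL.

    Two numbers count as the same if they have the same type and the
    same value, with every NaN treated as the same as any other NaN.
    """
    if a is b:
        return True
    if type(a) is not type(b):
        return False
    if isinstance(a, float):
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        # Like EQL, tell 0.0 and -0.0 apart even though they are ==.
        return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
    if isinstance(a, int):
        return a == b
    # Anything else falls back to identity, which was checked above.
    return False


nan = float('nan')
same_as_nan = eql_like(nan, float('nan'))   # True, although nan != nan
mixed_types = eql_like(1, 1.0)              # False, although 1 == 1.0
signed_zero = eql_like(0.0, -0.0)           # False, although 0.0 == -0.0
```

The numerical comparison == is left alone; the helper only answers the
separate question of whether two objects are interchangeable.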

Address comparisons are great when you're dealing with singletons, or
when you carefully intern your objects.  In other cases, you're left
with ==.  This puts a great deal of responsibility on the programmer of
an == method, who must carefully weigh two potentially conflicting
demands.  One is compatibility: many other libraries just expect == to
be an equality operator returning a straightforward truth value, and
given that there isn't a separate dedicated equality operator, this
isn't unreasonable.  The other is doing something more
domain-specifically useful.
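
To make the conflict concrete, here is a small sketch using a
hypothetical toy class, Vec, whose == compares elementwise and returns a
list of results.  Vec is invented for illustration and is not taken from
any real library.  The in operator on a list treats whatever == returns
as a truth value, so it gives a misleading answer.

```python
class Vec:
    """Hypothetical toy class whose == compares elementwise."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Vec):
            return NotImplemented
        # Domain-specifically useful, but not a plain truth value.
        return [a == b for a, b in zip(self.items, other.items)]


elementwise = Vec(1, 2) == Vec(3, 4)   # [False, False]

# List membership calls == and then takes the truth value of the
# result.  A non-empty list is true, so the search "succeeds" even
# though no element matches.
found = Vec(1, 2) in [Vec(3, 4)]       # True
```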

It's worth pointing out that numpy isn't unique in having == not return
a straightforward truth value.  The SAGE computer algebra system (and
sympy, I believe) implements the == operator on algebraic formulae so as
to construct equations.  For example, the following is syntactically and
semantically Python, with fancy libraries.

sage: var('x')  # x is now a variable
x
sage: solve(x**2 + 2*x - 4 == 1)
[x == -sqrt(6) - 1, x == sqrt(6) - 1]

(SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I
didn't use them.)

I think this is an excellent use of the == operator -- but it does have
some potential to interfere with other libraries which make assumptions
about how == behaves.  The SAGE developers have been clever here,
though: an equation can still be converted to a sensible truth value.

sage: 2*x + 1 == (2 + 4*x)/2
2*x + 1 == (4*x + 2)/2
sage: bool(2*x + 1 == (2 + 4*x)/2)
True
sage: bool(2*x + 1 == (2 + 4*x)/3)
False
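
The general pattern can be sketched in plain Python.  This is a toy
under stated assumptions, not SAGE's implementation: the classes Linear
and Equation are hypothetical, and Linear only represents expressions of
the form a*x + b.  The == operator builds an Equation object, and the
equation's truth value reports whether both sides are the same
expression.

```python
from fractions import Fraction


class Linear:
    """Hypothetical toy expression a*x + b with exact coefficients."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __eq__(self, other):
        # Build an equation rather than answering True or False.
        return Equation(self, other)

    def __truediv__(self, n):
        return Linear(self.a / n, self.b / n)


class Equation:
    """Hypothetical equation object produced by Linear.__eq__."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def __bool__(self):
        # True only if both sides have identical coefficients.
        return (self.lhs.a, self.lhs.b) == (self.rhs.a, self.rhs.b)


eq_same = Linear(2, 1) == Linear(4, 2) / 2   # an Equation object
eq_diff = Linear(2, 1) == Linear(4, 2) / 3   # another Equation object
holds = bool(eq_same)                        # True
fails = bool(eq_diff)                        # False
```

Code that only ever takes the truth value of a == b, such as an if test
or a list membership check, keeps working with objects like these.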

I think Python manages surprisingly well with its limited equality
predicates.  But the key word there is `surprisingly' -- and it may not
be able to keep this trick up forever.

-- [mdw]