Rich Comparisons Gotcha

Sun Dec 7 11:23:59 EST 2008

> On Sun, 07 Dec 2008 13:03:43 +0000, Rasmus Fogh wrote:
>> James Stroud wrote:
> ...
>>> Second, consider that any value in python also evaluates to a truth
>>> value in boolean context.

> But bool(x) can fail too. So not every object in Python can be
> interpreted as a truth value.

>>> Third, every function returns something.

> Unless it doesn't return at all.

>>> A function's returning nothing
>>> is not a possibility in the python language. None is something but
>>> evaluates to False in boolean context.

>> Indeed. The requirement would be not that return_value was a boolean,
>> but that bool(return_value) was defined and gave the correct result.

> If __bool__ or __nonzero__ raises an exception, you would like Python to
> ignore the exception and return True or False. Which should it be? How
> do you know what the correct result should be?

> From the Zen of Python:

> "In the face of ambiguity, refuse the temptation to guess."

> All binary operators are ambiguous when dealing with vector or array
> operands. Should the operator operate on the array as a whole, or on
> each element? The numpy people have decided that element-wise equality
> testing is more useful for them, and this is their prerogative to do so.
> In fact, the move to rich comparisons was driven by the needs of numpy.

> http://www.python.org/dev/peps/pep-0207/

> It is a *VERY* important third-party library, and this was not the first
> and probably won't be the last time that their needs will move into
> Python the language.

> Python encourages such domain-specific behaviour. In fact, that's what
> operator-overloading is all about: classes can define what any operator
> means for *them*. There's no requirement that the infinity of potential
> classes must all define operators in a mutually compatible fashion, not
> even for comparison operators.

> For example, consider a class implementing one particular version of
> three-value logic. It isn't enough for == to only return True or False,
> because you also need Maybe:

> True == False => returns False
> True == True => returns True
> True == Maybe => returns Maybe
> etc.

> Or consider fuzzy logic, where instead of two truth values, you have a
> continuum of truth values between 0.0 and 1.0. What should comparing two
> such fuzzy values for equality return? A boolean True/False? Another
> fuzzy value?

> Another one from the Zen:

> "Special cases aren't special enough to break the rules."

> The rules are that classes can customize their behaviour, that methods
> can fail, and that Python should not try to guess what the correct value
> should have been in the event of such a failure. Equality is a special
> case, but it isn't so special that it needs to be an exception from
> those rules.

> If you really need a guaranteed-can't-fail[1] equality test, try
> something like this untested wrapper class:

> class EqualityWrapper(object):
>    def __init__(self, obj):
>        self.wrapped = obj
>    def __eq__(self, other):
>        try:
>            return bool(self.wrapped == other)
>        except Exception:
>            return False  # or maybe True?

> Now wrap all your data:

> data = [a list of arbitrary objects]
> data = map(EqualityWrapper, data)
> process(data)

> [1] Not a guarantee.

Well, lots to think about.

Just to keep you from shooting at straw men:

I would have liked it to be part of the design contract (a convention, if
you like) that
1) bool(x == y) should return a boolean and never throw an error
2) x == x should return True

I do *not* say that bool(x) should never throw an error.
I do *not* say that Python should guess a return value if an __eq__
function throws an error, only that it should have been considered a bug,
or at least bad form, for __eq__ functions to do so.

What might be a sensible behaviour (unlike your proposed wrapper) would be
the following:

def eq(x, y):
  if x is y:
    return True
  else:
    try:
      return (x == y)
    except Exception:
      return False
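
As a minimal sketch of how eq() would behave in practice, assume a
hypothetical class Touchy (not from any real library) whose __eq__ always
raises. The eq() function is repeated unchanged so the snippet runs on its
own:

```python
class Touchy:
    """Hypothetical class whose == always raises; for illustration only."""

    def __eq__(self, other):
        raise TypeError("comparison not supported")


def eq(x, y):
    # Same logic as the function above.
    if x is y:
        return True
    try:
        return x == y
    except Exception:
        return False


t = Touchy()
print(eq(t, t))         # identity short-circuit, __eq__ is never called
print(eq(t, Touchy()))  # __eq__ raises, so eq() reports False
print(eq(1, 1.0))       # ordinary objects compare as usual
```

The identity check comes first, so an object always compares equal to
itself here, whatever its __eq__ does.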

If it is possible to change the language, how about having two
different functions, one for overloading the '==' operator, and another
for testing list and set membership, dictionary key identity, etc.?
For instance like this:
- Add a new function __equals__; x.__equals__(y) could default to
  bool(x.__eq__(y))
- Establish by convention that x.__equals__(y) must return a boolean and
  may not intentionally throw an error.
- Establish by convention that 'x is y' implies 'x.__equals__(y)'
  in the sense that (not (x is y and not x.__equals__(y))) must always hold
- Have the Python data structures call __equals__ when they want to
  compare objects internally (e.g. for 'x in alist', 'x in adict',
  'set(alist)', etc.)
- Provide an equals(x,y) built-in that calls the __equals__ function
- numpy and others who (mis)use '==' for their own purposes could use
  def __equals__(self, other): return (self is other)

For the float NaN case it looks like things are already behaving like
this. For numpy objects you would not lose anything, since
'numpyArray in alist' is broken anyway.
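
The NaN behaviour can be checked with a short snippet. It relies only on
the float built-in; the variable names are arbitrary:

```python
nan = float("nan")

print(nan == nan)            # NaN never compares equal, even to itself
print(nan in [nan])          # membership checks identity before ==
print(float("nan") in [nan]) # a different NaN object is not found
```

So list membership already treats 'x is y' as implying equality, even
where '==' says otherwise.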

I still think it is a bad choice that numpy got to write
  array1 == array2
for their purposes, while everybody else has to use
  if equals(x, y):
but at least both sides could get the behaviour they want.

Yours,

Rasmus

---------------------------------------------------------------------------
Dr. Rasmus H. Fogh                  Email: r.h.fogh at bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK.     FAX (01223)766002