Rich Comparisons Gotcha

Rasmus Fogh rhf22 at mole.bio.cam.ac.uk
Sun Dec 7 07:43:59 EST 2008


Robert Kern wrote:
>Terry Reedy wrote:
>> Rasmus Fogh wrote:
>>> Personally I would like to get these !@#$%&* misfeatures removed,
>>
>> What you are calling a misfeature is an absence, not a presence that
>> can be removed.
>
> That's not quite true. Rich comparisons explicitly allow non-boolean
> return values. Breaking up __cmp__ into multiple __special__ methods was
> not the sole purpose of rich comparisons. One of the prime examples at the
> time was numpy (well, Numeric back then). We wanted to be able to use ==
> to return an array with boolean values where the two operand arrays were
> equal. E.g.
>
> In [1]: from numpy import *
>
> In [2]: array([1, 2, 3]) == array([4, 2, 3])
> Out[2]: array([False,  True,  True], dtype=bool)
>
> SQLAlchemy uses these operators to build up objects that will be turned
> into SQL expressions.
>
> >>> print users.c.id==addresses.c.user_id
> users.id = addresses.user_id
>
> Basically, the idea was to turn these operators into full-fledged
> operators like +-/*. Returning a non-boolean violates neither the letter,
> nor the spirit of the feature.
>
> Unfortunately, if you do overload __eq__ to build up expressions or
> whatnot, the other places where users of __eq__ are implicitly expecting
> a boolean break.
>
> While I was (and am) a supporter of rich comparisons, I feel Rasmus's
> pain from time to time. It would be nice to have an alternate method to
> express the boolean "yes, this thing is equal in value to that other thing".
> Unfortunately, I haven't figured out a good way to fit it in now without
> sacrificing rich comparisons entirely.
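
To make the breakage Robert describes concrete, here is a minimal pure-Python sketch. The `Column` and `BinaryExpr` classes are hypothetical stand-ins, loosely modelled on the SQLAlchemy style quoted above but not its real API: `==` builds an expression object instead of answering yes or no. Because `in` and `list.index` call `__eq__` implicitly and only test the truth value of the result, and a plain object is always truthy, they report a match between two different columns.

```python
class BinaryExpr:
    """Hypothetical expression node produced by Column.__eq__."""

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def __str__(self):
        return f"{self.left} {self.op} {self.right}"


class Column:
    """Hypothetical column whose == builds an expression, not a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Return an expression object; like any plain object it is truthy.
        return BinaryExpr(self, "=", other)

    def __str__(self):
        return self.name


users_id = Column("users.id")
addresses_user_id = Column("addresses.user_id")

print(users_id == addresses_user_id)   # users.id = addresses.user_id

# Code that expects == to mean "equal in value" is silently misled:
cols = [addresses_user_id]
print(users_id in cols)        # True, although the columns differ
print(cols.index(users_id))    # 0, i.e. the wrong column
```

Note that nothing raises here: the failure is a wrong answer, which is arguably worse than the numpy case, where at least an exception is raised.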

The best way, IMHO, would have been to use an alternative notation in
numpy and SQLAlchemy, and have '==' always return only a truth value. It
could be a non-boolean, as long as bool() gave the correct result. Surely
the extra convenience of overloading '==' in special cases was not worth
breaking such basic operations as 'bool(x == y)' or 'x in alist'. Again,
the problem is only with '==', not with '>', '<=' etc. Of course it is
done now, and unlikely to be reversed.

>>> and constrain the __eq__ function to always return a truth value.
>>
>> It is impossible to do that with certainty by any mechanical
>> creation-time checking.  So the implementation of operator.eq would
>> have to check the return value of the ob.__eq__ function it calls *every
>> time*.  That would slow down the 99.xx% of cases where the check is
>> not needed and would still not prevent exceptions.  And if the return
>> value was bad, all operator.eq could do is raise an exception anyway.
>
>Sure, but then it would be a bug to return a non-boolean from __eq__ and
>friends. It is not a bug today. I think that's what Rasmus is proposing.
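
Terry's point is that such a constraint could only be enforced at call time. A minimal sketch of what that per-call check might look like, using a hypothetical `checked_eq` helper (not a standard-library function) and a hypothetical `Expr` type: the check runs on every comparison, and when the return value is bad, all it can do is raise.

```python
import operator


def checked_eq(a, b):
    """Hypothetical strict equality: insist that == yields a real bool."""
    result = operator.eq(a, b)
    # This isinstance check runs on every call, including the vast
    # majority where __eq__ already returns a bool.
    if not isinstance(result, bool):
        raise TypeError(
            f"__eq__ returned {type(result).__name__}, expected bool"
        )
    return result


class Expr:
    """Hypothetical type whose == builds an object, SQLAlchemy-style."""

    def __eq__(self, other):
        return ("eq", self, other)


print(checked_eq(1, 1.0))    # True
try:
    checked_eq(Expr(), Expr())
except TypeError as exc:
    print(exc)               # __eq__ returned tuple, expected bool
```

As written, this sketch would also reject `numpy.bool_`, which is a truth value but not a subclass of Python's `bool`; a real rule would have to decide whether "truth value" means `bool` itself or anything `bool()` accepts.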

Yes, that is the point. If __eq__ functions are *supposed* to return
booleans, I can write generic code that will work for well-behaved objects,
and any errors will be somebody else's fault. If __eq__ is free to return
anything, or throw an error, it becomes my responsibility to write generic
code that will work anyway, including with floating-point numbers, numpy,
or SQLAlchemy. And I cannot see any way to do that (suggestions welcome).
If purportedly general code does not work with numpy, your average numpy
user will not be receptive to the idea that it is all numpy's fault.
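One partial workaround, offered only as a sketch and not as a general fix, is a hypothetical `robust_contains` helper. It tests identity first (as `in` does), then `==`, and treats a result whose truth value cannot be determined as "not equal". The `Ambiguous` and `ArrayLike` classes are hypothetical stand-ins that mimic numpy's behaviour without requiring numpy. The trade-offs: an expression-building `__eq__` still returns a truthy object and is still miscounted, and swallowing `ValueError`/`TypeError` may hide genuine bugs.

```python
def robust_contains(seq, item):
    """Hypothetical membership test that tolerates ambiguous == results."""
    for element in seq:
        if element is item:          # identity shortcut, as `in` does
            return True
        try:
            if bool(element == item):
                return True
        except (ValueError, TypeError):
            # e.g. a multi-element array comparison whose truth value is
            # ambiguous: treat it as "not equal" instead of failing.
            continue
    return False


class Ambiguous:
    """Stand-in for a numpy-style result whose truth value is undefined."""

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical array whose == always returns an Ambiguous result."""

    def __eq__(self, other):
        return Ambiguous()


y = ArrayLike()
print(robust_contains([1, y], y))   # True: found by identity
print(robust_contains([1], y))      # False instead of ValueError
```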

Current behaviour is both inconsistent and counterintuitive, as these
examples show.

>>> x = float('NaN')
>>> x == x
False
>>> ll = [x]
>>> x in ll
True
>>> x == ll[0]
False

>>> import numpy
>>> y = numpy.zeros((3,))
>>> y
array([ 0.,  0.,  0.])
>>> bool(y==y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> ll1 = [y,1]
>>> y in ll1
True
>>> ll2 = [1,y]
>>> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>>
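Both sessions follow from how the Python language reference defines membership for sequences such as lists: `x in seq` is equivalent to `any(x is e or x == e for e in seq)`. The identity test is tried before `==`, and evaluation stops at the first match. So `x in [x]` succeeds for NaN even though `x == x` is False. Likewise, `y in ll1` matches `y` by identity before any `==` runs, whereas `y in ll2` must first evaluate `1 == y`, whose array result cannot be converted to bool. A minimal sketch of that rule, using a hypothetical `emulated_in` function:

```python
def emulated_in(x, seq):
    """Sketch of list membership as the language reference describes it:
    any(x is e or x == e for e in seq). Evaluation stops at the first
    match, so elements after it are never compared."""
    return any(x is e or x == e for e in seq)


x = float("nan")
print(x == x)                            # False: NaN is unequal to itself
print(emulated_in(x, [x]))               # True: the identity test matches
print(x in [x])                          # True: the real operator agrees
print(emulated_in(float("nan"), [x]))    # False: a different NaN object
```

The last line shows the inconsistency at its sharpest: whether a NaN is "in" a list depends on object identity, not on its value.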

Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.

---------------------------------------------------------------------------
Dr. Rasmus H. Fogh                  Email: r.h.fogh at bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK.     FAX (01223)766002