Rich Comparisons Gotcha

Sun Dec 7 22:36:17 EST 2008

Robert Kern wrote:
> James Stroud wrote:
>> I'm missing how a.all() solves the problem Rasmus describes, namely 
>> that the order of a python *list* affects the results of containment 
>> tests by numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to 
>> different results in his example. It still seems like a bug in numpy 
>> to me, even if too much other stuff is broken if you fix it (in which 
>> case it apparently becomes an "issue").
> 
> It's an issue, if anything, not a bug. There is no consistent 
> implementation of bool(some_array) that works in all cases. numpy's 
> predecessor Numeric used to implement this as returning True if at least 
> one element was non-zero. This works well for bool(x!=y) (which is 
> equivalent to (x!=y).any()) but does not work well for bool(x==y) (which 
> should be (x==y).all()), but many people got confused and thought that 
> bool(x==y) worked. When we made numpy, we decided to explicitly not 
> allow bool(some_array) so that people will not write buggy code like 
> this again.
> 
> The deficiency is in the feature of rich comparisons, not numpy's 
> implementation of it. __eq__() is allowed to return non-booleans; 
> however, there are some parts of Python's implementation like 
> list.__contains__() that still expect the return value of __eq__() to be 
> meaningfully cast to a boolean.
>
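
For reference, the any/all ambiguity described in the quoted text can be
sketched in plain Python without numpy. The helper numeric_style_bool is a
hypothetical name of mine that mimics the Numeric rule quoted above ("True
if at least one element was non-zero"); it is not a real Numeric or numpy
function, and the lists stand in for arrays.

```python
# Plain lists stand in for arrays; comparisons are done element by element.
x = [1, 2, 3]
y = [1, 5, 3]

equal = [a == b for a, b in zip(x, y)]      # elementwise x == y
not_equal = [a != b for a, b in zip(x, y)]  # elementwise x != y


def numeric_style_bool(flags):
    """Hypothetical helper: truth value as 'at least one element is non-zero'."""
    return any(flags)


# Sensible for !=: the sequences differ somewhere, so "x != y" is truthy.
differs = numeric_style_bool(not_equal)

# Misleading for ==: one matching element makes "x == y" truthy,
# even though the sequences are not equal as a whole.
looks_equal = numeric_style_bool(equal)

# What an equality test usually means: every element matches.
really_equal = all(equal)
```

Here differs and looks_equal both come out true while really_equal is
false, which is the confusion the quoted text says numpy avoids by refusing
to pick one rule.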

You have explained

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is 
raised in the former case, which is perfectly reasonable to me.

All I can imagine is that something like the following lives in the 
bowels of the Python code for list:

def __contains__(self, other):
    foundit = False
    for i, v in enumerate(self):
        if i == 0:
            # evaluates to bool numpy array
            foundit = one_kind_of_test(v, other)
        else:
            # raises exception for numpy array
            foundit = another_kind_of_test(v, other)
        if foundit:
            break
    return foundit

I'm trying to imagine some other way to get the results mentioned, but I 
honestly can't. It's beyond me why someone would do such a thing, but 
perhaps it's an optimization of some sort.

James