[Python-Dev] PyObject_RichCompareBool identity shortcut

Thu Apr 28 06:33:07 CEST 2011

On 2011-04-27 23:24, Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 9:15 PM, Alexander Belopolsky
> <alexander.belopolsky at gmail.com> wrote:
>> On Wed, Apr 27, 2011 at 2:48 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> ..
>>> I suspect most of us would oppose changing it on general
>>> backwards-compatibility grounds rather than actually *liking* the current
>>> behavior. If the behavior changed with Python floats, we'd have to mull over
>>> whether we try to match that behavior with our scalar types (one of which
>>> subclasses from float) and our arrays. We would be incompatible with either
>>> Python or C, and we'd probably end up choosing Python to diverge from. It
>>> would make a mess, honestly. We already have to explain why equality is
>>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>>> pretty easy to explain afterward.
>>
>> Most NumPy applications are actually not exposed to NaN problems
>> because it is recommended that NaNs be avoided in computations and
>> when missing or undefined values are necessary, the recommended
>> solution is to use ma.array or masked array, which is a drop-in
>> replacement for numpy array type and carries a boolean "mask" value
>> with every element.  This allows one to have undefined elements in arrays
>> of any type: float, integer or even boolean.  Masked values propagate
>> through all computations including comparisons.
>
> So do new masks get created when the outcome of an elementwise
> operation is a NaN?

No.

> Because that's the only reason why one should have
> NaNs in one's data in the first place -- not to indicate missing
> values!

Yes. I'm not sure that Alexander was being entirely clear. Masked arrays are 
intended to solve just the missing data problem and not the occurrence of NaNs 
from computations. There is still a persistent part of the community that really 
does like to use NaNs for missing data, though. I don't think that's entirely 
relevant to this discussion[1].
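To make the distinction concrete, here is a minimal pure-Python sketch of the masking idea. The `MaskedSeq` class is hypothetical and is not the `numpy.ma` API; it only models a list of values paired with a boolean mask, where masked entries propagate through addition while a NaN produced by the arithmetic itself stays an ordinary, unmasked value.

```python
import math
from dataclasses import dataclass


@dataclass
class MaskedSeq:
    """Hypothetical stand-in for a masked array: values plus a parallel mask."""
    values: list[float]
    mask: list[bool]  # True means "missing"; the value underneath is ignored

    def __add__(self, other: "MaskedSeq") -> "MaskedSeq":
        # A result element is masked if either input element was masked.
        new_mask = [a or b for a, b in zip(self.mask, other.mask)]
        # Masked slots get a placeholder 0.0, since their value is ignored.
        new_values = [
            0.0 if m else x + y
            for x, y, m in zip(self.values, other.values, new_mask)
        ]
        return MaskedSeq(new_values, new_mask)


inf = float("inf")
a = MaskedSeq([1.0, 2.0, inf], [False, True, False])
b = MaskedSeq([10.0, 20.0, -inf], [False, False, False])
c = a + b
print(c.mask)                   # [False, True, False]: missingness propagated
print(math.isnan(c.values[2]))  # True: inf + -inf is NaN, yet it is not masked
```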

I wouldn't say that numpy applications aren't exposed to NaN problems. They are 
just as exposed to computational NaNs as you would expect any application that 
does that many flops to be.
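The identity shortcut this thread is about lives in `PyObject_RichCompareBool`, which CPython's list uses for `in`, `list.count`, `list.index` and element-wise `==`: if both operands are the same object, they are treated as equal without calling `__eq__`. For NaNs that come out of a computation, that makes the result depend on object identity, as this small sketch shows.

```python
inf = float("inf")

x = inf - inf  # a computational NaN
y = inf - inf  # another NaN, computed separately (a distinct float object)

print(x == x)      # False: IEEE 754 NaN compares unequal to everything
print(x in [x])    # True: same object, so the identity shortcut applies
print(x in [y])    # False: different objects fall through to ==
print([x] == [x])  # True: list equality uses the shortcut per element
```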

[1] Okay, that's a lie. I'm sure that persistent minority would *love* to have 
NaN == NaN, because that would make their (ab)use of NaNs easier to work with.
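The footnote's point shows up as soon as a NaN is used as a missing-value marker: container methods only find it when it is the very same object, so such code ends up calling `math.isnan` explicitly. The `MISSING` and `readings` names below are purely illustrative.

```python
import math

MISSING = float("nan")  # illustrative sentinel meaning "no data"
readings = [1.5, MISSING, 3.0, float("nan")]  # last NaN is a different object

# list.count goes through the identity shortcut, so only the sentinel matches.
by_identity = readings.count(MISSING)

# Testing the value itself catches every NaN, whichever object it is.
by_value = sum(1 for r in readings if math.isnan(r))

print(by_identity, by_value)  # 1 2
```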

-- 
Robert Kern
