x.min() depends on ordering

Sat Nov 11 18:46:47 EST 2006

Robert Kern wrote:
> Keith Goodman wrote:
>   
>> x.min() and x.max() depend on the ordering of the elements:
>>
>> >> x = M.matrix([[ M.nan, 2.0, 1.0]])
>> >> x.min()
>> nan
>>
>> >> x = M.matrix([[ 1.0, 2.0, M.nan]])
>> >> x.min()
>> 1.0
>>
>> If I were to try the latter in ipython, I'd assume, great, min()
>> ignores NaNs. But then the former would be a bug in my program.
>>
>> Is this related to how sort works?
>>     
>
> Not really. sort() is a more complicated algorithm that does a number of
> different comparisons in an order that is difficult to determine beforehand.
> x.min() should just be a straight pass through all of the elements. However, the
> core problem is the same: a < nan, a > nan, a == nan are all False for any a.
>
> Barring a clever solution (at least cleverer than I feel like being
> immediately), the way to solve this would be to check for nans in the array and
> deal with them separately (or simply ignore them in the case of x.min()).
> However, this checking would slow down the common case that has no nans (sans
> nans, if you will).
>   
For ignoring NaNs, isn't it simply a matter of scanning through the 
array until you find the first non-NaN and then proceeding as normal? In the 
common case, this requires one extra compare (or rather is_nan), which 
should be negligible in most circumstances. Only when you have an array 
with a load of NaNs at the beginning would it be slow. One would have to 
decide whether to return NaN or raise an error when there are no real 
numbers.
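
A minimal sketch of that scan in plain Python, not NumPy's actual implementation. The function name nan_ignoring_min is hypothetical, and raising ValueError when there are no real numbers is just one of the two choices mentioned above:

```python
import math

def nan_ignoring_min(values):
    """Minimum of `values`, skipping NaNs (hypothetical helper, illustration only)."""
    it = iter(values)
    # Scan forward until the first non-NaN value; this is the only extra
    # work compared with an ordinary running minimum.
    for m in it:
        if not math.isnan(m):
            break
    else:
        # Assumption: raise when the input is empty or all NaN.
        raise ValueError("no non-NaN values")
    # Proceed as normal: `v < m` is False whenever v is NaN, so any later
    # NaNs are skipped without an explicit check.
    for v in it:
        if v < m:
            m = v
    return m
```

With this, both orderings from the original report give the same answer, since a leading NaN is skipped and a trailing NaN never wins the comparison.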

My preference would be to raise an error / warning when there is a NaN 
in the array. Technically, there is no minimum value when a NaN is 
present. I believe that this would be feasible by swapping the compare 
from 'a < b' to '!(a >= b)'. This should return NaN if any NaNs are 
present, and I suspect the extra '!' will have minimal performance impact, 
but it would have to be tested. Then a warning or error could be issued 
on the way out depending on the errstate. Arguably, returning NaN is more 
correct than returning the smallest non-NaN anyway.
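
A rough sketch of the two comparisons in plain Python, again not NumPy's actual code; both function names are hypothetical and the input is assumed to be non-empty. One caveat shows up when writing it out: with the negated compare alone, a NaN running minimum would itself be replaced by the next element, because that comparison also fails. The sketch therefore adds the assumption that the loop stops as soon as the running minimum is NaN:

```python
def naive_min(values):
    """Running minimum with `v < m`; the result depends on where a NaN sits."""
    it = iter(values)
    m = next(it)
    for v in it:
        if v < m:          # always False if either side is NaN
            m = v
    return m

def nan_propagating_min(values):
    """Running minimum with `not (v >= m)`; returns NaN if any NaN is present."""
    it = iter(values)
    m = next(it)
    for v in it:
        if m != m:         # running minimum is already NaN: stop early
            break
        if not (v >= m):   # True when v < m, and also when v is NaN
            m = v
    return m
```

Here naive_min reproduces the order dependence from the original report, while nan_propagating_min returns NaN wherever the NaN appears. A warning or error could then be issued by checking the result once on the way out.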

As for Keith Goodman's request for a NaN-ignoring min function, I suggest:

    a[~np.isnan(a)].min()

Or better yet, stop generating so many NaNs.

-tim
