[Numpy-discussion] min() of array containing NaN

Bruce Southey bsouthey at gmail.com
Mon Aug 11 21:34:52 EDT 2008


I agree with using Masked arrays...
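
For anyone who has not tried them, here is a minimal sketch of the masked-array route. It assumes `numpy.ma.masked_invalid` is acceptable for your data; note that it masks infinities as well as NaNs, which may or may not be what you want.

```python
import numpy as np

x = np.array([0, 1, 2, np.nan, 4, 5, 6])

# masked_invalid masks every NaN and +/-inf entry; the data itself is untouched
mx = np.ma.masked_invalid(x)

lowest = mx.min()   # minimum over the unmasked entries only
print(lowest)       # 0.0
print(mx.count())   # 6 unmasked entries remain
```

If every entry ends up masked, `min()` returns the `masked` constant rather than a number, so that case needs its own check.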

Actually this could be viewed as a bug because min() ignores the entries
to the left of the NaN, so the result depends on where the NaN sits:
>>> numpy.__version__
'1.1.1.dev5559'
>>> x = numpy.array([0,1,2,numpy.nan, 4, 5, 6])
>>> numpy.min(x)
4.0
>>> x = numpy.array([numpy.nan,0,1,2, 4, 5, 6])
>>> x.min()
0.0
>>> x = numpy.array([0,1,2, 4, 5, 6, numpy.nan])
>>> x.min()
-1.#IND

As has been said recently on this list (as per Stéfan's post), NaNs
and infinities have a higher computational cost. I am not sure of the
relative cost of, say, calling isnan first as a check versus storing a
NaN flag as part of the ndarray class.
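
To make the "isnan first" idea concrete, here is a rough sketch. The name `min_checked` is hypothetical and not part of NumPy; it only shows the shape of the check, and it pays for one extra pass over the data, which is the cost I am unsure about.

```python
import numpy as np

def min_checked(x):
    """Hypothetical helper: return NaN if any entry is NaN, else the plain minimum."""
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():   # extra pass over the data before the reduction
        return np.nan
    return x.min()

# The three arrays from the session above: NaN in the middle, first, and last.
for data in ([0, 1, 2, np.nan, 4, 5, 6],
             [np.nan, 0, 1, 2, 4, 5, 6],
             [0, 1, 2, 4, 5, 6, np.nan]):
    print(min_checked(data))
```

With this check the answer no longer depends on where the NaN sits: all three arrays give NaN.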

As per Travis's post, technically it should return NaN. But I don't
agree with Charles that it should automatically call nanmin because
nanmin treats NaNs as zero, positive infinity as a really large
positive number and negative infinity as a very small or negative
number. This may not be what the user wants. An alternative is to
change the signature to include a flag to include or exclude NaN and
infinity which would also remove the need for nanmin and friends.
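
A sketch of what such a signature might look like follows. Everything here is an assumption for illustration: `flagged_min`, `skip_nan` and `skip_inf` are made-up names, not an existing or proposed NumPy API, and the default of propagating NaN is just one possible choice.

```python
import numpy as np

def flagged_min(x, skip_nan=False, skip_inf=False):
    """Hypothetical min() with flags to exclude NaN and/or infinities."""
    x = np.asarray(x, dtype=float).ravel()
    keep = np.ones(x.shape, dtype=bool)
    if skip_nan:
        keep &= ~np.isnan(x)   # drop NaN entries
    if skip_inf:
        keep &= ~np.isinf(x)   # drop +inf and -inf entries
    kept = x[keep]
    if kept.size == 0 or np.isnan(kept).any():
        return np.nan          # nothing left, or a NaN was kept: propagate NaN
    return kept.min()

print(flagged_min([0, 1, np.nan, 4]))                 # nan
print(flagged_min([0, 1, np.nan, 4], skip_nan=True))  # 0.0
print(flagged_min([-np.inf, 2, 3], skip_inf=True))    # 2.0
```

The point of the flags is that the caller states explicitly how special values are handled, instead of choosing between min and nanmin.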

Bruce

On Mon, Aug 11, 2008 at 6:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> *cough* MaskedArrays anyone ? *cough*
>
> The ideal would be for min/max to output a NaN when there's a NaN somewhere.
> That way, you'd know that there's a potential problem in your data, and that you
> should use the nanfunctions or masked arrays.
>
> Is there a page on the wiki for that matter? It seems to show up regularly...
>
> On Monday 11 August 2008 18:49:06 Stéfan van der Walt wrote:
>> 2008/8/11 Charles Doutriaux <doutriaux1 at llnl.gov>:
>> > Seems to me like min should automagically call nanmin if it spots any
>> > NaN, no?
>>
>> Nanmin is quite a bit slower:
>>
>> In [2]: x = np.random.random((5000))
>>
>> In [3]: timeit np.min(x)
>> 10000 loops, best of 3: 24.8 µs per loop
>>
>> In [4]: timeit np.nanmin(x)
>> 10000 loops, best of 3: 136 µs per loop
>>
>> So, I'm not sure if that will happen.  One option is to use `nanmin`
>> by default, and to provide `min` for people who need the speed.  The
>> fact that results with NaNs are almost always unexpected is certainly
>> a valid concern.
>>
>> Cheers
>> Stéfan
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at scipy.org
>> http://projects.scipy.org/mailman/listinfo/numpy-discussion


