[Numpy-discussion] What should be the value of nansum of nan's?

Charles R Harris charlesr.harris at gmail.com
Thu Apr 29 12:56:48 EDT 2010


On Wed, Apr 28, 2010 at 11:56 AM, T J <tjhnson at gmail.com> wrote:

> On Mon, Apr 26, 2010 at 10:03 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Mon, Apr 26, 2010 at 10:55 AM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> We need to make a decision for ticket #1123 regarding what nansum should
> >> return when all values are nan. At some earlier point it was zero, but
> >> currently it is nan, in fact it is nan whatever the operation is. That is
> >> consistent, simple and serves to mark the array or axis as containing all
> >> nans. I would like to close the ticket and am a bit inclined to go with the
> >> current behaviour although there is an argument to be made for returning 0
> >> for the nansum case. Thoughts?
> >>
> >
> > To add a bit of context, one could argue that the results should be
> > consistent with the equivalent operations on empty arrays and always be
> > non-nan.
> >
> > In [1]: nansum([])
> > Out[1]: nan
> >
> > In [2]: sum([])
> > Out[2]: 0.0
> >
>
> This seems like an obvious one to me.  What is the spirit of nansum?
>
> """
>    Return the sum of array elements over a given axis treating
>    Not a Numbers (NaNs) as zero.
> """
>
> Okay.  So NaNs in an array are treated as zeros and the sum is
> performed as one normally would perform it starting with an initial
> sum of zero.  So if all values are NaN, then we add nothing to our
> original sum and still return 0.
>
> I'm not sure I understand the argument that it should return NaN.  It
> is counter to the *purpose* of nansum.   Also, if one wants to
> determine if all values in an array are NaN, isn't there another way?
> Let's keep (or make) those distinct operations, as they are definitely
> distinct concepts.
> __
>
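
The reading argued for in the quoted message can be sketched roughly as follows. This is only an illustration, not the NumPy implementation: the helper names nansum_as_zero and all_nan are hypothetical, and only the flat (no axis) case is handled. NaNs are replaced by zero before summing, and the "is everything NaN?" question is kept as a separate check.

```python
import numpy as np

def nansum_as_zero(values):
    """Hypothetical helper (not part of NumPy): sum with NaNs treated as zero."""
    a = np.asarray(values, dtype=float)
    # Replace every NaN by 0.0, then sum as usual; an empty input sums to 0.0.
    return np.where(np.isnan(a), 0.0, a).sum()

def all_nan(values):
    """Hypothetical helper: True only for a non-empty input whose entries are all NaN."""
    a = np.asarray(values, dtype=float)
    return a.size > 0 and bool(np.isnan(a).all())

print(nansum_as_zero([1.0, np.nan, 2.0]))  # 3.0
print(nansum_as_zero([np.nan, np.nan]))    # 0.0
print(nansum_as_zero([]))                  # 0.0
print(all_nan([np.nan, np.nan]))           # True
```

Under this convention the all-NaN case and the empty case both give 0.0, which matches sum([]).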

It looks like the consensus is that zero should be returned. This is a
change from current behaviour and that bothers me a bit. Here are some other
oddities:

In [6]: nanmax([nan])
Out[6]: nan

In [7]: nanargmax([nan])
Out[7]: nan

In [8]: nanargmax([1])
Out[8]: 0

So it looks like the current behaviour is very much tilted towards treating
nans as missing data flags. I think we should just leave that as is, with
perhaps a note in the docs to that effect. The decision here should probably
accommodate the current users of these functions, of which I am not one. If
we leave the current behaviour as is, then I think the rest of the nan
functions need fixes to return nan for empty sequences, as nansum is the only
one that currently does that.
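
For comparison, a minimal sketch of the missing-data-flag convention, assuming only the flat (no axis) case. The name nansum_flagging is hypothetical and this is not how the NumPy functions are written; it just shows a NaN-skipping sum that returns nan whenever there is nothing valid to add, whether the input is empty or all nan.

```python
import numpy as np

def nansum_flagging(values):
    """Hypothetical helper (not part of NumPy): NaN-skipping sum that
    returns nan when the input holds no valid (non-NaN) values."""
    a = np.asarray(values, dtype=float)
    valid = a[~np.isnan(a)]  # keep only the non-NaN entries
    if valid.size == 0:
        # Empty or all-NaN input: flag the result as missing.
        return np.nan
    return valid.sum()

print(nansum_flagging([1.0, np.nan, 2.0]))  # 3.0
print(nansum_flagging([np.nan]))            # nan
print(nansum_flagging([]))                  # nan
```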

Chuck