[Numpy-discussion] What should be the result in some statistics corner cases?

Sun Jul 14 16:55:08 EDT 2013

On 7/14/13, Charles R Harris <charlesr.harris at gmail.com> wrote:
> Some corner cases in the mean, var, std.
>
> *Empty arrays*
>
> I think these cases should either raise an error or just return nan.
> Warnings seem ineffective to me as they are only issued once by default.
>
> In [3]: ones(0).mean()
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61:
> RuntimeWarning: invalid value encountered in double_scalars
>   ret = ret / float(rcount)
> Out[3]: nan
>
> In [4]: ones(0).var()
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
> RuntimeWarning: invalid value encountered in true_divide
>   out=arrmean, casting='unsafe', subok=False)
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
> RuntimeWarning: invalid value encountered in double_scalars
>   ret = ret / float(rcount)
> Out[4]: nan
>
> In [5]: ones(0).std()
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
> RuntimeWarning: invalid value encountered in true_divide
>   out=arrmean, casting='unsafe', subok=False)
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
> RuntimeWarning: invalid value encountered in double_scalars
>   ret = ret / float(rcount)
> Out[5]: nan
>
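
A minimal sketch of the "raise an error" option for empty input. It assumes a hypothetical wrapper named `strict_mean` (not a NumPy function) that checks the element count before delegating to `np.mean`, so an empty reduction fails loudly instead of warning once and returning nan:

```python
import numpy as np


def strict_mean(x, axis=None):
    """Hypothetical wrapper (not part of NumPy): raise on empty input.

    `axis` is assumed to be None or a single int."""
    x = np.asarray(x)
    # Number of elements averaged over: the whole array, or the length of `axis`.
    count = x.size if axis is None else x.shape[axis]
    if count == 0:
        raise ValueError("mean of an empty array is undefined")
    return np.mean(x, axis=axis)
```

With this wrapper, `strict_mean(np.ones(0))` raises `ValueError`, while `strict_mean(np.ones(3))` returns 1.0 as before.
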
> *ddof >= number of elements*
>
> I think these should just raise errors. The results for ddof >= #elements
> are happenstance, and certainly negative numbers should never be returned.
>
> In [6]: ones(2).var(ddof=2)
> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
> RuntimeWarning: invalid value encountered in double_scalars
>   ret = ret / float(rcount)
> Out[6]: nan
>
> In [7]: ones(2).var(ddof=3)
> Out[7]: -0.0
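
The two outputs above follow from the divisor n - ddof documented for `np.var`: for `ones(2)` the sum of squared deviations is 0, so ddof=2 gives 0/0 = nan and ddof=3 gives 0/(-1) = -0.0. A minimal sketch of the "just raise" option, assuming a hypothetical `strict_var` (whole-array reduction only, not a NumPy function):

```python
import numpy as np


def strict_var(x, ddof=0):
    """Hypothetical variance that rejects ddof >= number of elements.

    Only models the whole-array case (no axis argument)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if ddof >= n:
        # Covers both the 0/0 -> nan case and the negative-divisor -> -0.0 case.
        raise ValueError(f"ddof={ddof} must be smaller than the number of elements ({n})")
    # Sum of squared deviations from the mean, divided by n - ddof.
    return np.sum((x - x.mean()) ** 2) / (n - ddof)
```

Note that the check `ddof >= n` also rejects empty arrays with the default ddof=0.
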
>
> *nansum*
>
> It currently returns nan for empty arrays. I suspect it should return nan for
> slices that are all nan, but 0 for empty slices. That would make it
> consistent with sum in the empty case.
>
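
A minimal sketch of the behaviour proposed above (0 for empty input, nan when every element is nan), assuming a hypothetical function `nansum_allnan_is_nan` that only handles whole-array reductions:

```python
import numpy as np


def nansum_allnan_is_nan(x):
    """Hypothetical nansum variant (not NumPy's np.nansum), whole-array only:
    0.0 for an empty array, nan if every element is nan,
    otherwise the sum of the non-nan elements."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return 0.0  # consistent with np.sum on an empty array
    mask = np.isnan(x)
    if mask.all():
        return np.nan  # nothing left to sum
    return np.sum(x[~mask])
```
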

For nansum, I would expect 0 even in the case of all nans.  The point
of these functions is to simply ignore nans, correct?  So I would aim
for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])
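
A minimal sketch of that rule, assuming a hypothetical helper `ignore_nans` that builds a nan-ignoring reduction from an ordinary one. Boolean indexing flattens the array, so this only models whole-array reductions (no axis argument):

```python
import numpy as np


def ignore_nans(func):
    """Hypothetical helper: nanfunc(x) is defined as func(x[~isnan(x)])."""
    def nanfunc(x):
        x = np.asarray(x, dtype=float)
        # Drop the nans (flattening the array), then apply the plain reduction.
        return func(x[~np.isnan(x)])
    return nanfunc


my_nansum = ignore_nans(np.sum)
```

Under this rule `my_nansum([np.nan, np.nan])` is the sum of an empty array, i.e. 0.0. The same construction applied to `np.mean` inherits whatever `mean` does for empty input, so the all-nan case for nanmean is tied to the empty-array question above.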

Warren

> Chuck
>