[Numpy-discussion] the mean, var, std of empty arrays

josef.pktd at gmail.com
Thu Nov 22 09:54:28 EST 2012


On Thu, Nov 22, 2012 at 7:14 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Wed, 2012-11-21 at 22:58 -0500, josef.pktd at gmail.com wrote:
>> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Wed, Nov 21, 2012 at 7:45 PM, <josef.pktd at gmail.com> wrote:
>> >>
>> >> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau <shish at keba.be> wrote:
>> >> > Current behavior looks sensible to me. I personally would prefer no
>> >> > warning, but I think it makes sense to have one as it can be helpful
>> >> > to detect issues faster.
>> >>
>> >> I agree that nan should be the correct answer.
>> >> (I gave up trying to define a default for 0/0 in scipy.stats ttests.)
>> >>
>> >> some funnier cases
>> >>
>> >> >>> np.var([1], ddof=1)
>> >> 0.0
>> >
>> >
>> > This one is a nan in development.
>> >
>> >>
>> >> >>> np.var([1], ddof=5)
>> >> -0
>> >> >>> np.var([1,2], ddof=5)
>> >> -0.16666666666666666
>> >> >>> np.std([1,2], ddof=5)
>> >> nan
>> >>
>> >
>> > These still do this. Also
>> >
>> > In [10]: var([], ddof=1)
>> > Out[10]: -0
>> >
>> > Which suggests that the nan is pretty much an accidental byproduct of
>> > division by zero. I think it might make sense to have a definite policy for
>> > these corner cases.
>>
>> It would also be consistent with the usual pattern to raise a
>> ValueError in these cases: ddof too large, size too small.
>> As long as we don't allow for missing values, it can't happen that
>> some columns or rows get valid answers while others don't, so a
>> ValueError wouldn't throw away any partially valid results.
>>
>
> It seems to me that nan is the reasonable result for these operations
> (reduce-like operations that do not have an identity). Admittedly,
> reduce operations without an identity throw a ValueError (e.g.
> `np.minimum.reduce([])`), but mean/std/var seem special enough to be
> different from other reduce operations (for example, their result is
> always floating point). As for usability, I think that when plotting
> error bars using std, for example, it would be rather annoying to get
> a ValueError, so if anything the reduce machinery could give more
> special results for empty floating-point reductions.
>
> In any case the warning should be clearer, and for too-large ddofs I
> would say it should return nan plus a warning as well.
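
Just to put the two behaviors Sebastian mentions side by side (a quick
sketch; the exact warning and error messages will depend on the numpy
version):

import numpy as np

# ufunc reductions that have an identity return it for empty input
np.add.reduce([])          # -> 0.0
np.multiply.reduce([])     # -> 1.0

# reductions without an identity refuse the empty input
try:
    np.minimum.reduce([])
except ValueError as exc:
    print(exc)             # "zero-size array to reduction operation ..."

# mean/var/std always produce floating point, so nan is available
np.mean([])                # -> nan (plus the RuntimeWarning under discussion)
np.var([])                 # -> nan
np.std([])                 # -> nan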


Why do operations on empty arrays not return empty arrays?

But this looks ok:

>>> (np.array([]) - np.array([]).mean()) / np.array([]).std()
array([], dtype=float64)
>>> (np.array([]) - np.array([]).mean()) / np.array([]).std(0)
array([], dtype=float64)

>>> (np.array([]) - np.array([]).mean(0)) / np.array([]).std(0)
array([], dtype=float64)
>>> (np.array([]) - np.array([]).mean(0)) / np.array([])
array([], dtype=float64)

>>> np.array([[]]) - np.expand_dims(np.array([[]]).mean(1),1)
array([], shape=(1, 0), dtype=float64)

>>> np.array([[]]) - np.expand_dims(np.array([]),1)
array([], shape=(0, 0), dtype=float64)

>>> np.array([]) - np.expand_dims(np.array([]),0)
array([], shape=(1, 0), dtype=float64)
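
To partly answer my own question: with a 2-d empty array it depends on
which axis gets reduced. Reducing over the non-empty axis keeps the
empty axis and gives back an empty array; reducing over the empty axis
takes a mean of zero values per entry, i.e. nan. A small sketch
(outputs from the numpy I have here, details may differ by version):

import numpy as np

a = np.empty((0, 3))

a.mean(axis=1).shape       # -> (0,)  empty result, nothing to complain about
a.mean(axis=0)             # -> array([ nan,  nan,  nan]) plus a RuntimeWarning

# np.mean([]) is the same situation with no axis left to keep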


(But I doubt that in many cases I will rely on correct "calculations"
with empty arrays.)
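
And coming back to Chuck's point that the nan is just a byproduct of the
division: a minimal sketch of the arithmetic that var effectively does,
sum of squared deviations over (n - ddof). (This is my own
simplification for illustration, not numpy's actual implementation.)

import numpy as np

def var_sketch(x, ddof=0):
    # sum of squared deviations divided by (n - ddof),
    # with no special handling of the denominator
    x = np.asarray(x, dtype=float)
    n = x.size
    ssd = ((x - x.mean())**2).sum() if n else np.float64(0.0)
    return np.divide(ssd, n - ddof)   # 0/0 -> nan instead of raising

var_sketch([], ddof=0)      # 0.0 / 0  -> nan, the "accidental" nan
var_sketch([], ddof=1)      # 0.0 / -1 -> -0.0
var_sketch([1], ddof=5)     # 0.0 / -4 -> -0.0
var_sketch([1, 2], ddof=5)  # 0.5 / -3 -> -0.1666...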

Josef


>
> Sebastian
>
>>
>> A quick check with np.ma:
>>
>> It looks correct, except when delegating to numpy?
>>
>> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0)
>> >>> s
>> masked_array(data = [-- --],
>>              mask = [ True  True],
>>        fill_value = 1e+20)
>>
>> >>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0)
>> >>> s
>> masked_array(data = [0.0 --],
>>              mask = [False  True],
>>        fill_value = 1e+20)
>>
>> >>> s = np.ma.std([1,2], ddof=5)
>> >>> s
>> masked
>> >>> type(s)
>> <class 'numpy.ma.core.MaskedConstant'>
>>
>> >>> np.ma.var([1,2], ddof=5)
>> -0.16666666666666666
>>
>>
>> Josef
>>
>> >
>> > <snip>
>> >
>> > Chuck
>> >