[Python-ideas] NAN handling in the statistics module

David Mertz mertz at gnosis.cx
Wed Jan 9 00:49:46 EST 2019


On Tue, Jan 8, 2019 at 11:57 PM Tim Peters <tim.peters at gmail.com> wrote:

> I'd like to see internal consistency across the central-tendency
> statistics in the presence of NaNs.  What happens now:
>

I think consistent NaN-poisoning would be excellent behavior.  It will
always make sense for median (and its variants).

> >>> statistics.mode([2, 2, nan, nan, nan])
> nan
> >>> statistics.mode([2, 2, inf - inf, inf - inf, inf - inf])
> 2
>

But in the mode case, I'm not sure we should ALWAYS treat a NaN as
poisoning the result.  If NaN means "missing value" then sometimes it could
change things, and we shouldn't guess.  But what if it cannot?

    >>> statistics.mode([9, 9, 9, 9, nan1, nan2, nan3])

No matter what missing value we take those nans to maybe-possibly
represent, 9 is still the most common element.  This is only true when the
most common thing occurs at least as often as the 2nd most common thing
PLUS the number of all NaNs.  But in that case, 9 really is the mode.

We have one example of non-poisoning NaN in basic operations:

    >>> nan**0
    1.0

So if the NaN "cannot possibly change the answer" then it's reasonable to
produce a non-NaN answer IMO.  Except we don't really get that with 0**nan
or 0*nan already... so a NaN-poisoning mode wouldn't actually offend my
sensibilities that much. :-)

I guess you could argue that NaN "could be inf".  In that case 0*nan being
nan makes sense.  But this still feels slightly odd:

    >>> 0**inf
    0.0
    >>> 0**nan
    nan

I guess it's supported by:

    >>> 0**-1
    ZeroDivisionError: 0.0 cannot be raised to a negative power

A *missing value* could be a negative one.
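
One way to check the "cannot possibly change the answer" reading is to
substitute a handful of concrete values for the missing one and see whether
the result ever varies.  This is only an illustrative sketch: the
candidates list and the try_pow helper are made up for the example.

```python
import math

# A few values the "missing" operand might turn out to be.
candidates = [-2.0, -1.0, 0.0, 1.0, 2.0, math.inf]


def try_pow(base, exponent):
    """Return base ** exponent, or the string 'ZeroDivisionError' if it raises."""
    try:
        return base ** exponent
    except ZeroDivisionError:
        return "ZeroDivisionError"


# x ** 0 is 1.0 for every candidate x, so nan ** 0 == 1.0 loses nothing.
x_pow_zero = {try_pow(x, 0) for x in candidates}

# 0.0 ** x depends on x: an error for negatives, 1.0 at zero, 0.0 for
# positives.  No single answer is safe, so 0 ** nan stays nan.
zero_pow_x = {try_pow(0.0, x) for x in candidates}

print(x_pow_zero)
print(zero_pow_x)
```

The first set has a single element and the second has three, which matches
nan**0 being 1.0 while 0**nan is nan.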
-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.