[Python-ideas] NAN handling in the statistics module
Steven D'Aprano
steve at pearwood.info
Tue Jan 8 05:56:20 EST 2019
On Tue, Jan 08, 2019 at 04:25:17PM +0900, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
> > By definition, data containing Not A Number values isn't numeric :-)
>
> Unfortunately, that's just a joke, because in fact numeric functions
> produce NaNs.
I'm not sure if you're agreeing with me or disagreeing, so I'll assume
you're agreeing and move on :-)
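
For anyone following along, here is a minimal sketch of the point that ordinary float arithmetic can produce a NaN, and that a NaN does not compare as less than, greater than, or equal to anything. The variable names are mine, chosen for illustration only.

```python
import math

# Ordinary float arithmetic can yield a NaN without raising an exception.
inf = float("inf")
nan = inf - inf

is_nan = math.isnan(nan)

# Every ordering comparison involving a NaN is False, and a NaN is not
# even equal to itself, so it cannot take part in a total order.
less = nan < 1.0
greater = nan > 1.0
equal_to_itself = nan == nan
```

Because all three comparisons are False, a sort has no consistent place to put a NaN, which is why anything built on sorting makes no promises about it.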
> I agree that this can easily be resolved by documenting that it is the
> caller's responsibility to remove NaNs from numeric data, but I prefer
> your proposed flags.
>
> > The only reason why I don't call it a bug is that median() makes no
> > promises about NANs at all, any more than it makes promises about the
> > median of a list of sets or any other values which don't define a total
> > order.
>
> Pedantically, I would prefer that the promise that ordinal data
> (vs. specifically numerical) has a median be made explicit, as there
> are many cases where statistical data is ordinal.
I think that is reasonable.

Provided the data defines a total order, the median is well-defined when
there is an odd number of data points, or you can use median_low and
median_high regardless of the number of data points.
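
A small sketch of that, assuming ordinal data represented as single-letter strings (the data here is made up for illustration). With an even number of points, median() has to average the two middle values, which fails for data that does not support arithmetic, while median_low and median_high just pick one of them.

```python
from statistics import median, median_low, median_high

# Hypothetical ordinal data: letters, which have a total order but no
# meaningful arithmetic.
odd = ["c", "a", "b"]
even = ["d", "b", "a", "c"]

# Odd count: the single middle value is returned, no arithmetic needed.
middle_odd = median(odd)

# Even count: pick the lower or the higher of the two middle values.
low = median_low(even)
high = median_high(even)

# Even count with median(): averaging "b" and "c" is not possible.
try:
    median(even)
    averaging_failed = False
except TypeError:
    averaging_failed = True
```

Here middle_odd and low are both "b", high is "c", and the plain median() call on the even-length data raises TypeError.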
> This may be a moot
> point, as in most cases ordinal data is represented numerically in
> computation (Likert scales, for example, are rarely coded as "hate",
> "dislike", "indifferent", "like", "love", but instead as 1, 2, 3, 4,
> 5), and from the point of view of UI presentation, IntEnums do the
> right thing here (print as identifiers, sort as integers).
>
> Perhaps a better way to document this would be to suggest that ordinal
> data be represented using IntEnums? (Again to be pedantic, one might
> want OrderedEnums that can be compared but don't allow other
> arithmetic operations.)
That's a nice solution.
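
Roughly what that could look like, as a sketch only. The Likert and Mood classes are hypothetical names, and the OrderedEnum base class here is an illustration of the idea (comparable by value, no arithmetic), not something the statistics module provides.

```python
from enum import Enum, IntEnum
from statistics import median_low, median_high


class Likert(IntEnum):
    """Likert scale coded 1 to 5; members sort as integers."""
    HATE = 1
    DISLIKE = 2
    INDIFFERENT = 3
    LIKE = 4
    LOVE = 5


class OrderedEnum(Enum):
    """Hypothetical base class: members compare by value, no arithmetic."""

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented


class Mood(OrderedEnum):
    HATE = 1
    DISLIKE = 2
    INDIFFERENT = 3
    LIKE = 4
    LOVE = 5


responses = [Likert.LOVE, Likert.DISLIKE, Likert.LIKE, Likert.HATE]
likert_low = median_low(responses)
likert_high = median_high(responses)

moods = [Mood.LOVE, Mood.DISLIKE, Mood.LIKE, Mood.HATE]
mood_low = median_low(moods)
mood_high = median_high(moods)

# Unlike an IntEnum, the ordered-only enum rejects arithmetic.
try:
    Mood.LIKE + 1
    arithmetic_blocked = False
except TypeError:
    arithmetic_blocked = True
```

In both cases median_low and median_high return enum members (DISLIKE and LIKE for this data), so the result keeps its identifier via the .name attribute rather than collapsing to a bare number.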
--
Steve (the other one)