[Python-ideas] NAN handling in the statistics module
Steven D'Aprano
steve at pearwood.info
Wed Jan 9 00:19:36 EST 2019
On Mon, Jan 07, 2019 at 11:27:22AM +1100, Steven D'Aprano wrote:
[...]
> I propose adding a "nan_policy" keyword-only parameter to the relevant
> statistics functions (mean, median, variance etc), and defining the
> following policies:
I asked some heavy users of statistics software (not just Python users)
what behaviour they would find useful, and as I feared, I got no
conclusive answer. So far, the answers seem to be almost evenly split
into four camps:
- don't do anything, it is the caller's responsibility to filter NANs;
- raise an immediate error;
- return a NAN;
- treat them as missing data.
(The sample is still small, so I don't expect the answers to stay
evenly split as more people respond.)
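To make the four camps concrete, here is a rough sketch of how they
might map onto a keyword-only "nan_policy" parameter. It is only an
illustration: the wrapper name mean_with_nan_policy and the policy
strings "pass", "raise", "return" and "ignore" are placeholders I made
up for this example, not the names from the proposal, and converting
everything to float is a simplification that loses Fraction and Decimal
support.

```python
import math
import statistics


def mean_with_nan_policy(data, *, nan_policy="return"):
    """Hypothetical wrapper around statistics.mean (illustration only).

    The policy names below are placeholders, not an agreed API.
    """
    # Simplification: coerce to float so math.isnan works on every item.
    values = [float(x) for x in data]
    has_nan = any(math.isnan(x) for x in values)

    if nan_policy == "pass":
        # Camp 1: do nothing; filtering NANs is the caller's job.
        return statistics.mean(values)
    if nan_policy == "raise":
        # Camp 2: fail immediately if any NAN is present.
        if has_nan:
            raise ValueError("NAN found in data")
        return statistics.mean(values)
    if nan_policy == "return":
        # Camp 3: a NAN anywhere in the data gives a NAN result.
        if has_nan:
            return math.nan
        return statistics.mean(values)
    if nan_policy == "ignore":
        # Camp 4: treat NANs as missing data and drop them.
        # If every value is a NAN, statistics.mean raises StatisticsError
        # because the remaining data set is empty.
        return statistics.mean([x for x in values if not math.isnan(x)])
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")
```

With data [1.0, nan, 3.0], "return" gives a NAN, "ignore" gives the
mean of the two remaining values, and "raise" raises ValueError.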
Having considered all the views expressed (thank you to everyone who
commented), I'm now inclined to default to returning a NAN (which
happens to be the current behaviour of mean etc, but not median except
by accident), even if it impacts performance.
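For anyone who wants to see the current behaviour for themselves, the
following snippet is one way to poke at it. It uses only the existing
statistics.mean and statistics.median functions. I have left out the
expected output for median on purpose: it sorts the data, NANs do not
compare sensibly with other floats, and so the result can depend on
where the NAN sits in the input.

```python
import math
import statistics

nan = math.nan

# mean propagates the NAN: the result is itself a NAN.
result = statistics.mean([1.0, nan, 3.0])
print("mean:", result, "is NAN:", math.isnan(result))

# median sorts the data first. Every ordering comparison against a NAN
# is False, so the "sorted" order, and hence the median, may change
# with the position of the NAN in the input.
for data in ([nan, 1.0, 2.0], [1.0, nan, 2.0], [1.0, 2.0, nan]):
    print(data, "->", statistics.median(data))
```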
--
Steve