[Python-ideas] NAN handling in the statistics module

Steven D'Aprano steve at pearwood.info
Wed Jan 9 00:19:36 EST 2019


On Mon, Jan 07, 2019 at 11:27:22AM +1100, Steven D'Aprano wrote:

[...]
> I propose adding a "nan_policy" keyword-only parameter to the relevant 
> statistics functions (mean, median, variance etc), and defining the 
> following policies:


I asked some heavy users of statistics software (not just Python users) 
what behaviour they would find useful, and as I feared, I got no 
conclusive answer. So far, the answers seem to be almost evenly split 
into four camps:

- don't do anything, it is the caller's responsibility to filter NANs;

- raise an immediate error;

- return a NAN;

- treat them as missing data.
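
To make the four camps concrete, here is a rough sketch of how each one might look for mean. This is not the proposed implementation: the wrapper name mean_with_policy and the policy strings "pass", "raise", "return" and "omit" are placeholders I made up for illustration.

```python
import math
import statistics


def _is_nan(x):
    # Only float NANs are detected in this sketch; Decimal NANs are ignored.
    return isinstance(x, float) and math.isnan(x)


def mean_with_policy(data, *, nan_policy="return"):
    """Hypothetical wrapper; the policy names are illustrative only."""
    data = list(data)
    if nan_policy == "pass":
        # Camp 1: do no checking; filtering NANs is the caller's job.
        return statistics.mean(data)
    if nan_policy == "omit":
        # Camp 4: treat NANs as missing data and drop them first.
        return statistics.mean([x for x in data if not _is_nan(x)])
    has_nan = any(_is_nan(x) for x in data)
    if nan_policy == "raise":
        # Camp 2: fail immediately if any NAN is present.
        if has_nan:
            raise ValueError("NAN found in data")
        return statistics.mean(data)
    if nan_policy == "return":
        # Camp 3: a NAN anywhere in the input gives a NAN result.
        return math.nan if has_nan else statistics.mean(data)
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")
```

Every policy except "pass" needs an extra scan of the data, which is where the performance cost comes from.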


(The sample size is small so far, so I don't expect the answers to 
stay evenly split as more people respond.)

Having considered all the views expressed (thank you to everyone who 
commented), I'm now inclined to default to returning a NAN (which happens 
to be the current behaviour of mean etc, but not median except by 
accident), even if it impacts performance.
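
A small sketch of that difference between mean and median, assuming float data. The helper name median_by_ordering is made up for this example. I deliberately show no expected output for median, since its result depends on where the NAN ends up after sorting.

```python
import itertools
import math
import statistics

data = [1.0, 2.0, 3.0, float("nan")]

# mean propagates the NAN through its arithmetic, whatever the order.
print(statistics.mean(data))


def median_by_ordering(values):
    """Hypothetical helper: median of every ordering of the same values."""
    return [statistics.median(list(p)) for p in itertools.permutations(values)]


# median sorts first, and NAN comparisons are always false, so the
# result can vary with the input order.
results = median_by_ordering(data)
print(sum(math.isnan(r) for r in results), "of", len(results), "results are NAN")
```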




-- 
Steve

