[SciPy-dev] Statistics toolbox and nans
Travis Oliphant
oliphant.travis at ieee.org
Fri Nov 1 00:57:20 EST 2002
Hello developers.
What should we do about nans and the stats toolbox? Stats is one
package where people may use nans to represent missing values.
There are two options that I see.
1) MATLAB option
MATLAB defines six extra functions (nanmean, nanmedian, nansum, nanmin,
nanmax, and nanstd) that ignore nans. These can be used in place of the
normal functions, which do not handle nans properly. Perhaps they did
this as an afterthought.
Note that this is the easy option, and it is (as of now) implemented in
the CVS version of scipy.
Other stats functions may or may not handle nans properly.
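To make Option 1 concrete, here is a minimal sketch of a plain function
sitting next to a separate nan-ignoring one. It is pure Python for
illustration only, not the code in CVS scipy; returning nan when every
value is missing is an assumption of this sketch.

```python
import math

def mean(values):
    """Plain mean: a single nan in the input makes the result nan."""
    return sum(values) / len(values)

def nanmean(values):
    """Mean of the non-nan entries (illustrative, not scipy's code)."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        # Assumption: an all-nan (or empty) input yields nan.
        return math.nan
    return sum(kept) / len(kept)

data = [1.0, 2.0, math.nan, 3.0]
plain = mean(data)        # nan propagates through the sum
ignoring = nanmean(data)  # averages 1.0, 2.0 and 3.0 only
```

The caller has to know that the nan-aware variant exists and pick it
explicitly; the plain function stays fast but silently returns nan.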
2) Integrated option
All stats functions handle nans properly.
Option 2 is less difficult to explain. Its drawback is that every
function is saddled with isnan checking, which may slow things down
some.
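As a rough sketch of what Option 2 would look like, every function could
route its input through one shared isnan filter. The helper name
_drop_nans, the n - 1 denominator in std, and the nan return for
too-few values are all assumptions made for illustration, not existing
scipy behavior.

```python
import math

def _drop_nans(values):
    """Hypothetical shared helper: the isnan pass every function pays for."""
    return [v for v in values if not math.isnan(v)]

def mean(values):
    """Mean that ignores nans without the caller asking for it."""
    kept = _drop_nans(values)
    return sum(kept) / len(kept) if kept else math.nan

def std(values):
    """Sample standard deviation of the non-nan entries.

    Assumption: n - 1 denominator, nan when fewer than two values remain.
    """
    kept = _drop_nans(values)
    if len(kept) < 2:
        return math.nan
    m = sum(kept) / len(kept)
    return math.sqrt(sum((v - m) ** 2 for v in kept) / (len(kept) - 1))

data = [1.0, math.nan, 2.0, 3.0]
m = mean(data)
s = std(data)
```

The extra pass over the data in _drop_nans is the slowdown mentioned
above; it is paid even when the input contains no nans at all.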
Following Knuth's policy of not optimizing prematurely, I tend toward
number 2.
Does anybody see any other options?
Thanks,
-Travis O.