[SciPy-dev] Statistics toolbox and nans

Travis Oliphant oliphant.travis at ieee.org
Fri Nov 1 00:57:20 EST 2002


Hello developers.

What should we do about NaNs in the stats toolbox? Stats is one
package where people may use NaNs to represent missing values.


I see two options.

1) MATLAB option

MATLAB defines six extra functions (nanmean, nanmedian, nansum, nanmin,
nanmax, and nanstd) that ignore NaNs. These can be used in place of
the normal functions, which do not handle NaNs properly. Perhaps
MATLAB added them as an afterthought.

Note that this option is easy, and it is already implemented in CVS
SciPy.

Under this option, other stats functions may or may not handle NaNs
properly.
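
A minimal sketch of what option 1 looks like, assuming modern NumPy
rather than the Numeric-based CVS code (whose actual implementation is
not shown here). The helper `_drop_nans` is hypothetical; each nan*
function simply filters out NaN entries before calling the ordinary
reduction. These are 1-D only, and axis handling is left out.

```python
import numpy as np

# Hypothetical 1-D helpers mirroring MATLAB's nan* names. They flatten
# the input and ignore NaN entries; axis handling is deliberately omitted.

def _drop_nans(x):
    """Flatten x to a float array and keep only the non-NaN entries."""
    x = np.asarray(x, dtype=float).ravel()
    return x[~np.isnan(x)]

def nansum(x):
    return _drop_nans(x).sum()

def nanmean(x):
    return _drop_nans(x).mean()

def nanmedian(x):
    return np.median(_drop_nans(x))

def nanmin(x):
    return _drop_nans(x).min()

def nanmax(x):
    return _drop_nans(x).max()

def nanstd(x):
    # Sample standard deviation (N - 1 denominator) of the non-NaN values.
    return _drop_nans(x).std(ddof=1)

data = [1.0, float("nan"), 3.0, 8.0]
print(nanmean(data), nanmedian(data))  # mean 4.0, median 3.0
```

The ordinary functions stay untouched and fast; users opt in to NaN
handling by calling the nan* variant. Behaviour on all-NaN input is
left unspecified in this sketch. For reference, modern NumPy ships
`numpy.nanmean`, `numpy.nanmedian`, `numpy.nanstd`, and related
functions that follow this same pattern.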

2) Integrated option

All stats functions handle NaNs properly.


The drawback to option 2, which is easier to explain, is that every
function is saddled with isnan checking, which may slow things down
somewhat.
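
A minimal sketch of option 2, assuming every stats function is routed
through one shared NaN-stripping wrapper. The decorator `ignores_nans`
and the example functions `mean` and `var` are hypothetical names, not
SciPy's API. The `np.isnan(x)` pass inside the wrapper is the per-call
isnan overhead described above.

```python
import functools

import numpy as np

def ignores_nans(func):
    """Hypothetical decorator: strip NaNs before calling a 1-D stats function."""
    @functools.wraps(func)
    def wrapper(x, *args, **kwargs):
        x = np.asarray(x, dtype=float).ravel()
        mask = np.isnan(x)   # the extra O(n) isnan pass every call pays
        if mask.any():       # only copy the data when NaNs are present
            x = x[~mask]
        return func(x, *args, **kwargs)
    return wrapper

@ignores_nans
def mean(x):
    return x.sum() / x.size

@ignores_nans
def var(x, ddof=1):
    # Sample variance of the non-NaN values (N - ddof denominator).
    m = x.sum() / x.size
    return ((x - m) ** 2).sum() / (x.size - ddof)

print(mean([1.0, float("nan"), 3.0]), var([1.0, float("nan"), 3.0]))
```

Centralizing the check in one wrapper keeps the per-function code
unchanged, so the cost is a single isnan scan per call rather than
scattered special cases.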

Following Knuth's advice against premature optimization, I lean toward
option 2.

Does anybody see any other options?

Thanks,

-Travis O.
