[Numpy-discussion] missing data discussion round 2

Mon Jun 27 18:24:03 EDT 2011

On Jun 27, 2011, at 9:59 PM, josef.pktd at gmail.com wrote:
> 
> Just a question about how things would work with the new model.
> How can you implement the "use" keyword from R's cov (or cor), with
> minimal data copying?
> 
> I think the basic masked array version would (or does) just assign 0
> to the missing values, calculate the covariance or correlation, and then
> correct with the correct count.

Basically, yes. Basic operations use a generic internal fill value (0 for addition/subtraction, 1 for multiplication/division), and then you just have to correct by the count of unmasked values.
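
As a rough sketch of that fill-and-correct idea (the array and the variable names are made up for illustration, and this mimics the approach by hand rather than showing what numpy.ma does internally):

```python
import numpy as np
import numpy.ma as ma

# Hypothetical data: the second value is missing (masked).
x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])

# Fill with 0, the neutral element for addition, then sum as usual.
total = x.filled(0.0).sum()

# Fill with 1, the neutral element for multiplication, then multiply.
product = x.filled(1.0).prod()

# Correct by the number of unmasked values, not by the full length.
count = x.count()
mean = total / count
```

The masked entry contributes nothing to either result, and dividing by count rather than len(x) is the "correct by the count" step.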

> 
> especially I'm interested in the complete.obs (drop any rows that
> contain an NA) case

In numpy.ma, there are functions to drop rows/columns that contain a masked value (they are in numpy.ma.extras, if I recall correctly): just filter your data with these functions before passing it to np.cov. That's the kind of trivial example that is probably not worth overloading a function with optional parameters for.
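
A minimal sketch of that filtering step, assuming the observations are rows and the variables are columns (the data are invented for illustration; ma.compress_rows is one of the row-dropping functions and works on 2-D arrays):

```python
import numpy as np
import numpy.ma as ma

# Hypothetical data: 4 observations (rows) of 2 variables (columns),
# with one missing value in the second row.
data = ma.array(
    [[1.0, 2.0],
     [2.0, 0.0],
     [3.0, 6.0],
     [4.0, 8.0]],
    mask=[[False, False],
          [False, True],
          [False, False],
          [False, False]],
)

# Drop every row that contains a masked value; the result is a plain ndarray.
complete = ma.compress_rows(data)

# Covariance over the complete observations only (columns are the variables).
cov = np.cov(complete, rowvar=False)
```

This copies the surviving rows once, which is roughly what the complete.obs case amounts to.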