[Numpy-discussion] behavior of masked arrays

Giorgio F. Gilestro giorgio at gilestro.tk
Sun Mar 9 13:35:27 EDT 2008


OK, generic functions plus a specific ma.stats module sounds very good to 
me. I hope it is going to happen, because masked arrays are a great plus.

Pierre, I have adjusted some of the functions in scipy.stats.stats and 
plan to do more: not all of them, I am afraid, only those I need. Is it 
OK if I send you what I have, so that you can have a look at it (at your 
convenience) and maybe integrate it into numpy.ma.mstats?

For the moment the only issues I have met are:

- some functions need to know N, the number of elements on which we 
are performing the operation. A simple a.shape[axis] won't work, and I 
could not find a native method returning the number of unmasked elements 
on a given axis (maybe there should be one?). So I am using instead

N = a.shape[axis] - a.mask.sum(axis)

- some functions need to handle float data. Calling float() on a masked 
array raises an exception (why so?), so I am either introducing float 
constants where possible

e.g. svar = ((n-1)*v) / float(df) becomes svar = ((n-1.0)*v) / df

or multiplying by 1.0.
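
On the first issue, here is a minimal sketch of the counting workaround. The helper name count_unmasked and the sample data are made up for illustration. It uses np.ma.getmaskarray so that the mask is always a full boolean array, even when nothing is masked. For comparison, current numpy.ma also provides a count(axis) method on MaskedArray, which gives the same numbers here.

```python
import numpy as np

def count_unmasked(a, axis=0):
    """Number of unmasked elements of `a` along `axis` (hypothetical helper)."""
    a = np.ma.asarray(a)
    # getmaskarray always returns a full boolean array, even if a.mask is nomask
    mask = np.ma.getmaskarray(a)
    return a.shape[axis] - mask.sum(axis=axis)

# Illustrative data: two rows, three columns, two masked entries
a = np.ma.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]],
                mask=[[0, 1, 0],
                      [0, 0, 1]])

n_manual = count_unmasked(a, axis=0)   # shape-minus-mask approach
n_builtin = a.count(axis=0)            # MaskedArray.count, same result here
```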
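
On the second issue, a small sketch of the float workaround, with made-up values for n, v and df. It assumes a recent NumPy, where float() on a masked array with more than one element raises a TypeError. Under Python 3 the / operator already performs true division, so the float constant mainly matters for code that must also run with integer division semantics.

```python
import numpy as np

# Illustrative inputs: counts, variances and degrees of freedom, last entry masked
n = np.ma.array([3, 4, 5], mask=[0, 0, 1])
v = np.ma.array([2.0, 1.5, 0.5], mask=[0, 0, 1])
df = np.ma.array([2, 3, 4], mask=[0, 0, 1])

# float() only accepts single-element arrays, so this fails for df
try:
    float(df)
    float_failed = False
except TypeError:
    float_failed = True

# Workaround 1: use a float constant so the whole expression is floating point
svar = ((n - 1.0) * v) / df

# Workaround 2: convert the array's dtype explicitly; the mask is preserved
svar_alt = ((n - 1) * v) / df.astype(float)
```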




Pierre GM wrote:
> On Friday 07 March 2008 12:25:13 Giorgio F. Gilestro wrote:
>> Ok, I see, thank you Pierre.
>> I thought scipy.stats was a widely used extension, so I didn't really
>> consider the trivial possibility that it simply wasn't compatible
>> with ma yet.
> 
> Partly my fault here, as I should have ported more functions. <rant>Blame the 
> fact that working on an open-source project doesn't translate into 
> publications, and that my bosses are shortening the leash...</rant>. 
> Note that most (all?) of the functions in scipy.stats never supported masked 
> arrays in the first place anyway. Now that MaskedArray is just a subclass of 
> ndarray, porting the functions should be easier.
> 
>> I had a quick look at the code and it really seems that ma handling can
>> be achieved by replacing np.asarray with np.ma.asarray, and some
>> functions with their methods (like ravel) here and there.
> 
> Yes and no. I'd prefer to use numpy.asanyarray, so as to avoid converting 
> ndarrays to masked arrays, and to use methods as much as possible. Of course, 
> there are going to be some particular cases to handle (such as when all the 
> data are masked), but that should be relatively painless.
> 
> Another issue is where to store the new functions: should we try to ensure 
> full compatibility of scipy.stats with masked arrays? Or create a new module, 
> scipy.mstats, that we'd fill up over time? I'd be keener on the 
> second approach, as we could move most of the functions currently in 
> numpy.ma.m(ore)stats to this new module, and that'd probably be less work at 
> once...