[Numpy-discussion] behavior of masked arrays
Giorgio F. Gilestro
giorgio at gilestro.tk
Sun Mar 9 13:35:27 EDT 2008
OK, generic functions plus a dedicated ma.stats module sounds very good to
me. I hope it happens, because masked arrays are a great plus.
Pierre, I have adjusted some of the functions in scipy.stats.stats and I
plan to do more: not all of them, I am afraid, only those I need. Is it OK
if I send you what I end up with, so that you can have a look at it (at
your convenience) and maybe integrate it into numpy.ma.mstats?
For the moment, the only issues I have met are:
- some functions need to know N, the number of elements on which we
are performing the operation. A simple a.shape[axis] won't work, and I
have not found a native method returning the number of unmasked elements
along a given axis (maybe there should be one?). So I am using instead
N = a.shape[axis] - a.mask.sum(axis)
- some functions need to handle float data. Calling float() on a masked
array raises an exception (why so?), so I either introduce a float
constant where possible,
e.g. svar = ((n-1)*v) / float(df) becomes svar = ((n-1.0)*v) / df
or multiply by 1.0.
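
To make the two workarounds above concrete, here is a minimal sketch. The
data and variable names are made up for illustration. It uses
np.ma.getmaskarray rather than a.mask so that the subtraction also works
when nothing is masked, and it compares the result with the count method
of MaskedArray, which appears to return the same thing. astype(float) is
shown as one possible alternative to the float constant; it is not
something the functions in scipy.stats currently do.

```python
import numpy as np
import numpy.ma as ma

# Illustrative data: a 2x3 masked array with three masked entries.
a = ma.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]],
             mask=[[False, True, False],
                   [False, True, True]])
axis = 0

# Workaround from the text: length along the axis minus the masked entries.
# getmaskarray always returns a full boolean array, even if nothing is masked.
n_manual = a.shape[axis] - ma.getmaskarray(a).sum(axis)

# MaskedArray.count returns the number of unmasked entries along an axis.
n_count = a.count(axis)

# Float handling: a float constant forces float arithmetic elementwise
# and the mask is carried through the computation.
n = ma.array([3, 4, 5], mask=[False, True, False])
v = ma.array([2, 2, 2])
df = 4
svar = ((n - 1.0) * v) / df

# Possible alternative: convert the whole array, keeping the mask.
n_float = n.astype(float)
```

Both n_manual and n_count come out as one count per column here, and svar
stays masked wherever n was masked.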
Pierre GM wrote:
> On Friday 07 March 2008 12:25:13 Giorgio F. Gilestro wrote:
>> OK, I see, thank you Pierre.
>> I thought scipy.stats was a widely used extension, so I didn't really
>> consider the trivial possibility that it simply wasn't compatible
>> with ma yet.
>
> Partly my fault here, as I should have ported more functions. <rant>Blame the
> fact that working on an open-source project doesn't translate into
> publications, and that my bosses are shortening the leash...</rant>.
> Note that most (all?) of the functions in scipy.stats never supported masked
> arrays in the first place anyway. Now that MaskedArray is just a subclass of
> ndarray, porting the functions should be easier.
>
>> I had a quick look at the code and it really seems that ma handling can
>> be achieved by replacing np.asarray with np.ma.asarray, and some
>> functions with their methods (like ravel) here and there.
>
> Yes and no. I'd prefer to use numpy.asanyarray, so as to avoid converting
> ndarrays to masked arrays, and to use methods as much as possible. Of course,
> there are gonna be some particular cases to handle (as when all the data are
> masked), but that should be relatively painless.
>
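
As a rough illustration of the difference between the three conversion
functions discussed above (the helper mean_along is hypothetical and not
part of scipy.stats; it only shows the pattern of converting with
asanyarray and then calling a method):

```python
import numpy as np
import numpy.ma as ma

plain = np.array([1.0, 2.0, 3.0])
masked = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# np.asarray returns a base ndarray, so the mask is dropped.
stripped = np.asarray(masked)

# np.ma.asarray turns everything into a MaskedArray, plain input included.
forced = ma.asarray(plain)

# np.asanyarray lets subclasses through and leaves plain ndarrays alone.
kept_masked = np.asanyarray(masked)
kept_plain = np.asanyarray(plain)


def mean_along(a, axis=None):
    """Hypothetical helper: convert with asanyarray, then use the method."""
    a = np.asanyarray(a)
    # The method dispatches to MaskedArray.mean for masked input,
    # which skips masked entries.
    return a.mean(axis)


masked_mean = mean_along(masked)   # average of the unmasked 1.0 and 3.0
plain_mean = mean_along(plain)     # average of 1.0, 2.0 and 3.0
```

The same function body then serves both plain and masked input, without
forcing a mask onto arrays that never had one.
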
> Another issue is where to store the new functions: should we try to ensure
> full compatibility of scipy.stats with masked arrays? Or create a new module,
> scipy.mstats, that we'd fill up over time? I'd be keener on the
> second approach, as we could move most of the functions currently in
> numpy.ma.m(ore)stats to this new module, and that'd probably be less work at
> once...