[Numpy-discussion] Re: [SciPy-user] Messing with missing values

pgmdevlist at mailcan.com pgmdevlist at mailcan.com
Sun Feb 26 21:20:03 EST 2006


On Sunday 26 February 2006 14:19, Sasha wrote:
> I am replying on "numpy-discussion"  because this is really a numpy
> rather than scipy topic.
My bad, sorry for that.

> > Unfortunately, most of the numpy/scipy functions don't handle missing
> > values nicely.
>
> Can you specify which *numpy* functions are giving you trouble?
> That should be fixed.

Typical examples: median, stdev, diff... `stdev` is obvious, and `median` is 
straightforward for 1d arrays (I'm still looking for an optimal method 
for higher dimensions). The couple of `shape_base` functions I tried 
(`hstack`, `column_stack`...) required filling the array beforehand and then 
superimposing the corresponding mask on the result. 
There are also some methods such as `ndim` (more for convenience than anything, 
as `len(x.shape)` does the trick for both masked & unmasked versions), and r_[].
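
To make the fill-then-remask workaround concrete, here is a minimal sketch 
written against the present-day `numpy.ma` module rather than the `MA` of this 
thread. The helper name `hstack_masked` and the `fill_value` default are 
hypothetical, chosen only for illustration.

```python
import numpy as np
import numpy.ma as ma


def hstack_masked(arrays, fill_value=0):
    """Stack 1-D masked arrays side by side (hypothetical helper).

    The data and the masks are stacked separately, then recombined,
    so the stacking function itself never sees a masked array.
    """
    # Replace masked entries with a placeholder so a plain hstack can be used.
    data = np.hstack([ma.filled(a, fill_value) for a in arrays])
    # getmaskarray always returns a full boolean array, even when nothing is masked.
    mask = np.hstack([ma.getmaskarray(a) for a in arrays])
    # Superimpose the stacked mask on the stacked data.
    return ma.array(data, mask=mask)


a = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = ma.array([4.0, 5.0], mask=[False, False])
stacked = hstack_masked([a, b])
```

The placeholder value never shows up in results as long as the mask is 
reapplied, which is why the choice of `fill_value` is arbitrary here.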

I remember a message from a couple of weeks ago wondering whether ma should be 
kept up to date with the rest of numpy (and of course, I can't find the 
reference right now). What's the status of ma?

> > How could I mask the values corresponding to
> > MA.masked in the final list, without having to check every single
> > element?
>
> Latest ma allows you to pass masked arrays directly to ufuncs. In
> order for this to work a ufunc should be registered in the "domains"
> and "fills" dictionaries.  Not much documentation on this feature
> exists yet, so you will have to read the code in ma.py to figure this
> out.

Let's take the `median` example for 2D arrays. I end up with something like:
---
med = []
for x_i in x:
   med.append(median1d(x_i.compressed()))
---
with `median1d` a slightly modified version of the basic numpy `median`, 
outputting `MA.masked` if `x_i.compressed()` is `None`. I need the `med` list 
to be a masked_array. Paul Dubois suggests:
---
return ma.array(med, mask=[x is ma.masked for x in med])
---
I guess that's more efficient than the 
---
return MA.masked_values(med.filled(nodata),nodata)
---
I had come up with. As a matter of fact, it seems even faster to hardcode the 
`median1d` part in the loop.
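
For reference, the pieces above can be put together as a minimal, runnable 
sketch. It assumes the present-day `numpy.ma` module instead of the `MA` used 
in this thread, and the names `median1d` and `row_medians` are illustrative 
only. Here `median1d` returns `ma.masked` when no unmasked values remain in a 
row, which is an assumption about the intended behaviour.

```python
import numpy as np
import numpy.ma as ma


def median1d(values):
    """Median of a plain 1-D array, or ma.masked if it is empty."""
    if values.size == 0:
        return ma.masked
    return np.median(values)


def row_medians(x):
    """Median of each row of a 2-D masked array, ignoring masked entries."""
    # compressed() drops the masked entries and returns a plain ndarray.
    med = [median1d(row.compressed()) for row in x]
    # Build the mask as suggested: one identity check per row, not per element.
    mask = [m is ma.masked for m in med]
    # Swap the masked constant for a placeholder so the data list is purely
    # numeric and does not depend on how the constant converts to a float.
    data = [0.0 if m is ma.masked else m for m in med]
    return ma.array(data, mask=mask)


x = ma.array(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    mask=[[False, False, True], [True, True, True], [False, False, False]],
)
result = row_medians(x)
```

In this example the second row is fully masked, so its median comes out 
masked while the other two rows give ordinary values.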

But yes, I'm going to check the sources for the ufunc.

Thanks again.


-- 
Pierre GM



