[Numpy-discussion] Warnings in numpy.ma.test()

Fri Mar 19 13:58:58 EDT 2010

On Mar 18, 2010, at 4:12 PM, Eric Firing wrote:
> Ryan May wrote:
>> On Thu, Mar 18, 2010 at 2:46 PM, Christopher Barker
>> <Chris.Barker at noaa.gov> wrote:
>>> Gael Varoquaux wrote:
>>>> On Thu, Mar 18, 2010 at 12:12:10PM -0700, Christopher Barker wrote:
>>>>> sure -- that's kind of my point -- if EVERY numpy array were
>>>>> (potentially) masked, then folks would write code to deal with them
>>>>> appropriately.
>>>> That's pretty much saying: "I have a complicated problem and I want
>>>> everyone else to have to deal with the full complexity of it, even if
>>>> they have a simple problem".
>>> Well -- I did say it was a fantasy...
>>> 
>>> But I disagree -- having invalid data is a very common case. What we
>>> have now is a situation where we have two parallel systems, masked
>>> arrays and regular arrays. Each time someone does something new with
>>> masked arrays, they often find another missing feature, and have to
>>> solve that. Also, the fact that masked arrays are tacked on means that
>>> performance suffers.

Please keep in mind that MaskedArrays were only ever provided for convenience. If you need performance, you should implement a solution adapted to your problem (dropping the missing values, filling them by some kind of interpolation...) and just use standard ndarrays.
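
As a minimal sketch of that advice (assuming a recent NumPy; the array values and variable names are illustrative only), masked entries can be dropped with `compressed()`, replaced by a constant with `filled()`, or linearly interpolated with `np.interp`. Each of these returns a plain ndarray:

```python
import numpy as np

# Illustrative data: the third value is invalid and masked out.
x = np.ma.array([1.0, 2.0, 99.0, 4.0, 5.0],
                mask=[False, False, True, False, False])

# Option 1: drop the missing values (returns a 1-D plain ndarray).
valid = x.compressed()

# Option 2: replace the missing values with a constant.
filled_const = x.filled(0.0)

# Option 3: fill the gaps by linear interpolation over the valid points.
idx = np.arange(x.size)
good = ~np.ma.getmaskarray(x)
filled_interp = np.interp(idx, idx[good], np.ma.getdata(x)[good])

print(valid)          # [1. 2. 4. 5.]
print(filled_const)   # [1. 2. 0. 4. 5.]
print(filled_interp)  # [1. 2. 3. 4. 5.]
```

From there on everything runs on standard ndarrays; which option is appropriate depends on what the missing values mean for the data at hand.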

Anyway, the plan from the beginning has been to have MaskedArrays implemented in C at some point. A few years back I looked into how to subclass ndarrays in Cython, but ran into a lot of problems. Travis O advised me to focus on MaskedArrays instead, for good reasons. Now we have something that's pretty close to an ndarray (as opposed to the implementation in Numeric), which works most of the time but could be optimized.

>> Case in point, I just found a bug in np.gradient where it forces the
>> output to be an ndarray.
>> (http://projects.scipy.org/numpy/ticket/1435).  Easy fix that doesn't
>> actually require any special casing for masked arrays, just making
>> sure to use the proper function to create a new array of the same
>> subclass as the input.  However, now for any place that I can't patch
>> I have to use a custom function until a fixed numpy is released.
>> 
>> Maybe universal support for masked arrays (and masking invalid points)
>> is a pipe dream, but every function in numpy should IMO deal properly
>> with subclasses of ndarray.
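
To illustrate the kind of subclass-preserving fix described above, here is a minimal sketch with a hypothetical `first_difference` helper (this is not NumPy's actual `np.gradient` code). Converting the input with `np.asarray` silently turns a MaskedArray into a base ndarray and drops the mask, whereas `np.asanyarray` passes subclasses through, so ordinary slicing and arithmetic return the same subclass as the input:

```python
import numpy as np

def first_difference_plain(f):
    # np.asarray converts subclasses to a base ndarray: the mask is lost.
    f = np.asarray(f)
    return f[1:] - f[:-1]

def first_difference(f):
    # np.asanyarray keeps subclasses (e.g. MaskedArray) intact, so the
    # slicing and subtraction below produce the same subclass as the input.
    f = np.asanyarray(f)
    return f[1:] - f[:-1]

x = np.ma.array([1.0, 2.0, 4.0, 7.0], mask=[False, False, True, False])

print(first_difference_plain(x))  # [1. 2. 3.]  (the masked value leaks in)
m = first_difference(x)
# Only the first difference is valid; the two touching the masked point stay masked.
print(np.ma.getmaskarray(m))      # [False  True  True]
```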
> 
> 1) This can't be done in general because subclasses can change things to 
> the point where there is little one can count on.  The matrix subclass, 
> for example, redefines multiplication and iteration, making it difficult 
> to write functions that will work for ndarrays or matrices.
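
A small sketch of the matrix problem, assuming a NumPy version where `np.matrix` still exists (recent releases emit a PendingDeprecationWarning when one is created, which is silenced here). The helpers `square_naive` and `square_ufunc` are hypothetical: the first relies on `*` being elementwise, which `np.matrix` redefines as matrix multiplication, while calling the `np.multiply` ufunc explicitly behaves the same for both types. Iteration differs as well: an ndarray yields 1-D rows, a matrix yields 1xN matrices.

```python
import warnings
import numpy as np

# np.matrix warns on construction in recent NumPy; ignore that for the demo.
warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

def square_naive(x):
    # Assumes '*' is elementwise: true for ndarray, not for np.matrix.
    return x * x

def square_ufunc(x):
    # np.multiply is always elementwise, whatever the subclass.
    return np.multiply(x, x)

a = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])

print(square_naive(a).tolist())  # [[1, 4], [9, 16]]
print(square_naive(m).tolist())  # [[7, 10], [15, 22]]  (matrix product)
print(square_ufunc(m).tolist())  # [[1, 4], [9, 16]]

# Rows of an ndarray are 1-D; rows of a matrix are still 2-D.
print([row.shape for row in a])  # [(2,), (2,)]
print([row.shape for row in m])  # [(1, 2), (1, 2)]
```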

And one can always add a function to numpy.ma.extras...

> 
> 2) There is a lot that can be done to improve the handling of masked 
> arrays, and I still believe that much of it should be done at the C 
> level, where it can be done with speed and simplicity.  Unfortunately, 
> figuring out how to do it well, and implementing it well, will require a 
> lot of intensive work.  I suspect it won't get done unless we can figure 
> out how to get a qualified person dedicated to it.

I still can't speak C, but now that I'm unemployed, I should have plenty of free time to learn... Hire me ;)