[Numpy-discussion] Warnings in numpy.ma.test()

Fri Mar 19 09:37:12 EDT 2010

On Wed, Mar 17, 2010 at 10:16 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> On Wed, Mar 17, 2010 at 7:39 PM, Darren Dale <dsdale24 at gmail.com> wrote:
>> On Wed, Mar 17, 2010 at 8:22 PM, Charles R Harris
>> > What bothers me here is the opposing desire to separate ufuncs from their
>> > ndarray dependency, having them operate on buffer objects instead. As I see
>> > it, ufuncs would be split into layers, with a lower layer operating on
>> > buffer objects, and an upper layer tying them together with ndarrays where
>> > the "business" logic -- kinds, casting, etc -- resides. It is in that upper
>> > layer that what you are proposing would reside. Mind, I'm not sure that
>> > having matrices and masked arrays subclassing ndarray was the way to go,
>> > but given that they do, one possible solution is to dump the whole mess
>> > onto the subtype with the highest priority. That subtype would then be
>> > responsible for casts and all the other stuff needed for the call and
>> > wrapping the result. There could be library routines to help with that. It
>> > seems to me that that would be the most general way to go. In that sense
>> > ndarrays themselves would just be another subtype with especially low
>> > priority.
>>
>> I'm sorry, I didn't understand your point. What you described sounds
>> identical to how things are currently done. What distinction are you
>> making, aside from operating on the buffer object? How would adding a
>> method to modify the input to a ufunc complicate the situation?
>>
>
> Just *one* function to rule them all and on the subtype dump it. No
> __array_wrap__, __input_prepare__, or __array_prepare__, just something like
> __handle_ufunc__. So it is similar but perhaps more radical. I'm proposing
> having the ufunc upper layer do nothing but decide which argument type will
> do all the rest of the work, casting, calling the low level ufunc base,
> providing buffers, wrapping, etc. Instead of pasting bits and pieces into
> the existing framework I would like to lay out a line of attack that ends up
> separating ufuncs into smaller pieces that provide low level routines that
> work on strided memory while leaving policy implementation to the subtype.
> There would need to be some default type (ndarray) when the functions are
> called on nested lists and scalars and I'm not sure of the best way to
> handle that.
>
> I'm just sort of thinking out loud, don't take it too seriously.
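A minimal pure-Python sketch of the single-hook dispatch proposed in the quoted message, assuming a hypothetical `__handle_ufunc__` classmethod and a hypothetical `handle_priority` attribute (neither is a NumPy API; plain lists stand in for strided memory). The upper layer only picks the argument type with the highest priority and hands it everything; a low-priority default type covers plain lists and scalars.

```python
# Hypothetical sketch: __handle_ufunc__ and handle_priority are illustrative names only.

def add_base(a, b):
    """Stand-in for the low-level ufunc loop: elementwise add on plain lists."""
    return [x + y for x, y in zip(a, b)]


class Base:
    """Default type (the ndarray role) with especially low priority."""
    handle_priority = 0.0

    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def __handle_ufunc__(cls, func, *args):
        # The owning type does all the work: unwrap inputs, call the base, wrap the result.
        raw = [a.data if isinstance(a, Base) else list(a) for a in args]
        return cls(func(*raw))


class Tagged(Base):
    """A subtype that takes over the call by declaring a higher priority."""
    handle_priority = 10.0


def call_ufunc(func, *args):
    """Upper layer: only decide which argument type handles the call."""
    owner = max(
        (type(a) for a in args if isinstance(a, Base)),
        key=lambda t: t.handle_priority,
        default=Base,  # nested lists / scalars fall back to the default type
    )
    return owner.__handle_ufunc__(func, *args)
```

For example, `call_ufunc(add_base, Base([1, 2]), Tagged([3, 4]))` yields a `Tagged` instance holding `[4, 6]`, while `call_ufunc(add_base, [1, 2], [3, 4])` falls back to `Base`.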

This approach seems simpler. I was taken with it last
night, but then I remembered that it will make subclassing difficult. A
simple example can illustrate the problem. We have MaskedArray, which
needs to customize some functions that operate on arrays or buffers,
so we pass the function and the arguments to __handle_ufunc__ and it
takes care of the whole shebang. But now I develop a MaskedQuantity
that takes masked arrays and gives them the ability to handle units,
and so it needs to customize those functions further. Maybe
MaskedQuantity can modify the input passed to its __handle_ufunc__ and
then pass everything on to super().__handle_ufunc__, such that
MaskedQuantity does not have to reimplement MaskedArray's
customizations to that particular function, but that is not enough
flexibility for the general case. If my subclass needs to call the
low-level ufunc base, it can't rely on the superclass.__handle_ufunc__
because it *also* calls the ufunc base, so my subclass has to
reimplement all of the superclass function customizations.
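The subclassing problem described above can be sketched with toy classes. `MaskedArray` and `MaskedQuantity` here are simplified, hypothetical stand-ins (not `numpy.ma.MaskedArray` or any real units package), and `__handle_ufunc__` is the proposed, non-existent hook. Because the parent's hook both customizes the call and invokes the low-level base, the subclass cannot reuse it and ends up copying the mask logic.

```python
# Hypothetical toy classes illustrating the single-hook subclassing problem.

def add_base(a, b):
    """Stand-in for the low-level ufunc loop on plain lists."""
    return [x + y for x, y in zip(a, b)]


class MaskedArray:
    def __init__(self, data, mask):
        self.data, self.mask = list(data), list(mask)

    @classmethod
    def __handle_ufunc__(cls, func, a, b):
        # Customization: combine masks and fill masked slots with 0.
        mask = [m1 or m2 for m1, m2 in zip(a.mask, b.mask)]
        x = [0 if m else v for v, m in zip(a.data, mask)]
        y = [0 if m else v for v, m in zip(b.data, mask)]
        # The hook itself calls the low-level base and builds the result.
        return cls(func(x, y), mask)


class MaskedQuantity(MaskedArray):
    def __init__(self, data, mask, unit):
        super().__init__(data, mask)
        self.unit = unit

    @classmethod
    def __handle_ufunc__(cls, func, a, b):
        if a.unit != b.unit:
            raise ValueError("incompatible units")
        # super().__handle_ufunc__ would call the base itself and construct
        # cls(data, mask) without a unit (a TypeError here), so it cannot be
        # reused; the parent's mask customization is duplicated instead.
        mask = [m1 or m2 for m1, m2 in zip(a.mask, b.mask)]
        x = [0 if m else v for v, m in zip(a.data, mask)]
        y = [0 if m else v for v, m in zip(b.data, mask)]
        return cls(func(x, y), mask, a.unit)
```

Every further customization in `MaskedArray` would have to be copied the same way, which is the maintenance burden the paragraph above describes.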

The current scheme (__input_prepare__, ...) is better able to handle
subclassing, although I agree that it could be improved. If the
subclasses were responsible for calling the ufunc base, alternative
bases could be provided (like the C routines for masked arrays). That
still seems to require the high-level function to provide three or
four entry points: 1) modify the input, 2) initialize the output
(chance to deal with metadata), 3) call the function base, 4) finalize
the output (deal with metadata that requires the ufunc results).
Perhaps 2 and 4 would not both be needed, I'm not sure.
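The four entry points listed above can be sketched as a fixed pipeline in which the high-level function owns the sequence and subclasses override individual steps via `super()`. All hook names (`modify_input`, `init_output`, `call_base`, `finalize_output`, `apply`) are hypothetical, chosen only to mirror the numbered list; plain lists again stand in for buffers.

```python
# Hypothetical four-step pipeline; none of these names are NumPy APIs.

def add_base(a, b):
    """Stand-in low-level loop on plain lists."""
    return [x + y for x, y in zip(a, b)]


class Array:
    def __init__(self, data):
        self.data = list(data)

    def modify_input(self, args):          # 1) modify the input
        return [a.data if isinstance(a, Array) else list(a) for a in args]

    def init_output(self, n):              # 2) initialize the output
        return type(self)([0] * n)

    def call_base(self, func, raw):        # 3) call the function base
        return func(*raw)

    def finalize_output(self, out, args):  # 4) finalize the output
        return out


def apply(func, owner, *args):
    """High-level driver: runs the four steps in order on behalf of `owner`."""
    raw = owner.modify_input(args)
    out = owner.init_output(len(raw[0]))
    out.data = owner.call_base(func, raw)
    return owner.finalize_output(out, args)


class Masked(Array):
    def __init__(self, data, mask=None):
        super().__init__(data)
        self.mask = list(mask) if mask is not None else [False] * len(self.data)

    def finalize_output(self, out, args):
        # Metadata step: combine the masks of all masked inputs.
        masks = [a.mask for a in args if isinstance(a, Masked)]
        if masks:
            out.mask = [any(ms) for ms in zip(*masks)]
        return super().finalize_output(out, args)


class AltBase(Masked):
    # Swaps in a different base (toy: doubles the result) while still
    # inheriting Masked's finalize step unchanged.
    def call_base(self, func, raw):
        return [2 * v for v in super().call_base(func, raw)]
```

Because the driver, not the subclass, calls each step, `AltBase` replaces only the base call and keeps its parent's mask handling, which is the flexibility the single-hook design lacks.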

Darren