[Numpy-discussion] numpy.trapz() doesn't respect subclass

Ryan May rmay31 at gmail.com
Mon Mar 29 11:17:25 EDT 2010


On Mon, Mar 29, 2010 at 8:00 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 03/27/2010 01:31 PM, Ryan May wrote:
>> Because of the call to asarray(), the mask is completely discarded and
>> you end up with identical results to an unmasked array,
>> which is not what I'd expect. Worse, the actual numeric values of the
>> positions that were masked affect the final answer. My patch allows
>> this to work as expected too.
>>
> Actually you should assume that unless it is explicitly addressed
> (either by code or via a test), any subclass of ndarray (matrix, masked,
> structured, record and even sparse) may not provide a 'valid' answer.
> There are probably many numpy functions that only really work with the
> standard ndarray. Most of the time people do not meet these with the
> subclasses or have workarounds so there has been little requirement to
> address this especially due to the added overhead needed for checking.

It's not that I'm surprised that masked arrays don't work. It's more
that the calls to np.asarray within trapz() have been held up as being
necessary for things like matrices and (at the time) masked arrays to
work properly, as if calling asarray() were supposed to make all
subclasses work, albeit only at a basic level, by dropping them to a
plain ndarray. To me, the current behavior with masked arrays is worse
than if passing in a matrix raised an exception. One is a silently
wrong answer; the other is a big error that the programmer can see,
test, and fix.
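To make the failure mode concrete, here is a minimal sketch. It assumes a simplified trapezoid rule rather than numpy's actual trapz() source, and the function name `trapz_asarray` is hypothetical. Because np.asarray() returns the raw data underneath a MaskedArray, the mask is lost, and whatever value happens to sit at a masked position changes the result:

```python
import numpy as np

def trapz_asarray(y, x):
    """Hypothetical simplified trapezoid rule that, like the trapz() under
    discussion, coerces its inputs with np.asarray()."""
    y = np.asarray(y)  # a MaskedArray becomes its raw data; the mask is dropped
    x = np.asarray(x)
    d = np.diff(x)
    return (d * (y[1:] + y[:-1]) / 2.0).sum()

x = np.arange(5, dtype=float)
mask = [False, False, True, False, False]

# Same mask, different junk value hidden under the masked position.
y_a = np.ma.masked_array([1.0, 2.0, 100.0, 2.0, 1.0], mask=mask)
y_b = np.ma.masked_array([1.0, 2.0, 3.0, 2.0, 1.0], mask=mask)

print(trapz_asarray(y_a, x))  # 105.0
print(trapz_asarray(y_b, x))  # 8.0
```

Both inputs carry an identical mask, yet the answers differ: a silently wrong result rather than an error.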

> Also, any patch that does not explicitly define the assumed behavior
> with points that are masked has to be rejected. It is not even clear
> what the expected behavior for masked arrays should be:
> Is it even valid for trapz to be integrating across the full range if
> there are missing points? That implies some assumption about the missing
> points.
> If it is valid, then should you just ignore the masked values or try to
> predict the missing values first? Perhaps you may want to have the
> option to do both.

You're right, it doesn't actually work with MaskedArrays as it stands
right now, because it calls add.reduce() directly instead of using the
array.sum() method. Once that is fixed, so that MaskedArray handles the
operation, you end up not integrating over the masked region: any
trapezoid that touches a masked point is itself masked, and its
contribution is ignored by the sum. I guess it's as if you assumed the
function was 0 over the masked region. If you wanted to skip the masked
points but still integrate across the region (making one really big
trapezoid over it), you could just pass in the .compressed() versions
of the arrays.
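A sketch of both behaviors, assuming the fix amounts to using np.asanyarray() plus the .sum() method. `trapz_subclass_aware` is a hypothetical name, not numpy's patched function. With the mask preserved, each trapezoid touching a masked point is masked and MaskedArray.sum() skips it. Passing the .compressed() arrays instead (x is compressed with y's mask so the two stay aligned) bridges the gap with one wide trapezoid:

```python
import numpy as np

def trapz_subclass_aware(y, x):
    """Hypothetical trapezoid rule that lets subclasses handle the arithmetic."""
    y = np.asanyarray(y)  # keeps MaskedArray (or any other subclass) intact
    x = np.asanyarray(x)
    d = x[1:] - x[:-1]
    # y comes first so MaskedArray arithmetic propagates the mask to each segment.
    segments = (y[1:] + y[:-1]) * d / 2.0
    return segments.sum()  # MaskedArray.sum() skips masked segments

x = np.arange(5, dtype=float)
y = np.ma.masked_array([1.0, 2.0, 100.0, 2.0, 1.0],
                       mask=[False, False, True, False, False])

# Segments [1, 2] and [2, 3] touch the masked point and are dropped.
print(trapz_subclass_aware(y, x))  # 3.0

# Compress both arrays with the same mask: one wide trapezoid spans [1, 3].
x_c = np.ma.masked_array(x, mask=np.ma.getmaskarray(y)).compressed()
print(trapz_subclass_aware(y.compressed(), x_c))  # 7.0
```

The value stored under the mask (100.0 here) no longer affects either answer.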

>> than implicit") It just seems absurd that if I make my own ndarray
>> subclass that *just* adds some behavior to the array, but doesn't
>> break *any* operations, I need to do one of the following:
>>
>> 1) Have my own copy of trapz that works with my class
>> 2) Wrap every call to numpy's own trapz() to put the metadata back.
>>
>> Does it not seem backwards that the class that breaks conventions
>> "just works", while those that don't break conventions, and would work
>> perfectly with the function as written, need help to be treated
>> properly?
>>
> You need your own version of trapz or whatever function because it has
> the behavior that you expect. But a patch should not break numpy, so you
> at least need to have a section that looks for masked array subtypes
> and performs the desired behavior(s).
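For the case of a subclass that "just adds some behavior", here is a minimal sketch using a hypothetical subclass `UnitArray` that only attaches a `units` string. np.asarray() discards both the type and the metadata, which is why a wrapper would have to put them back. np.asanyarray() hands the subclass through unchanged, and ordinary slicing keeps the metadata via `__array_finalize__`:

```python
import numpy as np

class UnitArray(np.ndarray):
    """Hypothetical ndarray subclass that only carries a ``units`` string."""

    def __new__(cls, data, units=""):
        obj = np.asarray(data).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; copy the metadata from the source array.
        self.units = getattr(obj, "units", "")

y = UnitArray([1.0, 2.0, 3.0], units="m/s")

plain = np.asarray(y)    # drops to a base ndarray: metadata lost
kept = np.asanyarray(y)  # passes the subclass through unchanged

print(type(plain).__name__, hasattr(plain, "units"))  # ndarray False
print(type(kept).__name__, kept.units)                # UnitArray m/s
print(y[1:].units)                                    # m/s
```

A function written in terms of asanyarray() and slicing therefore never has to know about this subclass to preserve its metadata.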

I'm not trying to be difficult, but it seems like there are conflicting
ideas here: we shouldn't break numpy, which in this case means making
matrices no longer work with trapz(). On the other hand, subclasses
can do a lot of things, so there's no real expectation that they
should ever work with numpy functions in general. Am I missing
something here? I'm just trying to understand what I perceive to be
some inconsistencies in numpy's behavior and, more importantly, its
conventions with regard to subclasses.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


