[Numpy-discussion] in the NA discussion, what can we agree on?

Benjamin Root ben.root at ou.edu
Fri Nov 4 10:48:56 EDT 2011


On Friday, November 4, 2011, Gary Strangman <strang at nmr.mgh.harvard.edu>
wrote:
>
>> > non-destructive+propagating -- it really depends on exactly what
>> > computations you want to perform, and how you expect them to work. The
>> > main difference is how reduction operations are treated. I kind of
>> > feel like the non-propagating version makes more sense overall, but I
>> > don't know if there's any consensus on that.
>>
>> I think this is further evidence for my idea that a mask should not be
>> undone, but is non destructive.  If you want to be able to access the
>> values after masking, have a view, or only apply the mask to a view.
>
> OK, so my understanding of what's meant by propagating is probably
> incomplete (and is definitely still fuzzy). I'm a little confused by the
> phrase "a mask should not be undone" though. Say I want to perform a
> statistical analysis or filtering procedure excluding and (separately)
> including a handful of outliers? Isn't that a natural case for undoing a
> mask? Or did you mean something else?
>
> I think I understand the "use a view" option above, though I don't see
> how one could apply a mask only to a view. What if my view is every other
> row in a 2D array, and I want to mask the last half of this view? What is
> the state of the original array once the mask has been applied?
>
> (If this is derailing the progress of this thread, feel free to ignore
> it.)
>
> -best
> Gary

Ufuncs can be broadly split into two categories. The first is element-wise
operations: binary ops like +, *, etc., as well as regular functions that
return an array whose shape matches the broadcast shape of the inputs. The
second is reduction ops (sum, min, mean, etc.).
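To make the two categories concrete, here is a minimal sketch using plain
NumPy arrays (no missing values yet). np.add is an element-wise ufunc whose
output takes the broadcast shape of its inputs; np.add.reduce (which is what
np.sum uses) collapses an axis. The arrays are made up for illustration.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)  # shape (3, 1)
b = np.arange(4)                # shape (4,)

# Element-wise: output shape is the broadcast of (3, 1) and (4,) -> (3, 4)
elementwise = np.add(a, b)

# Reduction: collapse axis 0, leaving shape (4,)
reduced = np.add.reduce(elementwise, axis=0)

print(elementwise.shape)  # (3, 4)
print(reduced)            # [ 3  6  9 12]
```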

For element-wise ops, the semantics of IGNORE are a bit murky, so I defer to
Mark's NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id17
That part probably should be expanded and clarified in the NEP.

For reduction ops, propagation means that sum([3, 5, NA, 6]) == NA, just as
if you had a NaN in the array. Non-propagating (or skipping, or ignoring)
semantics would have that operation produce 14 (3 + 5 + 6). Likewise, mean()
would be NA in the propagating case, but 4.667 (14 / 3) in the
non-propagating case.
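Today's NumPy has no NA type, but the propagating/skipping distinction can be
approximated with existing tools. This sketch uses NaN with np.sum/np.mean
(propagating) versus np.nansum/np.nanmean (skipping), and a numpy.ma masked
array, whose reductions skip masked entries. These are stand-ins, not the NA
semantics proposed in the NEP; the 0.0 placed under the mask is an arbitrary
filler value.

```python
import numpy as np

values = np.array([3.0, 5.0, np.nan, 6.0])

# Propagating: the missing value poisons the whole reduction
propagating_sum = np.sum(values)    # nan
propagating_mean = np.mean(values)  # nan

# Skipping: the missing value is ignored
skipping_sum = np.nansum(values)    # 14.0
skipping_mean = np.nanmean(values)  # 14 / 3

# numpy.ma reductions also skip masked entries
masked = np.ma.masked_array([3.0, 5.0, 0.0, 6.0],
                            mask=[False, False, True, False])
masked_sum = masked.sum()    # 14.0
masked_mean = masked.mean()  # 14 / 3
```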

The part about undoing a mask addresses the following issue: when an
operation produces a new array with ignored elements in it, those elements
were never initialized with any value at all. Therefore, "unmasking" those
elements and accessing their values makes no sense. This and more is
covered in this section of the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id11
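Current NumPy has a close analogy in the `where=` argument of ufuncs: if no
`out` array is supplied, output positions where the condition is False are
left uninitialized, so reading them back would expose arbitrary memory
rather than a meaningful "unmasked" value. The sketch below is only an
analogy for the NEP's point; the arrays and the `keep` condition are made
up.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
keep = np.array([True, False, True, True])  # hypothetical "not masked" flags

# No out= given: result[1] is never written, so its contents are undefined
result = np.add(a, b, where=keep)
print(result[keep])  # [11. 33. 44.]  (only these entries are meaningful)

# Supplying an initialized out= array gives the skipped slots a defined value
safe = np.full_like(a, np.nan)
np.add(a, b, out=safe, where=keep)
print(safe)  # [11. nan 33. 44.]
```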

For your stated case, I would have two views of the data (or at least the
original data and a view of it). On the view, I would apply a mask to hide
the outliers from the filtering operation and produce a result. The original
array (or the first view) still sees the same data it did before the other
view took on a mask, so you can run the filtering operation on both and get
two separate results. You can keep the masked view for subsequent
calculations, and/or keep the original, and/or create new views with
different masks for other analyses, all while keeping the original data
intact.
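The two-view workflow can be approximated with numpy.ma, which by default
wraps the existing buffer without copying it. This is a sketch using the
existing masked-array module rather than the NEP's design; the data values
and the `data > 50` outlier threshold are made up for illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 100.0, 4.0])  # 100.0 is the "outlier"

# Masked view: same underlying memory, outlier hidden by the mask
masked_view = np.ma.masked_array(data, mask=data > 50)

mean_with_outliers = data.mean()            # 110 / 5 = 22.0
mean_without_outliers = masked_view.mean()  # 10 / 4 = 2.5

# The original array is untouched and still sees every value
print(mean_with_outliers, mean_without_outliers)  # 22.0 2.5
print(np.shares_memory(data, masked_view.data))   # True
```

Other masks (for other analyses) can be layered on the same `data` the same
way, each as its own masked array.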

Note that I am speaking of views here in a somewhat abstract sense that is
only loosely tied to numpy's current behavior with respect to views. As for
ndarray.view() specifically, that is an implementation detail that probably
doesn't belong in this thread yet, so don't latch onto it too much.

Cheers!
Ben Root