[Numpy-discussion] alterNEP - was: missing data discussion round 2

Fri Jul 1 12:29:11 EDT 2011

On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root <ben.root at ou.edu> wrote:
> >
> >
> > On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett <matthew.brett at gmail.com>
> > wrote:
> >>
> >> > You can't switch between the two approaches without big changes in
> >> > your code.
> >>
> >> >
> >> Lluis provided a case, and it was obscure.  That switch seems like a
> >> rare or non-existent use-case that should not guide the API.
> >>
> >
> > Just to respond to this specific issue.
> >
> > In matplotlib, there are often constructs like the following:
> >
> > plot_something(X, Y, V)
> >
> > From a module perspective, we have no clue about the nature of the input
> > data.  We often have to do things like np.asanyarray, np.atleast_2d and
> > such to establish some base-level assumptions about the input data.  Numpy
> > currently makes this fairly cheap by not performing a copy if it is not
> > needed.  So far, so good.
> >
> > Next, some plotting functions need to broadcast the arrays together
> > (again, numpy makes that fairly cheap).
> >
> > Then, we need to figure out the common elements to plot.  With something
> > simple like plot(), this is a straightforward or-ing of any masks.  Of
> > course, right now, this is not cheap because we can't assume that the
> > array supports masking semantics.  This is where we either cast the arrays
> > as masked arrays, or perform our own masking semantics.  But, essentially,
> > a point that was masked in X may not be masked in Y and/or V, and we
> > cannot change the original data (or else we would be a bad tool).
> >
> > For more complicated functions like pcolor() and contour(), each array
> > needs to know the status of the neighboring points, both in itself and in
> > the other arrays.  Again, either we use numpy.ma to share a common mask
> > across the data arrays, or we implement our own semantics to deal with
> > this.  And again, we cannot change any of the original data.
> >
> > This is not an obscure case.  This is existing code in matplotlib.  I
> > will be evaluating the current missingdata branch later today to assess
> > its suitability for use in matplotlib.
>
> I think I missed why your case needs NA and IGNORE to use the same
> API.  Why can't you just use masks and IGNORE here?
>
> Best,
>
> Matthew
>

The point is that matplotlib cannot make assumptions about the nature of
the input data.  From matplotlib's perspective, NAs and IGNOREs are the
same thing and should be treated the same way (i.e., skipped).  Right now,
matplotlib's code is messy and inconsistent in its treatment of masked
arrays and NaNs (some functions treat them the same, some handle only NaNs,
and some handle only masked arrays).  This is the result of code cruft
accumulated over the years.  If we had one interface to rule them all, we
could bring *all* plotting functions to have similar handling code and be
more consistent across the board.
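
To make that concrete, here is a rough sketch of the kind of uniform
handling I mean, using only what numpy and numpy.ma offer today.  The helper
name with_common_mask is hypothetical (it is not matplotlib code, and it
does not use the missingdata branch); it simply treats NaNs and masked
points alike, broadcasts the inputs, and or-s the masks without touching
the caller's arrays.

```python
import numpy as np

def with_common_mask(*inputs):
    """Hypothetical helper: return masked views of the inputs that all
    share one mask, treating NaNs and masked points the same way."""
    datas, masks = [], []
    for a in inputs:
        a = np.asanyarray(a)  # no copy if already an array
        data = np.asarray(np.ma.getdata(a), dtype=float)
        # A point is "missing" if it was masked or if it is NaN.
        mask = np.ma.getmaskarray(a) | np.isnan(data)
        datas.append(data)
        masks.append(mask)
    # Broadcast data and masks to a common shape (these are views).
    datas = np.broadcast_arrays(*datas)
    masks = np.broadcast_arrays(*masks)
    # A point is plotted only if it is valid in every input.
    common = np.logical_or.reduce(masks)
    # New masked arrays carry the common mask; the originals are untouched.
    return [np.ma.array(d, mask=common) for d in datas]

x = np.array([0.0, 1.0, np.nan, 3.0])
y = np.ma.array([10.0, 20.0, 30.0, 40.0], mask=[False, True, False, False])
xm, ym = with_common_mask(x, y)
```

Here xm and ym end up with the same mask, while x and y keep their original
data and mask.  Every plotting function currently has to reinvent some
variant of this.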

However, I think Mark's NEP provides a good way to distinguish between the
cases when needed (but I have not examined it from that perspective yet).

Ben Root