[Numpy-discussion] Missing/accumulating data

Fri Jul 1 17:23:34 EDT 2011

On Fri, Jul 1, 2011 at 2:29 PM, Joe Harrington <jh at physics.ucf.edu> wrote:

> Mark Wiebe <mwwiebe at gmail.com>:
>
> > With a non-boolean alpha mask, there's an implication of a
> > multiplication operator in there somewhere, but with a boolean mask,
> > the data can be any data whatsoever that doesn't necessarily support
> > any kind of blending operations.
>
> My goal in raising the point is to find a common core that supports
> everything.  The benefit of the np.ma module is that you have
> traditional numerical routines like median() and mean() that now
> sensibly handle missing data, plus a data structure (the paired array
> and mask) that you can use for other things of your own devising.  All
> that has to happen is to allow the sense of the mask to be FALSE = the
> data are bad, TRUE = the data are good, and allow (not require) the
> mask to be of any numerical type, or at least of integer type as well
> as boolean.  I believe that with these two basic requirements,
> everyone's needs can be met.  Note that you could still have boolean
> masks, and could still have the bad=TRUE, good=FALSE of the current
> np.ma module, if you had a flag to set in the dtype for what sense of
> the mask you wanted.  It could default to the current behavior if that
> makes people happy/breaks the least code.
>
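
For reference, here is a minimal sketch of the current np.ma behavior described above, where True in the mask means the element is bad. The sample values and the names `data`, `bad`, and `good` are made up for illustration; a mask with the opposite sense (True = good) currently has to be inverted by hand before it is handed to np.ma.

```python
import numpy as np

# Illustrative data: -999.0 stands in for a bad reading.
data = np.array([1.0, 2.0, -999.0, 4.0])

# Current np.ma sense: True = masked (bad), False = valid (good).
bad = np.array([False, False, True, False])
m = np.ma.masked_array(data, mask=bad)

# The standard reductions skip the masked element.
mean = m.mean()            # mean of 1, 2, 4
median = np.ma.median(m)   # median of 1, 2, 4

# A mask with the opposite sense (True = good) must be inverted first.
good = ~bad
m_from_good = np.ma.masked_array(data, mask=~good)
```
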
> > For the image accumulation you're describing, I would use either a
> > structured array with 'color' and 'weight' fields, or have the last
> > element of the color channel be the weight (like an RGBA image) so
> > adding multiple weighted images together would add both the colors
> > and the weights simultaneously, without requiring a ufunc extension
> > supporting struct dtypes.
>
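
A rough sketch of the RGBA-style accumulation, assuming each image stores colors already multiplied by the weight (premultiplied), with the weight in the last channel. The helper names `accumulate` and `normalize` and the sample pixels are hypothetical, not part of NumPy.

```python
import numpy as np

def accumulate(images):
    """Sum images whose last channel is a weight.

    Assumes colors are premultiplied by the weight, so a plain add
    accumulates colors and weights at the same time.
    """
    total = np.zeros_like(images[0], dtype=float)
    for img in images:
        total += img
    return total

def normalize(total):
    """Divide accumulated colors by accumulated weight where weight > 0."""
    weight = total[..., -1:]
    color = total[..., :-1]
    out = np.zeros_like(color)
    np.divide(color, weight, out=out, where=weight > 0)
    return out, weight[..., 0]

# Two 1x2 images, channels = (R, G, B, weight), colors premultiplied.
img_a = np.array([[[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]])
img_b = np.array([[[0.0, 0.0, 1.0, 1.0], [0.0, 0.5, 0.0, 0.5]]])

total = accumulate([img_a, img_b])
color, weight = normalize(total)
```

A pixel with zero accumulated weight keeps a color of zero, which plays the role of a fully masked element.
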
> Well, yes, we can always design a new data structure that meets our
> needs, and write all the routines that will ever operate on them.  But
> we don't want that.  We want to add a feature to the *old* data
> structure (i.e., a numerical array of the basic data) that makes the
> standard routines handle missing data sensibly so we don't have to
> rewrite them to do so.
>

I've used this style of weighted image masking quite a bit, but I don't think
it fits the discrete nature of the NA missing-value concept. NA works with any
dtype, including datetime, but 50% of a datetime isn't a meaningful quantity,
which undermines combining general dtypes with alpha masking. It's also
incompatible with the SAS- or Stata-style idea of multiple NA values.
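
To make the datetime point concrete, a small sketch with made-up dates and a hypothetical `missing` mask: a boolean mask only selects elements, so it works for any dtype, while weighting needs a multiplication that datetime64 does not define.

```python
import numpy as np

dates = np.array(['2011-07-01', '2011-07-02', '2011-07-03'],
                 dtype='datetime64[D]')

# Boolean mask (True = missing): plain selection, no arithmetic needed.
missing = np.array([False, True, False])
valid = dates[~missing]

# An alpha-style weight implies multiplying the data, which is not
# defined for datetime64 values.
try:
    dates * 0.5
    weighting_supported = True
except TypeError:
    weighting_supported = False
```
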

-Mark

>
> --jh--