[Numpy-discussion] using the same vocabulary for missing value ideas

Wed Jul 6 14:56:01 EDT 2011

On Wed, Jul 6, 2011 at 12:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:

>  Ah, semantics...
>
> On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote:
> >
> > NA (Not Available)
> >     A placeholder for a value which is unknown to computations. That
> >     value may be temporarily hidden with a mask, may have been lost
> >     due to hard drive corruption, or gone for any number of reasons.
> >     This is the same as NA in the R project.
>
> I have a problem with 'temporarily hidden with a mask'. In my mind, the
> concept of NA carries a notion of permanence. The data is just not
> available, just as a NaN is just not a number.
>

Yes, this gets directly at what I mean when I say that NA vs IGNORE is
independent of mask vs bitpattern. The way I'm trying to structure things,
NA vs IGNORE affects only the semantics, i.e. the outputs produced by
computations. This is precisely why I put 'temporarily hidden with a mask'
first, to make that clearer.

> > IGNORE (Skip/Ignore)
> >     A placeholder which should be treated by computations as if no value
> >     does or could exist there. For sums, this means act as if the value
> >     were zero, and for products, this means act as if the value were one.
> >     It's as if the array were compressed in some fashion to not include
> >     that element.
>
> Data temporarily hidden by a mask becomes np.IGNORE.
>

Are you willing to suspend the idea of that implication for the purposes of
the present discussion? If not, do you see a way to amend things so that
masked NAs and bitpattern-based IGNOREs make sense? Would renaming IGNORE to
SKIP be more clear, perhaps?
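
To make the semantic difference concrete, here is a minimal sketch of the
two behaviours for reductions. It is only an illustration under stated
assumptions: the helper names (na_sum, ignore_sum, ignore_prod) are
hypothetical, NaN stands in for an NA result, and a plain boolean array
records which elements are missing.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
missing = np.array([False, True, False, False])  # element 1 is missing

def na_sum(values, missing):
    # NA semantics: one unknown input makes the result unknown.
    # NaN is used here only as a stand-in for an NA result.
    return np.nan if missing.any() else values.sum()

def ignore_sum(values, missing):
    # IGNORE semantics: reduce as if the element were not in the array,
    # which for a sum is the same as treating it as zero.
    return values[~missing].sum()

def ignore_prod(values, missing):
    # For a product, skipping the element is the same as treating it as one.
    return values[~missing].prod()

print(na_sum(values, missing))
print(ignore_sum(values, missing))
print(ignore_prod(values, missing))
```

Nothing in these helpers depends on how the missing flags are stored, which
is the point of separating NA vs IGNORE from mask vs bitpattern.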

Thanks,
Mark

>
>
> > bitpattern
> >     A technique for implementing either NA or IGNORE, where a particular
> >     set of bit patterns is chosen from all the possible bit patterns of
> >     the value's data type to signal that the element is NA or IGNORE.
> >
> > mask
> >     A technique for implementing either NA or IGNORE, where a
> >     boolean or enum array parallel to the data array is used to signal
> >     which elements are NA or IGNORE.
> >
> > numpy.ma
> >     The existing implementation of a particular form of masked arrays,
> >     which is part of the NumPy codebase.
>
> OK with that.
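
The two storage techniques defined above can be sketched side by side. This
is a rough illustration, not a proposed API: INT_NA, reduce_sum and the
"NA"/"IGNORE" strings are hypothetical names, the smallest int32 value is
assumed as the reserved bit pattern, and None stands in for an NA result.

```python
import numpy as np

# Bitpattern storage: one value of the dtype is reserved as the marker.
INT_NA = np.iinfo(np.int32).min  # assumed reserved pattern
bp_data = np.array([10, INT_NA, 30], dtype=np.int32)
bp_missing = bp_data == INT_NA

# Mask storage: a parallel boolean array marks the missing elements,
# and the payload underneath the mask is left untouched.
mask_data = np.array([10, 20, 30], dtype=np.int32)
mask_missing = np.array([False, True, False])

def reduce_sum(data, missing, semantics):
    # The semantics are chosen independently of where `missing` came from.
    if semantics == "NA":
        return None if missing.any() else int(data.sum())
    return int(data[~missing].sum())  # "IGNORE": skip the element

for name, data, missing in [("bitpattern", bp_data, bp_missing),
                            ("mask", mask_data, mask_missing)]:
    for semantics in ("NA", "IGNORE"):
        print(name, semantics, reduce_sum(data, missing, semantics))
```

All four combinations run through the same reduction, and the result
depends only on the semantics, not on the storage.
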
>
>
>
> >
> > The most important distinctions I'm trying to draw are:
> >
> > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any
> > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and
> > IGNORE as mask are reasonable.
>
> OK with that.
>
>
>
> > 2) The idea of masking and the numpy.ma implementation are different.
> > The numpy.ma object makes particular choices about how to interpret the
> > mask, but while backwards compatibility is important, a fresh evaluation
> > of all the design choices going into a mask implementation is worthwhile.
>
> Indeed.
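
For reference, a short session showing some of the choices the existing
numpy.ma makes. This only illustrates current numpy.ma behaviour as I
understand it; it is not a statement about what a new design should do.

```python
import numpy as np

a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                       mask=[False, True, False, False])

# Reductions skip masked elements, which matches the IGNORE definition.
total = a.sum()
product = a.prod()
n_valid = a.count()

# Elementwise operations carry the mask through to the result.
b = a + 1.0

# The value underneath the mask is still stored in the data array.
hidden = a.data[1]

print(total, product, n_valid)
print(b.mask)
print(hidden)
```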