[Numpy-discussion] NA masks in the next numpy release?

Matthew Brett matthew.brett at gmail.com
Fri Oct 28 16:02:04 EDT 2011


Hi,

On Fri, Oct 28, 2011 at 12:15 PM, Lluís <xscript at gmx.net> wrote:

> Summarizing: let's forget for a moment that "mask" has a meaning in english:

This is at the core of the problem.  You and I know what's really
going on - there's a mask over the data.   But in what follows we're
going to try and pretend that is not what is going on.  The result is
something that is rather hard to understand, and, when you do
understand it, it's surprising and inconvenient.   This is all because
we tried to conceal what was really going on.

>             - "maskna" corresponds to ABSENT
>             - "ownmaskna" corresponds to IGNORED
>
> The problem here is that of the two implementation mechanisms (masks and
> bitpatterns), only the first can provide both semantics.

But let's be clear.   The current masked array implementation is built
to look like ABSENT, and it makes IGNORED hard to get to.

> Let's start with an array that already supports NAs:
>
> In [1]: a = np.array([1, 2, 3], maskna = True)
>
>
>
> ABSENT (destructive NA assignment)
> ----------------------------------
>
> Once you assign NA, even if you're using NA masks, the value seems to be lost
> forever (i.e., the assignment is destructive regardless of the value):
>
> In [2]: b = a.view()
> In [3]: c = a.view(maskna = True)
> In [4]: b[0] = np.NA
> In [5]: a
> Out[5]: array([NA, 2, 3])
> In [6]: b
> Out[6]: array([NA, 2, 3])
> In [7]: c
> Out[7]: array([NA, 2, 3])

Right - the mask (fundamentally an IGNORED signal) is pretending to
implement ABSENT.  But in fact it's IGNORED, as you point out further
down in your mail - I'm pasting that part here:

> In [21]: a = np.array([1, 2, 3])
> Out[21]: array([1, 2, 3])
> In [22]: b = a.view(maskna = True)
> In [23]: b[0] = np.NA
> In [24]: a
> Out[24]: array([1, 2, 3])
> In [25]: b
> Out[25]: array([NA, 2, 3])

But now - I've done this:

>>> a = np.array([99, 100, 3], maskna=True)
>>> a[0:2] = np.NA

You and I know that I've got an array with stored values [99, 100, 3]
and a mask with values [False, False, True], where False hides the
value.  So maybe I'd like to see what happens if I take the mask off
the second value.   I know that's what I want to do, but I don't know
how to do it: you won't let me manipulate the mask, because I'm not
allowed to know that the NA values come from the mask.

The alterNEP is just saying - please - be straight with me.   If
you're doing masking, show me the mask, and don't try and hide that
there are stored values underneath.
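
To make that concrete, here is a rough sketch in plain Python of the
kind of thing I mean.  It is not the NumPy API and not the alterNEP
text; the class and method names (MaskedValues, hide, unhide, visible)
are made up for illustration, and the mask uses the same convention as
above (True = visible, False = hidden).

```python
# Hypothetical sketch only: a container that keeps its mask in plain sight.
# None of these names exist in NumPy.

NA = object()  # stand-in marker shown in place of a hidden value


class MaskedValues:
    def __init__(self, values):
        self.data = list(values)             # stored values, never overwritten by hiding
        self.mask = [True] * len(self.data)  # True = visible, False = hidden

    def hide(self, index):
        # IGNORED semantics: only the mask changes, the stored value stays.
        self.mask[index] = False

    def unhide(self, index):
        # Taking the mask off brings the stored value back.
        self.mask[index] = True

    def visible(self):
        # What the user sees: stored values where visible, NA where hidden.
        return [v if m else NA for v, m in zip(self.data, self.mask)]


a = MaskedValues([99, 100, 3])
a.hide(0)
a.hide(1)      # mask is now [False, False, True]
a.unhide(1)    # take the mask off the second value
```

After the last line the mask is [False, True, True], the stored data
is still [99, 100, 3], and the second visible value is 100 again.
That only works because both the mask and the stored values are
things I'm allowed to see and touch.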

Best,

Matthew


