[Numpy-discussion] An NA compromise idea -- many-NA

Fri Jul 1 19:17:29 EDT 2011

On Fri, Jul 1, 2011 at 2:04 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> Mask an array with NAs? You should be able to, as IGNORE<>NA. Mask an array
> with a view? That's sharing the data with a different mask, you should be
> able to, too (np.ma works like that).

I think you might be getting the proposals mixed up... Charles is
talking about the NEP design, which has no distinction between IGNORE
and NA; there's just NA-because-of-mask and NA-because-of-bit-pattern,
which behave the same way except that under certain special
circumstances you can trick the NA-because-of-mask one into acting
more like the masked arrays you're thinking of. (For instance, you can
"unmask" an NA-because-of-mask by using the following algorithm: save
a view of the original array before you ever add a mask to it. Then
when you want to unmask a value in place, you make a copy of the
current mask, flip the appropriate bit in the copy, and then make a
new masked array by combining a new view of the original array with
your new copy of the mask. Now you have a new array object that shares
memory with the old array and has that value unmasked. IIUC.)
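Here's a rough sketch of that unmasking trick. It uses np.ma as a stand-in for the NEP's NA-because-of-mask arrays, so it's an analogue only and not the NEP's API. It assumes np.ma's default no-copy construction: keep a view taken before any mask exists, copy the mask, flip one entry in the copy, and build a new masked array over a fresh view of the same memory.

```python
import numpy as np

# np.ma stands in for the NEP's mask-based NA here; this is an analogue,
# not the NEP API.
base = np.arange(5.0)
original_view = base.view()      # view saved before any mask exists

# A masked array over the same memory, with element 2 hidden.
masked = np.ma.MaskedArray(base, mask=[False, False, True, False, False])

# "Unmask" element 2 without modifying `masked`:
new_mask = np.ma.getmaskarray(masked).copy()   # copy the current mask
new_mask[2] = False                            # flip the entry in the copy
unmasked = np.ma.MaskedArray(original_view.view(), mask=new_mask)

print(masked[2])                          # prints "--": still masked in the old array
print(unmasked[2])                        # 2.0
print(np.shares_memory(unmasked, base))   # True: same data, different mask
```

The old array is left alone; only the new object sees the value as unmasked, which is why this counts as a trick rather than a real "unmask" operation.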

> Sharing mask? That'd be great if we could... That way, there'd be almost
> nothing left to do to adapt np.ma...

I'm not sure if the NEP design supports sharing masks or not -- maybe
you could just assign the same object to two different arrays'
.validitymask properties, but that property has a lot of magic in it.
I don't know if that would work like a normal 'a = b' assignment, or
would actually be more like 'a[:] = b[:]'. In at least some versions
of the NEP design, it was an explicit goal that it not be possible to
access the mask's memory directly under any circumstances, because
they wanted to keep the API agnostic between a
one-byte-per-boolean mask and a one-bit-per-boolean mask. If
that's still true (the current text doesn't seem to say either way),
then there can't be any API that lets you get any kind of numpy array
view of the mask, and .validitymask might actually be a snapshot
generated from scratch on each access, in which case the obvious
'a.validitymask = b.validitymask' definitely wouldn't work. I guess
you could support sharing by defining an opaque 'mask' object that you
can't peek inside, but can only take from one array and attach to
another?
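To make the snapshot-vs-handle distinction concrete, here's a toy sketch. Every name in it (NAArray, OpaqueMask, take_mask, attach_mask) is hypothetical and not from the NEP. It stores the mask one bit per element with np.packbits, so no boolean ndarray view of it can exist and the .validitymask getter has to build a fresh array on each access. Under those assumptions, 'a.validitymask = b.validitymask' only copies values, while passing an opaque handle does give real sharing.

```python
import numpy as np


class OpaqueMask:
    """Hypothetical opaque handle: owns bit-packed mask storage and offers
    no public way to read it; it can only be taken and attached."""

    __slots__ = ("_bits", "_size", "_shape")

    def __init__(self, validity):
        validity = np.asarray(validity, dtype=bool)
        self._bits = np.packbits(validity.ravel())   # one bit per element
        self._size = validity.size
        self._shape = validity.shape


class NAArray:
    """Toy stand-in for a NEP-style masked array (all names hypothetical)."""

    def __init__(self, data, validity):
        self.data = np.asarray(data)
        self._mask = OpaqueMask(validity)
        if self._mask._shape != self.data.shape:
            raise ValueError("mask shape must match data shape")

    @property
    def validitymask(self):
        # A snapshot generated from scratch on every access.
        m = self._mask
        bits = np.unpackbits(m._bits, count=m._size)
        return bits.astype(bool).reshape(m._shape)

    @validitymask.setter
    def validitymask(self, value):
        # Copies *values* into this array's current storage, in place,
        # so this behaves like a[:] = b[:] rather than a = b.
        value = np.asarray(value, dtype=bool)
        if value.shape != self.data.shape:
            raise ValueError("mask shape must match data shape")
        self._mask._bits[:] = np.packbits(value.ravel())

    def take_mask(self):
        return self._mask

    def attach_mask(self, handle):
        if handle._shape != self.data.shape:
            raise ValueError("mask shape must match data shape")
        self._mask = handle


a = NAArray([1.0, 2.0, 3.0, 4.0], [True, True, False, True])
b = NAArray([5.0, 6.0, 7.0, 8.0], [True, True, True, True])

# 'a.validitymask = b.validitymask' copies a snapshot; no sharing results.
a.validitymask = b.validitymask
b.validitymask = [False, True, True, True]
print(a.validitymask)    # a still has all four entries valid

# Passing the opaque handle across does give shared storage.
a.attach_mask(b.take_mask())
b.validitymask = [True, False, True, True]
print(a.validitymask)    # now reflects b's change
```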

In the alterNEP design, the .visible field is just an ordinary numpy
array with some extra checking applied (to ensure that its shape
matches, etc.), so sharing masks would just be a matter of assigning
the same object to two different arrays.
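For contrast, a toy version of the alterNEP approach. VisibleArray and shown() are made-up names, and the checks are just a guess at the kind of "extra checking" meant. Because .visible is a plain boolean ndarray that's stored by reference, sharing a mask really is just ordinary assignment:

```python
import numpy as np


class VisibleArray:
    """Hypothetical toy in the spirit of the alterNEP: .visible is an
    ordinary boolean ndarray, with a little checking on assignment."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self._visible = np.ones(self.data.shape, dtype=bool)

    @property
    def visible(self):
        return self._visible            # the stored object itself, no copy

    @visible.setter
    def visible(self, value):
        if not isinstance(value, np.ndarray) or value.dtype != bool:
            raise TypeError("visible must be a boolean ndarray")
        if value.shape != self.data.shape:
            raise ValueError("visible must have the same shape as the data")
        self._visible = value           # keep a reference: no copy is made

    def shown(self):
        return self.data[self._visible]


a = VisibleArray([1, 2, 3])
b = VisibleArray([10, 20, 30])

mask = np.array([True, False, True])
a.visible = mask                        # ordinary assignment...
b.visible = a.visible                   # ...so both arrays share one mask

mask[0] = False                         # hiding an entry affects both
print(a.shown(), b.shown())
```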

-- Nathaniel