[Numpy-discussion] Removing masked arrays for 1.7? (Was 1.7 blockers)

Tim Cera tim at cerazone.net
Tue Apr 17 15:57:04 EDT 2012


I have never found mailing lists good places for discussion and consensus.
I think the format does not lend itself to involvement, to carefully
considered positions (or the ability to change them), or to voting, since
all of it can so easily be lost within the quoting, the back and forth,
people walking away, etc.  You also want involvement from people who don't
have x hours to craft a finely worded, politically correct, and detailed
response.  I am not advocating this particular system, but something like
http://meta.programmers.stackexchange.com/ would be a better platform for
building to a decision when there are many choices to be made.

Now about ma, NA, missing...

I am just an engineer working in water resources and I had lots of
difficulty reading the NEP (so sleeeeepy) so I will be the first to admit
that I probably have something wrong.  Just for reference (since I missed
it the first time around) Nathaniel posted this page at
https://github.com/njsmith/numpy/wiki/NA-discussion-status

I think that I could adapt to everything that is discussed in the NEP, but
I do have some comments about things that puzzled me.  I don't need
answers, but if I am puzzled maybe others are also.

First - 'maskna=True'?
Tested on development version of numpy...
    >>> a = np.arange(10, maskna = True)
    >>> a[:2] = np.NA
    >>> a
    array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])

Why do I have to specify 'maskna=True'?  If NA and ndarray are intended
to be combined in some way, then I don't think that I need this.  During
development, I understand, but the NEP shouldn't have it.  Heck, even if
you keep NA arrays and ndarrays separate, when someone tries to set an
ndarray element to np.NA, convert it to an NA array instead of raising a
ValueError.  I say that very casually, as if I knew how to do it.  I do
have a proof, but the margin is too small to include it.  :-)
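
To make the "convert instead of raising" idea concrete, here is a rough
sketch that uses numpy.ma as a stand-in for NA arrays.  The helper name
set_missing is hypothetical, and this is not how the NEP or the development
branch implements anything; it only illustrates the promotion step.

```python
import numpy as np

def set_missing(arr, index):
    """Hypothetical helper: mark arr[index] as missing.

    A plain ndarray is first promoted to a masked array; a masked
    array is passed through unchanged by np.ma.asarray.
    """
    out = np.ma.asarray(arr)      # wraps the data without copying it
    out[index] = np.ma.masked     # only the mask is touched
    return out

a = np.arange(10)
b = set_missing(a, slice(0, 2))
print(b)
```

With numpy.ma the underlying values in 'a' stay untouched, which differs
from the destructive assignment discussed for np.NA.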

I am torn about 'skipna=True'.  I think I understand the desire for
explicit behavior, but I suspect that every operation I would use an NA
array for would require 'skipna=True'.  Actually, I don't use that many
reducing functions, so maybe it is not a big deal.  Regardless of the skipna
setting, a related idea that could be useful for reducing functions is
to set an 'includesna' attribute on the returned scalar value.
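
A minimal sketch of the 'includesna' idea, again with numpy.ma standing in
for NA arrays.  The names Reduced and sum_with_flag are made up for this
example, and the flag is carried in a small tuple next to the value rather
than as an attribute on the scalar itself.

```python
from collections import namedtuple

import numpy as np

# Hypothetical result type: the reduced value plus a flag saying
# whether any missing elements were present in the input.
Reduced = namedtuple("Reduced", ["value", "includesna"])

def sum_with_flag(arr):
    """Sum the non-missing elements and report whether any were missing."""
    arr = np.ma.asarray(arr)
    return Reduced(value=arr.sum(), includesna=bool(np.ma.is_masked(arr)))

with_gaps = np.ma.array(np.arange(10), mask=[True, True] + [False] * 8)
r_gaps = sum_with_flag(with_gaps)        # elements 0 and 1 are skipped
r_full = sum_with_flag(np.arange(10))    # nothing missing
print(r_gaps, r_full)
```

The caller gets the skipped-NA result and can still tell that something was
skipped, whichever default 'skipna' ends up with.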

The view() didn't work as described in the NEP, where an np.NA assigned
through a view isn't propagated back to the original array.  This could be
because the NEP references a 'missingdata' work-in-progress branch and I
don't know what has been merged.  I can force the NEP-described behavior
if I set 'd.flags.ownmaskna=True'.
    >>> d = a.view()
    >>> d
    array([NA, NA, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> d[0] = 4
    >>> a
    array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> d
    array([4, NA, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> d[6] = np.NA
    >>> d
    array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])
    >>> a
    array([4, NA, 2, 3, 4, 5, NA, 7, 8, 9])

In the NEP's 'Accessing a Boolean Mask' section there is a comment about...
actually, I don't understand this section at all, especially the part
about a boolean byte-level mask.  Why would it need to be a byte-level
mask in order to be viewed?  As for the logic of whether mask = True or
False means missing, that can be easily handled by using a better name for
the flag: 'mask = True' means that the value is masked (missing), whereas
'exposed = True' would mean that the value is not masked (not missing).
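
A small sketch of the two naming conventions, using numpy.ma rather than
the NEP's mask (which I have not looked at directly).  It also shows one
plain fact that may bear on the byte-level question: an ordinary NumPy
boolean array spends one byte per element, while a bit-packed copy made
with np.packbits is smaller but is no longer indexed one flag per element.

```python
import numpy as np

a = np.ma.array(np.arange(5), mask=[True, False, False, True, False])

mask = np.ma.getmaskarray(a)   # True means masked (missing)
exposed = ~mask                # True means exposed (not missing)

# A NumPy bool array stores one byte per element.
bytes_per_flag = mask.dtype.itemsize

# Packing 8 flags into each byte saves memory, but the result is a
# uint8 array, not a boolean array with one entry per element.
packed = np.packbits(mask)

print(mask, exposed, bytes_per_flag, packed.nbytes)
```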

The biggest question mark to me is that 'a[0] = np.NA' is destructive and
(using numpy.ma) 'a.mask[0] = True' is not.  Is that a big deal?  I am
trying to think back on all of my 'ma' code and trying to remember if I
masked, then unmasked values, and I don't recall any time that I did that.
Of course my use cases are constrained to what I have done in the past.
Still, making assignment destructive feels like a bad idea if it is only
for the sake of saving the memory for the mask bits.
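
For reference, the non-destructive numpy.ma side of that comparison looks
roughly like this.  The array contents are arbitrary, and the mask is
created explicitly so that individual mask elements can be assigned.

```python
import numpy as np

# Start with an explicit all-False mask so a.mask is a real array.
a = np.ma.array(np.arange(10, 20), mask=np.zeros(10, dtype=bool))

a.mask[0] = True                  # hide element 0
is_hidden = a[0] is np.ma.masked  # the element now reads as masked
hidden = a.data[0]                # the stored value is still there

a.mask[0] = False                 # unmask it again
restored = a[0]

print(is_hidden, hidden, restored)
```

Masking and unmasking only flips the mask entry; the value underneath
survives, which is exactly what a destructive 'a[0] = np.NA' gives up.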

Now, the amazing thing is that understanding so little, doing even less of
the work, I get to vote. Sounds like America!

I would really like to see NA in the wild, and I think that I can adapt my
ma code to it, so +1.  If it has to wait until 1.8, +1.  If it has to wait
until 1.9, +1.

Kindest regards,
Tim