[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Thu Jun 23 20:42:11 EDT 2011

On Thu, Jun 23, 2011 at 7:28 PM, Pierre GM <pgmdevlist at gmail.com> wrote:

> Sorry y'all, I'm just commenting bit by bit:
>
> "One key problem is a lack of orthogonality with other features, for
> instance creating a masked array with physical quantities can't be done
> because both are separate subclasses of ndarray. The only reasonable way to
> deal with this is to move the mask into the core ndarray."
>
> Meh. I did try to make it easy to use masked arrays on top of subclasses.
> There are even some tests in the suite to that effect (test_subclassing).
> I'm not buying the argument.
> About moving the mask into the core ndarray: I had suggested back in the
> day to have a mask flag/property built into ndarrays (which would *really*
> have simplified the game), but this suggestion was dismissed very quickly
> as adding too much overhead. I had to agree. I'm just a tad surprised the
> wind has changed on that matter.

Ok, I'll have to change that section then. :)

I don't remember seeing mention of this ability in the documentation, but I
may not have been reading closely enough for that part.
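
For reference, the subclass behavior Pierre mentions can be checked with a
small sketch like the one below. `Tagged` is a hypothetical do-nothing
subclass standing in for a physical-quantity class; it is not taken from
test_subclassing or from any real units package.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical stand-in for a 'physical quantity' ndarray subclass."""


base = np.arange(5.0).view(Tagged)
wrapped = np.ma.masked_array(base, mask=[False, False, True, False, False])

# The wrapper itself is a MaskedArray, while the data underneath it is
# still exposed as an instance of the original subclass.
wrapper_is_masked = isinstance(wrapped, np.ma.MaskedArray)
data_is_tagged = isinstance(wrapped.data, Tagged)
print(wrapper_is_masked, data_is_tagged)
```

Note that this only shows the subclass surviving underneath the mask. It
says nothing about whether a particular subclass's own methods cooperate
with masked operations.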

> "In the current masked array, calculations are done for the whole array,
> then masks are patched up afterwards. This means that invalid calculations
> sitting in masked elements can raise warnings or exceptions even though they
> shouldn't, so the ufunc error handling mechanism can't be relied on."
>
> Well, there's a reason for that. Initially, I tried to guess what the mask
> of the output should be from the mask of the inputs, the objective being to
> avoid getting NaNs in the C array. That was easy in most cases, but it
> turned out it wasn't always possible (the `power` one caused me a lot of
> issues, if I recall correctly). So, for performance reasons (to avoid a lot
> of expensive tests), I fell back on the old concept of "compute them all,
> they'll be sorted afterwards".
> Of course, that's rather clumsy an approach. But it works not too badly
> when in pure Python. No doubt that a proper C implementation would work
> faster.
> Oh, about using NaNs for invalid data? Well, that can't work with integers.
>
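
To make the difference concrete, here is a rough sketch of the two
strategies discussed above: "compute them all, sort them out afterwards"
versus skipping masked elements during the calculation. The helper names
are made up for illustration, and the second helper uses the ufunc
`where=` argument from later NumPy releases. It is not the mechanism in
the proposal, only something with a similar effect.

```python
import numpy as np

values = np.array([4.0, -1.0, 9.0])
hidden = np.array([False, True, False])  # True = masked, as in numpy.ma


def sqrt_then_patch(values, hidden):
    # Compute every element, then attach the mask afterwards.
    return np.ma.array(np.sqrt(values), mask=hidden)


def sqrt_skipping_hidden(values, hidden):
    # Only evaluate sqrt where the element is exposed; masked slots keep
    # the placeholder 0.0 that `out` was initialised with.
    out = np.zeros_like(values)
    np.sqrt(values, out=out, where=~hidden)
    return np.ma.array(out, mask=hidden)


with np.errstate(invalid="raise"):
    try:
        sqrt_then_patch(values, hidden)
        patch_raised = False
    except FloatingPointError:
        # The masked -1.0 was still fed to sqrt and tripped error handling.
        patch_raised = True
    skipped = sqrt_skipping_hidden(values, hidden)

print(patch_raised, skipped.compressed())
```

With error handling set to raise, the first helper fails on a value
nobody asked about, which is the complaint in the quoted proposal text.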

In my proposal, NaNs stay as unmasked NaN values, instead of turning into
masked values. This is necessary for uniform treatment of all dtypes, but a
subclass could override this behavior with an extra mask modification after
arithmetic operations.
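
A minimal sketch of that distinction, using a plain array plus a separate
boolean mask rather than any real masked-array API. The variable names
are illustrative only, and the mask here follows the numpy.ma convention
(True = masked), which may differ from the convention in the proposal.

```python
import numpy as np

values = np.array([0.0, 2.0, 5.0])
hidden = np.array([False, False, True])  # True = masked, as in numpy.ma

with np.errstate(invalid="ignore"):
    ratio = values / values  # 0.0 / 0.0 produces NaN in the first slot

# Behavior described above: arithmetic leaves the mask untouched, so the
# NaN remains an ordinary, visible value.
mask_unchanged = hidden.copy()

# Opt-in step a subclass might add after arithmetic: also hide the NaNs.
mask_with_nans = hidden | np.isnan(ratio)

print(mask_unchanged, mask_with_nans)
```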

> `mask` property:
> Nothing to add to it. It's basically what we have now (except for the
> opposite convention).
>
> Working with masked values:
> I recall some strong points back in the day for not using None to
> represent missing values...
> Adding a maskedstr argument to array2string? Mmh... I prefer a global flag
> like we have now.
>

I'm not really a fan of all the global state that NumPy keeps. I guess I'm
trying to stamp that out bit by bit where I can...
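
To illustrate the trade-off, the sketch below contrasts the existing
global setting in numpy.ma (`masked_print_option`) with a per-call
argument. `format_masked` is a hypothetical helper written only for this
example. It is not array2string and not part of NumPy.

```python
import numpy as np

arr = np.ma.array([1, 2, 3], mask=[False, True, False])

# Global approach: numpy.ma keeps one process-wide placeholder string.
previous = np.ma.masked_print_option.display()
np.ma.masked_print_option.set_display("N/A")
try:
    global_text = str(arr)
finally:
    # Restore the old placeholder so other code is not affected.
    np.ma.masked_print_option.set_display(previous)


# Per-call approach (hypothetical helper): the placeholder is an argument,
# so nothing outside this call changes.
def format_masked(a, maskedstr="--"):
    mask = np.ma.getmaskarray(a)
    items = [maskedstr if is_hidden else str(value)
             for value, is_hidden in zip(a.data, mask)]
    return "[" + " ".join(items) + "]"


local_text = format_masked(arr, maskedstr="NA")
print(global_text)
print(local_text)
```

The try/finally in the first half is the cost of the global flag: any
code that changes it has to remember to put it back.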

> Design questions:
> Adding `masked` or whatever we call it to a number/array should result in
> masked/a fully masked array, period. That way, we can have an idea that
> something was wrong with the initial dataset.
>

I'm not sure I understand what you mean. In the design, adding a mask
generally means setting "a.mask = True", "a.mask = False", or
"a.mask = <boolean array>".
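
For comparison, both readings can be tried against the existing numpy.ma
module, which is only a stand-in here: it uses True = masked, and the
proposed core mask may use the opposite convention.

```python
import numpy as np

a = np.ma.array([1.0, 2.0, 3.0])

a.mask = True                   # hide every element
all_hidden = np.ma.getmaskarray(a).tolist()

a.mask = False                  # expose every element again
none_hidden = np.ma.getmaskarray(a).tolist()

a.mask = [False, True, False]   # hide only the middle element
one_hidden = np.ma.getmaskarray(a).tolist()

# The other reading: combining a value with the `masked` constant, which
# numpy.ma propagates to the whole result.
scalar_result = np.ma.masked + 1
array_result = a + np.ma.masked

print(all_hidden, none_hidden, one_hidden)
print(scalar_result is np.ma.masked, np.ma.getmaskarray(array_result))
```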

> hardmask: I never used the feature myself. I wonder if anyone did. Still,
> it's a nice idea...
>

Ok, I'll leave that out of the initial design unless someone comes up with
some strong use cases.
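
For anyone unfamiliar with the feature, this is roughly what a hard mask
does in the existing numpy.ma module. It is a sketch of current behavior
only, not of anything in the new design.

```python
import numpy as np

mask = [False, True, False]
soft = np.ma.array([1, 2, 3], mask=mask)                  # default: soft mask
hard = np.ma.array([1, 2, 3], mask=mask, hard_mask=True)  # mask is protected

soft[1] = 20   # assignment exposes the element again
hard[1] = 20   # assignment does not unmask; the element stays hidden

soft_mask = np.ma.getmaskarray(soft).tolist()
hard_mask = np.ma.getmaskarray(hard).tolist()
print(soft_mask, hard_mask)
```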

-Mark
