[Numpy-discussion] Thoughts on masked arrays

Michael Haggerty mhagger at alum.mit.edu
Thu May 10 18:11:37 EDT 2001


"Paul F. Dubois" <paul at pfdubois.com> writes:

> -----Original Message-----
> Michael Haggerty wrote
> 1. I couldn't find a simple way to tell if all of the cells of a
>    masked array are unmasked.
> ======
> So your test could be if count(x) < product(x.shape): error...
> 
> So your test could be
>    if make_mask(m.mask(),flag=1) is not None:
>        error...
> 
> You could also consider if not Numeric.allclose(m.filled(0), m.filled(1))
> or
> m.mask() is not None and not Numeric.alltrue(Numeric.ravel(m.mask())):

Shouldn't that be

    m.mask() is not None and Numeric.sometrue(Numeric.ravel(m.mask()))

?  ("Proof" that these expressions are nonintuitive.)
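
The difference between the two expressions can be checked without MA at all. The following is only a sketch: plain Python lists stand in for the flattened mask (1 meaning "masked"), the builtins all() and any() stand in for Numeric.alltrue and Numeric.sometrue, and both function names are made up for the illustration.

```python
def flags_error_alltrue(mask):
    # The quoted expression: mask is not None and not alltrue(mask)
    return mask is not None and not all(mask)


def flags_error_sometrue(mask):
    # The corrected expression: mask is not None and sometrue(mask)
    return mask is not None and any(mask)


no_cells_masked = [0, 0, 0]
one_cell_masked = [0, 1, 0]
all_cells_masked = [1, 1, 1]

# The alltrue form reports an error for a fully unmasked array and
# misses a fully masked one; the sometrue form gets both right.
for mask in (None, no_cells_masked, one_cell_masked, all_cells_masked):
    print(mask, flags_error_alltrue(mask), flags_error_sometrue(mask))
```

The two forms agree only when the mask is None or is partly set, which is probably why the slip is easy to overlook.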

> Is that enough ways to do it? (TM) (:->

Frankly, it's too many ways to do it, none of them obvious to the
writer or the reader.  This is a simple and useful concept and it
should have one obvious implementation.

> I'm not against is_unmasked but I'm not sure how much it would get
> used and I don't like the name.  I hate query methods with side
> effects (if you use them in an assert you change the program).

In this case the side effect is to change the internal representation
of the object without changing its semantics, so I don't find it too
objectionable.  But omit this optimization if you prefer; the query
method would be just as useful even without the side effect.

Because of the relationship with filled(), maybe this query function
should be called m.isfull().  There should probably also be an
isfull(m) function for the same reason that there is a mask(m)
function.
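
As a rough illustration of the query I have in mind, here is a sketch written against NumPy's numpy.ma module, used as a stand-in for the MA module discussed in this thread. The name isfull is only the proposal above, not an existing function in either package.

```python
import numpy as np


def isfull(m):
    """Return True if no element of m is masked (hypothetical helper)."""
    # getmaskarray always returns a full boolean array, even when the
    # array stores the "no mask" sentinel instead of a real mask.
    return not np.ma.getmaskarray(m).any()


a = np.ma.array([1.0, 2.0, 3.0])                              # no mask at all
b = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])   # one masked cell
c = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, False])  # trivial mask

print(isfull(a), isfull(b), isfull(c))
```

Note that this version has no side effect: it never replaces a trivial mask with the "no mask" representation, so it is safe to use inside an assert.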

> A method that replaces the mask with None if possible might make
> sense.  m.unmask()? m.demask()? m.debride() ?

Of these names, I like m.unmask() the best.  I assume that it would
set m.__mask=None if possible and throw an exception if not.

On the other hand, while it would be desirable to have a function
equivalent (i.e., unmask(m)), this would be awkward because a function
should usually not change its argument.

Therefore, I suggest adding a safe analogue of raw_data() that throws
an exception if the array has a nontrivial mask and otherwise returns
self.__data.  E.g. [untested]:

class MaskedArray:
    [...]
    def data(self):
        """If no values are masked, return self.__data.  Otherwise
           raise an exception.
        """
        d = self.__data
        m = self.__mask
        if m is not None and Numeric.sometrue(Numeric.ravel(m)):
            raise MAError("MaskedArray cannot be converted to array")
        elif d.iscontiguous():
            return d
        else:
            return Numeric.array(d, typecode=d.typecode(), copy=1,
                                 savespace=d.spacesaver())


def data(a):
    if isinstance(a, MaskedArray):
        return a.data()
    elif isinstance(a, Numeric.ArrayType) and a.iscontiguous():
        return a
    else:
        return Numeric.array(a)

A less generic name than data() should be chosen, since you seem to
encourage "from MA import *" and a name that common would easily
clash with names in user code.
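
For comparison, the same idea can be sketched against NumPy's numpy.ma module, used as a stand-in for MA. The names unmasked_data and MaskedValueError are hypothetical, chosen only for this illustration; MaskedValueError plays the role of MAError.

```python
import numpy as np


class MaskedValueError(ValueError):
    """Hypothetical stand-in for MAError."""


def unmasked_data(a):
    """Return the data of a as a plain contiguous array.

    Raises MaskedValueError if any element of a is masked.
    """
    if np.ma.getmaskarray(a).any():
        raise MaskedValueError("masked array cannot be converted to array")
    # getdata strips the mask; ascontiguousarray copies only if needed.
    return np.ascontiguousarray(np.ma.getdata(a))


full = np.ma.array([1, 2, 3])
holey = np.ma.array([1, 2, 3], mask=[False, True, False])

print(unmasked_data(full))
try:
    unmasked_data(holey)
except MaskedValueError as exc:
    print("refused:", exc)
```

Unlike raw_data(), this accessor cannot silently hand back values that sit under masked cells.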

> 3. I found the semantics of MA.compress(condition,a,axis=0) to be
>    inconvenient and inconsistent with those of Numeric.compress.
> ======
> It has been an interesting project in that there are hundreds of these
> individual little design questions.
> Can you propose the semantics you would like in a precise way? Include the
> case where the condition has masked values.
> ======

In the simple case where the condition has no masked values, I think
compress() should simply pick slices out according to condition,
without regard to which cells of x are masked.  When condition is
masked, I don't think that there is a sensible interpretation for
compress() because a "masked" value in condition means you don't know
whether that slice of x should be included or not.  Since you can't
have an output array of indeterminate shape, I would throw an
exception if condition is masked.  Here is my attempt [untested]:

def compress(condition, x, dimension=-1):
    # data function is defined above (throws exception if condition is masked):
    c = data(condition)
    xmask = mask(x)
    if xmask is not None:
        # Select the same slices from the mask as from the data; a distinct
        # local name avoids shadowing the mask() function.
        xmask = Numeric.compress(c, xmask, dimension)
    return array(Numeric.compress(c, filled(x), dimension), mask=xmask)
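
The proposed semantics can also be sketched with NumPy's numpy.ma module standing in for MA. This is an illustration under that assumption, not the MA implementation; the name compress_sketch and the choice of ValueError are mine.

```python
import numpy as np


def compress_sketch(condition, x, axis=-1):
    """Select slices of x along axis where condition is true.

    Masked cells of x stay masked in the result; a masked value in
    condition is an error, since the output shape would be undefined.
    """
    if np.ma.getmaskarray(condition).any():
        raise ValueError("condition must not contain masked values")
    c = np.ma.getdata(condition).astype(bool)
    # Apply the same selection to the data and to the mask.
    data = np.compress(c, np.ma.getdata(x), axis=axis)
    mask = np.compress(c, np.ma.getmaskarray(x), axis=axis)
    return np.ma.array(data, mask=mask)


x = np.ma.array([10, 20, 30, 40], mask=[False, True, False, False])
kept = compress_sketch([True, True, False, True], x)
print(kept)  # the masked cell of x is selected and remains masked
```

Here the second element is kept even though it is masked in x, because the selection depends only on condition.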


Yours,
Michael

-- 
Michael Haggerty
mhagger at alum.mit.edu



