[Numpy-discussion] Thoughts on masked arrays

Wed May 9 19:20:37 EDT 2001

Hi,

I've spent several days using the masked arrays that have been added
to NumPy recently.  They're a great feature and they were just what I
needed for the little project I was working on (aside from a few bugs
that I found).

However, there were a few things about MA that I found inconvenient
and/or counterintuitive, so I thought I'd post them to the list while
they're fresh in my mind.  I'm using Numeric-20.0.0b2.

1. I couldn't find a simple way to tell if all of the cells of a
   masked array are unmasked.  There are times when you fill an array
   incrementally and you want to convert it to a Numeric array but
   first make sure that all of the elements have been set.
   "m.filled()" is a bit dangerous (in my opinion) because it silently
   fills.  The shortest idiom I could think of is

    >>> assert not logical_or.reduce(ravel(MA.getmaskarray(m)))

   which isn't very short :-) and is also awkward because it creates a
   mask array even if m.mask() is None.  How about an m.is_unmasked()
   method, or even giving a special meaning to "m.filled(masked)",
   namely that it raises an exception if any cells are still masked?
   (As an optimization, this method could set m.__mask = None to speed
   up future checks.)

2. I can't reproduce this problem now, but I could swear that the
   MaskedArray.__str__() method sometimes printed "typecode='O'" when
   masked.enabled() was true.  This would be a byproduct of using
   Numeric's __str__() method to print the array, at least under the
   unknown circumstances in which Numeric.__str__() prints the
   typecode.  This confused me for a while.

3. I found the semantics of MA.compress(condition,a,axis=0) to be
   inconvenient and inconsistent with those of Numeric.compress.
   MA.compress() squeezes out not only those elements for which
   condition is false, but also those elements that are masked.  This
   differs from the behavior of Numeric.compress, which always returns
   an array with the "axis" dimension equal to the number of nonzero
   elements of "condition".  The real problem, though, is that
   MA.compress can't be used on a multidimensional array with a
   nontrivial mask, because squeezing out the masked values is highly
   unlikely to result in a rectangular matrix.  It is nice to be able
   to squeeze masked values out of a 1-d array, but not if the price
   is losing compress for multidimensional arrays.  I
   suggest giving MA.compress() semantics closer to those of
   Numeric.compress(), and adding an optional argument or a separate
   method to cause masked elements to be omitted.
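
To make point 1 concrete, here is a rough sketch in plain Python of the
check I have in mind.  It is not MA code: the array is modelled as nested
lists of data plus a parallel mask (or None when nothing is masked), and
the names any_masked and filled_strict are made up for illustration.

```python
def any_masked(mask):
    """Return True if any flag in a (possibly nested) mask is set.

    A mask of None means "nothing is masked", so no mask array is built.
    """
    if mask is None:
        return False
    if isinstance(mask, (list, tuple)):
        return any(any_masked(item) for item in mask)
    return bool(mask)


def filled_strict(data, mask):
    """Hypothetical strict conversion: hand back the data, or raise if
    any cell is still masked, instead of silently filling it."""
    if any_masked(mask):
        raise ValueError("array still contains masked cells")
    return data


# A fully set array converts cleanly ...
print(filled_strict([[1, 2], [3, 4]], None))

# ... while a partly filled one is refused rather than silently filled.
try:
    filled_strict([[1, 2], [3, 0]], [[0, 0], [0, 1]])
except ValueError as exc:
    print("refused:", exc)
```

The point of the sketch is only the behaviour: the None case
short-circuits, and a leftover masked cell is an error, not a fill value.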

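For point 3, the split I am suggesting could look roughly like the
following.  Again this is a plain-Python sketch under my own assumptions,
not the MA implementation: arrays are nested lists with a parallel mask,
only axis 0 of a 2-d array is handled, and compress_rows and
drop_masked_1d are hypothetical names.

```python
def compress_rows(condition, data, mask):
    """Select along axis 0 the way Numeric.compress does.

    Rows are kept wherever condition is nonzero; masked cells are not
    squeezed out, their mask flags simply travel with the kept rows.
    """
    keep = [i for i, flag in enumerate(condition) if flag]
    new_data = [data[i] for i in keep]
    new_mask = None if mask is None else [mask[i] for i in keep]
    return new_data, new_mask


def drop_masked_1d(data, mask):
    """Separate, explicit step: squeeze masked cells out of a 1-d array."""
    if mask is None:
        return list(data)
    return [value for value, flag in zip(data, mask) if not flag]


data = [[1, 2], [3, 4], [5, 6]]
mask = [[0, 1], [0, 0], [1, 0]]

# The result stays rectangular: two rows for two nonzero condition entries.
print(compress_rows([1, 0, 1], data, mask))
# -> ([[1, 2], [5, 6]], [[0, 1], [1, 0]])

# Dropping masked cells is only well defined in the 1-d case.
print(drop_masked_1d([1, 2, 3], [0, 1, 0]))
# -> [1, 3]
```

With this split the length along "axis" always equals the number of
nonzero elements of "condition", and omitting masked values becomes
something the caller asks for explicitly.
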
Thanks for a great package!

Yours,
Michael   

--
Michael Haggerty
mhagger at alum.mit.edu