[Numpy-discussion] A crazy masked-array thought

Fri Apr 27 11:54:44 EDT 2012

On Fri, Apr 27, 2012 at 9:16 AM, <josef.pktd at gmail.com> wrote:

> On Fri, Apr 27, 2012 at 10:33 AM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >>
> >>
> >>
> >> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley
> >> <rhattersley at gmail.com> wrote:
> >>>
> >>> The masked array discussions have brought up all sorts of interesting
> >>> topics - too many to usefully list here - but there's one aspect I
> >>> haven't spotted yet. Perhaps that's because it's flat out wrong, or
> >>> crazy, or just too awkward to be helpful. But ...
> >>>
> >>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array
> >>> (POA)?
> >>>
> >>> In the library I'm working on, the introduction of MAs (via numpy.ma)
> >>> required us to sweep through the library and make a fair few changes.
> >>> That's not the sort of thing one would normally expect from the
> >>> introduction of a subclass.
> >>>
> >>> Putting aside the ABI issue, would it help downstream API compatibility
> >>> if the POA was a subclass of the MA? Code that's expecting/casting-to
> >>> a POA might continue to work and, where appropriate, could be upgraded
> >>> in their own time to accept MAs.
> >>>
> >>
> >> That's a version of the idea that all arrays have masks, just some of
> >> them have "missing" masks. That construction was mentioned in the thread
> >> but I can see how one might have missed it. I think it is the right way
> >> to do things. However, current libraries and such will still need to do
> >> some work in order to not do the wrong thing when a "real" mask was
> >> present. For instance, check and raise an error if they can't deal with
> >> it.
> >
> >
> > To expand a bit more, this is precisely why the current work on making
> > masks part of ndarray rather than a subclass was undertaken. There is a
> > flag that says whether or not the array is masked, but you will still
> > need to check that flag to see if you are working with an unmasked
> > instance of ndarray. At the moment the masked version isn't quite
> > completely fused with ndarrays-classic since the maskedness needs to be
> > specified in the constructors and such, but what you suggest is actually
> > what we are working towards.
> >
> > No matter what is done, current functions and libraries that want to use
> > masks are going to have to deal with the existence of both masked and
> > unmasked arrays since the existence of a mask can't be ignored without
> > risking wrong results.
>
> (In case it's not the wrong thread)
>
> If every ndarray has this maskflag, then it is easy to adjust other
> library code.
>
>
That is the case.

In [1]: ones(1).flags
Out[1]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  MASKNA : False
  OWNMASKNA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

What I'd like to add is that the mask is only allocated when NA (or
equivalent) is assigned. That way the flag also signals the actual presence
of a masked value.
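
To make the idea concrete, here is a toy sketch of allocate-on-first-NA. It
is not the NA-mask code in the development branch; LazyMaskArray, its NA
sentinel and the has_mask property are all made-up names, and the wrapper
only assumes plain numpy.

```python
import numpy as np


class LazyMaskArray:
    """Hypothetical wrapper: the mask is allocated only on the first NA assignment."""

    NA = object()  # stand-in sentinel for illustration, not a NumPy object

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float).copy()
        self.mask = None  # no mask storage until it is actually needed

    @property
    def has_mask(self):
        # Plays the role of the flag: True only once an NA has been assigned.
        return self.mask is not None

    def __setitem__(self, index, value):
        if value is LazyMaskArray.NA:
            if self.mask is None:
                # First NA ever assigned: allocate the mask now.
                self.mask = np.zeros(self.data.shape, dtype=bool)
            self.mask[index] = True
        else:
            self.data[index] = value
            if self.mask is not None:
                # Writing a real value unmasks that element.
                self.mask[index] = False


a = LazyMaskArray([1.0, 2.0, 3.0])
a[0] = 5.0                 # ordinary assignment, still no mask
a[1] = LazyMaskArray.NA    # mask is allocated here
```

In this sketch the flag stays set once a mask exists, even if every NA is
later overwritten, so it really means "an NA has been assigned at some
point" rather than "an NA is present right now".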

> if myarr.maskflag is not None: raise SorryException
>
> What is expensive is having to do np.isnan(myarr) or
> np.isfinite(myarr) everywhere.
> https://github.com/scipy/scipy/pull/48
>
> As a concept I like the idea, masked arrays are the general class with
> generic defaults, "clean" arrays are a subclass where some methods are
> overwritten with faster implementations.
>
>
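For comparison, the same kind of cheap guard can be sketched with the
existing numpy.ma, which compares against the nomask singleton instead of
scanning the data. SorryException is the name from the quoted line;
require_unmasked is a hypothetical helper, not an existing function.

```python
import numpy as np


class SorryException(TypeError):
    """Raised by code that cannot handle masked input."""


def require_unmasked(arr):
    # np.ma.getmask returns np.ma.nomask when no mask array is attached,
    # so this is an identity check, not a pass over the data like np.isnan.
    if np.ma.getmask(arr) is not np.ma.nomask:
        raise SorryException("masked input is not supported")
    return np.asarray(arr)


plain = require_unmasked(np.ones(3))               # plain ndarray passes
no_mask = require_unmasked(np.ma.array([1, 2, 3]))  # masked array without a mask passes
```

Note that this rejects any array that carries a mask array, even one that
is all False, so it checks for the existence of a mask rather than for
masked values.
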
Chuck