[Numpy-discussion] A crazy masked-array thought

Fri Apr 27 11:28:13 EDT 2012

Hi all,

Thanks for all your responses and for your patience with a newcomer. Don't
worry - I'm not going to give up yet. It's all just part of my learning the
ropes.

On 27 April 2012 14:05, Benjamin Root <ben.root at ou.edu> wrote:

> <snip>Your idea is interesting, but doesn't it require C++?  Or maybe you
> are thinking of creating a new C type object that would contain all the new
> features and hold a pointer and function interface to the original POA.
> Essentially, the new type would act as a wrapper around the original
> ndarray?</snip>
>
When talking about subclasses I'm just talking about the end-user
experience within Python. In other words, I'm starting from issubclass(POA,
MA) == True, and trying to figure out what the Python API implications
would be.
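
To make that starting point concrete, here is a minimal sketch in pure
Python. ToyMA, ToyPOA and summarise are hypothetical names invented for
illustration and are not part of NumPy; in numpy.ma today the relationship
runs the other way (MaskedArray subclasses ndarray). The mask convention
(True means "missing") follows numpy.ma.

```python
class ToyMA:
    """Hypothetical masked array: values plus an optional per-element mask."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        # mask[i] is True where element i is missing; None means nothing is masked.
        self.mask = None if mask is None else list(mask)

    def mean(self):
        if self.mask is None:
            valid = self.data
        else:
            valid = [x for x, m in zip(self.data, self.mask) if not m]
        return sum(valid) / len(valid) if valid else float("nan")


class ToyPOA(ToyMA):
    """Hypothetical plain array: an MA that never carries a mask."""

    def __init__(self, data):
        super().__init__(data, mask=None)


def summarise(arr):
    # Written against the superclass, so it accepts both kinds of array.
    if not isinstance(arr, ToyMA):
        raise TypeError("expected a ToyMA or a subclass of it")
    return arr.mean()


plain_result = summarise(ToyPOA([1.0, 2.0, 3.0]))
masked_result = summarise(ToyMA([1.0, 2.0, 300.0], mask=[False, False, True]))
```

Code that insists on a ToyPOA would reject the masked instance, while code
written against ToyMA handles both, which is the end-user behaviour I have
in mind.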

On 27 April 2012 14:55, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Apr 27, 2012 at 11:32 AM, Richard Hattersley
> <rhattersley at gmail.com> wrote:
> > I know I used a somewhat jokey tone in my original posting, but
> > fundamentally it was a serious question concerning a live topic. So I'm
> > curious about the lack of response. Has this all been covered before?
> >
> > Sorry if I'm being too impatient!
>
> That's fine, I know I did read it, but I wasn't sure what to make of
> it to respond :-)
>
> > On 25 April 2012 16:58, Richard Hattersley <rhattersley at gmail.com> wrote:
> >>
> >> The masked array discussions have brought up all sorts of interesting
> >> topics - too many to usefully list here - but there's one aspect I
> >> haven't spotted yet. Perhaps that's because it's flat out wrong, or
> >> crazy, or just too awkward to be helpful. But ...
> >>
> >> Shouldn't masked arrays (MA) be a superclass of the plain-old-array
> >> (POA)?
> >>
> >> In the library I'm working on, the introduction of MAs (via numpy.ma)
> >> required us to sweep through the library and make a fair few changes.
> >> That's not the sort of thing one would normally expect from the
> >> introduction of a subclass.
> >>
> >> Putting aside the ABI issue, would it help downstream API compatibility
> >> if the POA was a subclass of the MA? Code that's expecting/casting-to a
> >> POA might continue to work and, where appropriate, could be upgraded in
> >> their own time to accept MAs.
>
> This makes a certain amount of sense from a traditional OO modeling
> perspective, where classes are supposed to refer to sets of objects
> and subclasses are subsets and superclasses are supersets. This is the
> property that's needed to guarantee that if A is a subclass of B, then
> any code that expects a B can also handle an A, since all A's are B's,
> which is what you need if you're doing type-checking or type-based
> dispatch. And indeed, from this perspective, MAs are a superclass of
> POAs, because for every POA there's an equivalent MA (the one with the
> mask set to all-true), but not vice-versa.
>
> But, that model of OO doesn't have much connection to Python. In
> Python's semantics, classes are almost irrelevant; they're mostly just
> some convenience tools for putting together the objects you want, and
> what really matters is the behavior of each object (the famous "duck
> typing"). You can call isinstance() if you want, but it's just an
> ordinary function that looks at some attributes on an object; the only
> magic involved is that some of those attributes have underscores in
> their name. In Python, subclassing mostly does two things: (1) it's a
> quick way to set up a class that's similar to another class
> (though this is a worse idea than it looks -- you're basically doing
> 'from other_class import *' with all the usual tight-coupling problems
> that 'import *' brings). (2) When writing Python objects at the C
> level, subclassing lets you achieve memory layout compatibility (which
> is important because C does *not* do duck typing), and it lets you add
> new fields to a C struct.
>
> So at this level, MAs are a subclass of POAs, because MAs have an
> extra field that POAs don't...
>
> So I don't know what to think about subclasses/superclasses here,
> because they're such confusing and contradictory concepts that it's
> hard to tell what the actual resulting API semantics would be.
>

It doesn't seem essential that MAs have an extra field that POAs don't. If
POA was a subclass of MA, instances of POA could have the extra field set
to an "all-valid"/"nothing-is-masked" value. Granted, you'd want that to be
a special value so you're not lugging around a load of redundant data (and
you can optimise your processing for that), but I'm guessing you'd probably
want that kind of capability within MA anyway.
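
For what it's worth, numpy.ma already has a special value of roughly this
kind: an array built without a mask reports the shared numpy.ma.nomask
sentinel rather than a full boolean array. The snippet below is only a
sketch of that existing behaviour, not of the proposed hierarchy; the
variable names are mine.

```python
import numpy as np

unmasked = np.ma.array([1.0, 2.0, 3.0])
partly_masked = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# No mask was given, so the shared sentinel is reported instead of an array.
uses_sentinel = np.ma.getmask(unmasked) is np.ma.nomask

# A full boolean mask is only materialised when explicitly requested.
expanded = np.ma.getmaskarray(unmasked)

# is_masked() reports whether any element is actually masked.
has_real_mask = np.ma.is_masked(partly_masked)
```

The sentinel is what lets processing skip the mask entirely in the common
all-valid case.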

On 27 April 2012 15:33, Charles R Harris <charlesr.harris at gmail.com> wrote:

>
>
> On Fri, Apr 27, 2012 at 8:15 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Wed, Apr 25, 2012 at 9:58 AM, Richard Hattersley <
>> rhattersley at gmail.com> wrote:
>>
>>> The masked array discussions have brought up all sorts of interesting
>>> topics - too many to usefully list here - but there's one aspect I haven't
>>> spotted yet. Perhaps that's because it's flat out wrong, or crazy, or just
>>> too awkward to be helpful. But ...
>>>
>>> Shouldn't masked arrays (MA) be a superclass of the plain-old-array
>>> (POA)?
>>>
>>> In the library I'm working on, the introduction of MAs (via numpy.ma)
>>> required us to sweep through the library and make a fair few changes.
>>> That's not the sort of thing one would normally expect from the
>>> introduction of a subclass.
>>>
>>> Putting aside the ABI issue, would it help downstream API compatibility
>>> if the POA was a subclass of the MA? Code that's expecting/casting-to a POA
>>> might continue to work and, where appropriate, could be upgraded in their
>>> own time to accept MAs.
>>>
>>>
>> That's a version of the idea that all arrays have masks, just some of
>> them have "missing" masks. That construction was mentioned in the thread
>> but I can see how one might have missed it. I think it is the right way to
>> do things. However, current libraries and such will still need to do some
>> work in order to not do the wrong thing when a "real" mask is present. For
>> instance, check and raise an error if they can't deal with it.
>>
>
> To expand a bit more, this is precisely why the current work on making
> masks part of ndarray rather than a subclass was undertaken. There is a
> flag that says whether or not the array is masked, but you will still need
> to check that flag to see if you are working with an unmasked instance of
> ndarray. At the moment the masked version isn't quite completely fused with
> ndarrays-classic since the maskedness needs to be specified in the
> constructors and such, but what you suggest is actually what we are working
> towards.
>
> No matter what is done, current functions and libraries that want to use
> masks are going to have to deal with the existence of both masked and
> unmasked arrays since the existence of a mask can't be ignored without
> risking wrong results.
>
> Chuck
>

Having the class hierarchy would allow isinstance() to help. There are
some substantial API implications, but if numpy.mean(...) and friends
refused to work with MAs then that might also help. (Clearly myarray.mean()
would still work if myarray were actually an MA, but then it would also give
the correct answer.)
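
Here is a small sketch of the problem with numpy.ma as it stands. Because
MaskedArray subclasses ndarray, isinstance(x, numpy.ndarray) cannot tell the
two apart, and code that casts with numpy.asarray() silently drops the mask.
require_unmasked is a hypothetical helper of my own, not a NumPy function; it
shows the "check and raise" approach Chuck describes.

```python
import numpy as np

ma = np.ma.array([1.0, 2.0, 100.0], mask=[False, False, True])

# The subclass passes a check written for plain arrays.
passes_plain_check = isinstance(ma, np.ndarray)

# The method respects the mask: mean of the two valid elements.
masked_mean = float(ma.mean())

# Casting to a base-class ndarray discards the mask, so the masked
# value 100.0 is included in the result.
plain = np.asarray(ma)
plain_mean = float(plain.mean())


def require_unmasked(a):
    """Hypothetical guard: refuse input that has any masked element."""
    if np.ma.is_masked(a):
        raise TypeError("masked input is not supported here")
    return np.asarray(a)


try:
    require_unmasked(ma)
    rejected = False
except TypeError:
    rejected = True
```

With the hierarchy inverted, a library function that asks for a POA would
reject the masked instance by type alone, without each library needing an
explicit guard like this one.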

What other kinds of checks (implicit or explicit) are already out there?

I'm *very* aware that there are other aspects of the API where the desired
behaviour is even less clear!

Thanks for indulging.
Richard Hattersley