[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Sat Jun 25 15:56:44 EDT 2011

On Sat, Jun 25, 2011 at 6:00 AM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi,
>
> On Sat, Jun 25, 2011 at 1:54 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> > On Fri, Jun 24, 2011 at 5:21 PM, Matthew Brett <matthew.brett at gmail.com>
> ...
> >> @Mark - I don't have a clear idea whether you consider the nafloat64
> >> option to be still in play as the first thing to be implemented
> >> (before array.mask).   If it is, what kind of thing would persuade you
> >> either way?
> >
> > I'm focusing all of my effort on getting my proposal of adding a mask to
> > the core ndarray into a state where it satisfies everyone's requirements
> > as best I can.
>
> Maybe it would be worth setting out the requirements formally somewhere?
>

The design that's forming is a combination of:

* Solve the missing data problem
* My ideas of what a good solution looks like:
   * applies to all NumPy dtypes in a fully general way
   * high-performance, low overhead where possible
   * makes the C-level implementation of NumPy nicer to work with, not
     harder
   * easy to use from Python for unskilled programmers
   * easy to use more powerful functionality from Python for skilled
     programmers
   * satisfies all or most of the needs of the many users of arrays with a
     "missing data" aspect to them
* All the feedback I'm getting from discussions on the list

That's not a formal requirements specification, but it might shed some light.
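To make the "applies to all NumPy dtypes" point concrete: a boolean mask stored alongside the data can mark any element of any dtype as missing, including integers and strings, which have no spare value like NaN. The sketch below uses the existing `numpy.ma` module purely as a stand-in; it is not the core-ndarray mask API the NEP proposes, whose details are still under discussion.

```python
import numpy as np
import numpy.ma as ma

# numpy.ma is used here only as an existing stand-in for a mask on the core
# ndarray; it is NOT the API proposed in the NEP.

# A separate boolean mask marks missing elements; the stored values are untouched.
ints = ma.array([1, 2, 3, 4], mask=[False, True, False, False], dtype=np.int64)
strs = ma.array(["a", "b", "c"], mask=[True, False, False])

# Reductions skip masked elements, whatever the dtype.
total = ints.sum()    # 1 + 3 + 4
count = ints.count()  # number of unmasked elements

# Because the mask is independent of the data, the underlying value survives.
raw = ints.data

print(total, count, strs.count())  # 8 3 2
print(raw.tolist())                # [1, 2, 3, 4]
```

Keeping "missing" out of the value itself is what lets one mechanism cover every dtype; the trade-off is the extra memory and bookkeeping for the mask.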

> > I'm not precluding the possibility that someone could convince me
> > that the na-dtype is good, but I gave it a good chunk of thought before
> > starting to write the proposal. To persuade me towards the na-dtype
> > option, I need to be convinced that I'm solving the problem class in a
> > generic way that works orthogonally with other features, with manageable
> > implementation requirements, a very usable result for both strong and
> > weak programmers, and with good performance characteristics. I think the
> > na-dtype approach isn't as generic as I would like, and the implementation
> > seems like it would be trickier than the masked approach.
>
> What I'm getting at, is that I think you have made the decision
> between these two implementations some time ago while looking at the C
> code.  Now of course you would be a much better person to make that
> decision than - say - me.  It's just that, if you want coherent
> feedback from us on this decision, we need to get some technical grasp
> of why you made it.    I realize that it will not be easy to explain
> in detail, but honestly, it could be a useful discussion to have from
> your and our point of view, even if it ends up in the same place.
>

I've updated the section "Parameterized Data Type With NA Signal Values" in
the NEP with an idea for how an NA bit pattern approach could coexist and
work together with the mask-based approach. I think I've solved some of the
generality and implementation obstacles; it would be great to get some
feedback on that.
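One way the two approaches might fit together, sketched under assumptions: treat an element as missing if it is either masked or holds an NA signal value, so that reductions consult a single combined mask. The helpers `effective_mask` and `na_sum` are hypothetical illustrations, not NumPy functions and not the design in the NEP; NaN stands in as the floating-point NA bit pattern, and the integer sentinel is whatever the caller passes (the int64 minimum below is an arbitrary choice).

```python
import numpy as np

def effective_mask(values, mask=None, sentinel=None):
    """Hypothetical helper: combine an explicit mask with an NA bit pattern.

    An element counts as missing if it is masked OR holds the NA signal.
    For floating-point data NaN is used as the signal and `sentinel` is
    ignored; for other dtypes, `sentinel` (if given) is the signal value.
    """
    values = np.asarray(values)
    if mask is None:
        mask = np.zeros(values.shape, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool)
    if np.issubdtype(values.dtype, np.floating):
        signal = np.isnan(values)
    elif sentinel is not None:
        signal = values == sentinel
    else:
        signal = np.zeros(values.shape, dtype=bool)
    return mask | signal

def na_sum(values, mask=None, sentinel=None):
    """Hypothetical helper: sum elements that are neither masked nor NA-signalled."""
    values = np.asarray(values)
    missing = effective_mask(values, mask, sentinel)
    return values[~missing].sum()

i64_min = np.iinfo(np.int64).min
print(na_sum(np.array([1.0, np.nan, 3.0, 4.0]), mask=[False, False, False, True]))  # 4.0
print(na_sum(np.array([10, i64_min, 5], dtype=np.int64), sentinel=i64_min))        # 15
```

Folding the signal value into the mask keeps the reduction code dtype-agnostic; the cost in this sketch is an extra comparison pass over the data for each operation.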

Cheers,
Mark

>
> See you,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>