[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe at gmail.com
Thu Jun 30 10:50:53 EDT 2011


On Wed, Jun 29, 2011 at 1:51 PM, Lluís <xscript at gmx.net> wrote:

> Mark Wiebe writes:
> [...]
> >     I think that deciding on the value of NA signal values boils down to
> >     this question: should 3rd party code be able to interpret missing
> >     data information stored in the separate mask array?
>
> > I'm tossing around some variations of ideas using the iterator to
> > provide a buffered mask-based interface that works uniformly with both
> > masked arrays and NA dtypes. This way 3rd party C code only needs to
> > implement one missing data mechanism to fully support both of NumPy's
> > missing data mechanisms.
>
> Nice. If non-numpy C code is bound to see it as an array (i.e., _always_
> oblivious to the mask concept), then you should probably do what I said
> about "(un)merging" the bit pattern and mask-based NAs, but in this case
> it can be done on each block given by the iteration window.
>

My hands are a little bit tied because of ABI compatibility, but I'm
thinking of ways I can cause 3rd party C code to fail if it doesn't ask for
the data with the mask when it's masked.
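To make the per-block "merging" idea quoted above concrete, here is a minimal
Python sketch. It is only an illustration, not the proposed NumPy API: the
helper name iter_merged, the block_size parameter, the convention that True in
the boolean array means "missing", and the use of NaN as the bit-pattern NA are
all assumptions made for this example.

```python
import numpy as np

def iter_merged(data, missing, block_size=4):
    """Yield blocks of `data` with missing elements replaced by NaN.

    Hypothetical helper: `missing` is a boolean array of the same length
    as `data`, where True marks a missing element (an assumed convention).
    """
    for start in range(0, data.size, block_size):
        stop = start + block_size
        # Copy the window so the original data buffer is never modified.
        block = data[start:stop].copy()
        # Merge the mask into the values as a NaN bit pattern.
        block[missing[start:stop]] = np.nan
        yield block

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
missing = np.array([False, True, False, False, True])

# A mask-oblivious consumer would only ever see the merged blocks.
merged = np.concatenate(list(iter_merged(data, missing, block_size=2)))
```

Only one window of merged data exists at a time, so the whole array does not
have to be duplicated to hand a bit-pattern view to 3rd party code.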

> There's still the possibility of giving a finer granularity interface
> where both are explicitly accessed, but this will probably add yet
> another set of API functions (although the merging interface can be
> implemented on top of this explicit raw iteration interface).
>

Things should be as simple as possible, but having layers of lower level
stuff and higher level stuff is good. This is why, for instance, I
introduced the where= parameter to ufuncs, because it's another useful way
of using the same low-level mechanisms.
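For reference, a small sketch of what the where= parameter does, assuming a
NumPy version that supports it. The array names and values are made up for
illustration; the boolean array is used here purely as a selection mask, not
as the NA mask under discussion.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
valid = np.array([True, False, True, False])

# Pre-fill the output: positions where `valid` is False are left untouched
# by the ufunc, so they keep whatever `out` already contained.
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)
```

The addition is computed only where valid is True, and the remaining elements
of out keep their initial zeros.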

> BTW, this has some overlap with a mail Travis sent long ago about
> dynamically filling the backing buffer contents (in this case with the
> "merged" NA data for 3rd parties).
>
> It might prove completely unsatisfactory (w.r.t. performance), but you
> could also fake a bit-pattern-only sequential array by using mprotect to
> detect the memory accesses and then trigger the production of the merged
> data. This provides a means for code using the simple buffer protocol,
> without duplicating the whole structure for NA merges.
>
> This can be complicated even more with some simple strided pattern
> detection to diminish the number of segfaults, as the shape is known.
>

Someone else will have to do stuff like this... ;)

-Mark


>
>
> Lluis
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>