[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 14:06:44 EDT 2011

On Fri, Jun 24, 2011 at 11:13 AM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:

> Nathaniel Smith wrote:
> >> The 'dtype factory' idea builds on the way I've structured datetime as a
> >> parameterized type,
>
> ...
>
> Another disadvantage is that we get further from Gael Varoquaux's point:
> >> Right now, the numpy array can be seen as an extension of the C
> >> array, basically a pointer, a data type, and a shape (and strides).
> >> This enables easy sharing with libraries that have not been
> >> written with numpy in mind.
>
> and also PEP 3118 support
>
> It is very useful that a numpy array has a pointer to a regular old C
> array -- if we introduce this special dtype, that will break (well, not
> really, but the C array would be of this particular struct).
> Granted, any other C code would probably have to do something with the
> mask anyway, but I still think it'd be better to keep that raw data
> array standard.
>

It's not actually a pointer to a C array; there is already a lot of checking,
and possibly a copy or buffer, required before you can treat it as such. The
data may be misaligned, have noncontiguous strides, have a non-C
multidimensional memory layout, or have a different byte order. Dealing with
all these special cases in a uniform way is one of the things the 1.6 nditer
provides a lot of help with.
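
As an illustration of those special cases (a minimal sketch, not part of the
original message), the following checks an array's layout and byte order and
normalizes it before it could be handed to C code. The helper name
as_c_buffer and the choice of int32 are assumptions made for the example;
np.ascontiguousarray only copies when the input is not already C-contiguous
in the requested dtype.

```python
import numpy as np


def as_c_buffer(arr):
    """Return (c_ready, copied) for a native-order, C-contiguous int32 array.

    Hypothetical helper for illustration; `copied` reports whether
    normalizing the input required allocating new memory.
    """
    c_ready = np.ascontiguousarray(arr, dtype=np.int32)
    copied = not np.shares_memory(c_ready, arr)
    return c_ready, copied


base = np.arange(12, dtype=np.int32).reshape(3, 4)
cases = {
    "plain": base,                                       # already usable as a C array
    "strided": base[:, ::2],                             # noncontiguous strides
    "fortran": np.asfortranarray(base),                  # non-C memory layout
    "swapped": base.astype(base.dtype.newbyteorder()),   # non-native byte order
}

for name, arr in cases.items():
    c_ready, copied = as_c_buffer(arr)
    print(name, arr.flags.c_contiguous, arr.dtype.isnative, copied)
```

Only the "plain" case can be passed through without a copy; the other three
need a conversion first.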

>
> This applies to switching between masked and not-masked numpy arrays
> also -- I don't think I'd want the performance hit of that requiring a
> data copy.
>

When performance is important, it is still possible to avoid that copy by
adding the mask to a view of the original array. The mask= parameter to
ufuncs, which is independent of arrays with masks, also provides a way to
do masked operations without ever touching masked arrays.
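
A rough sketch of both routes, using APIs that exist in released NumPy rather
than the proposed design: the ufunc keyword where= is used here as a stand-in
for the mask= parameter discussed above, and numpy.ma stands in for a masked
view. The array names are made up for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
valid = np.array([True, False, True, True])

# Route 1: a masked ufunc call on plain arrays. Only positions where
# `valid` is True are computed; the rest of `out` keeps its initial zeros.
out = np.zeros_like(a)
np.add(a, b, out=out, where=valid)

# Route 2: attach a mask to a view of the original data without copying.
# numpy.ma uses the opposite convention (True means "masked out").
masked_view = np.ma.array(a, mask=~valid, copy=False)

print(out)                                    # [11.  0. 33. 44.]
print(np.shares_memory(masked_view.data, a))  # True
```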

> Also the idea was posted here that you could use views to have the same
> data set with different masks -- that would break as well.
>

I'm not sure how this would break? I think that should work just fine.
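
For comparison, this is how the same idea looks with the existing numpy.ma
module (an assumption-laden sketch, since the core-ndarray mask discussed in
this thread is only a proposal): two masked arrays wrap one data buffer, each
with its own mask.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Two masked arrays over the same buffer, each hiding a different element.
first = np.ma.array(data, mask=[True, False, False, False], copy=False)
second = np.ma.array(data, mask=[False, False, False, True], copy=False)

# A write to the shared data is visible through both.
data[1] = 20.0

print(first.sum())   # 27.0 (20 + 3 + 4, element 0 is masked)
print(second.sum())  # 24.0 (1 + 20 + 3, element 3 is masked)
```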

>
> Nathaniel Smith wrote:
>
> > If we think that the memory overhead for floating point types is too
> > high, it would be easy to add a special case where maybe(float) used a
> > distinguished NaN instead of a separate boolean.
>
> That would be pretty cool, though in the past folks have made a good
> argument that even for floats, masks have significant advantages over
> "just using NaN". One might be that you can mask and unmask a value for
> different operations, without losing the value.
>

Especially with the ability to do the "hardmask" feature, this aspect of it
might end up being useful.
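
To illustrate the difference raised in the quoted text, here is a small
sketch using the existing numpy.ma module (not the proposed core feature):
a NaN marker overwrites the value, a soft mask can be lifted again, and a
hard mask ignores assignments to masked elements.

```python
import numpy as np

# NaN as a missing-value marker destroys the original value.
with_nan = np.array([1.0, 2.0, 3.0])
with_nan[1] = np.nan

# A soft mask hides the value but keeps it in the underlying data.
soft = np.ma.array([1.0, 2.0, 3.0])
soft[1] = np.ma.masked
hidden_total = soft.sum()      # the masked element is skipped
soft.mask = np.ma.nomask       # unmask everything
restored_total = soft.sum()    # the original value is back

# A hard mask is not cleared by assigning to a masked element.
hard = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False], hard_mask=True)
hard[1] = 99.0

print(hidden_total, restored_total)  # 4.0 6.0
print(hard.mask[1])                  # True
```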

-Mark

>
> -Chris
>
>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov