[Numpy-discussion] missing data discussion round 2

Mark Wiebe mwwiebe at gmail.com
Tue Jun 28 19:30:17 EDT 2011


On Tue, Jun 28, 2011 at 2:41 PM, Eric Firing <efiring at hawaii.edu> wrote:

> On 06/28/2011 07:26 AM, Nathaniel Smith wrote:
> > On Tue, Jun 28, 2011 at 9:38 AM, Charles R Harris
> > <charlesr.harris at gmail.com>  wrote:
> >> Nathaniel, an implementation using masks will look *exactly* like an
> >> implementation using na-dtypes from the user's point of view. Except that
> >> taking a masked view of an unmasked array allows ignoring values without
> >> destroying or copying the original data.
> >
> > Charles, I know that :-).
> >
> > But if that view thing is an advertised feature -- in fact, the key
> > selling point for the masking-based implementation, included
> > specifically to make a significant contingent of users happy -- then
> > it's certainly user-visible. And it will make other users unhappy,
> > like I said. That's life.
> >
> > But who cares? My main point is that implementing a missing data
> > solution and a separate masked array solution is probably less work
> > than implementing a single everything-to-everybody solution *anyway*,
> > *and* it might make both sets of users happier too. Notice that in my
> > proposal, there's really nothing there that isn't already in Mark's
> > NEP in some form or another, but in my version there's almost no
> > overlap between the two features. That's not because I was trying to
> > make them artificially different; it's because I tried to think of the
> > most natural ways to satisfy each set of use cases, and they're just
> > different.
>
> I think you are exaggerating some of the differences associated with the
> implementation, and ignoring one *key* difference: for integer types,
> the masked implementation can handle the full numeric range of the type,
> while the bit-pattern approach cannot.
>
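
To make the integer-range point concrete, here is a minimal sketch using the existing numpy.ma module as a stand-in for the mask-based approach. The SENTINEL choice (the minimum of int8) is an assumption made for illustration only; it is not something the NEP or NumPy prescribes.

```python
import numpy as np

# Three int8 readings; -128 is a legitimate value in this data set.
values = np.array([-128, 0, 127], dtype=np.int8)

# Bit-pattern style: reserve one value of the dtype as "missing".
# SENTINEL is a hypothetical choice for this sketch, not a NumPy convention.
SENTINEL = np.iinfo(np.int8).min
bitpattern_missing = values == SENTINEL
# The genuine -128 reading is now indistinguishable from a missing value.

# Mask style: a separate boolean array records which elements are missing,
# so every value of the dtype, including -128, stays usable as data.
masked = np.ma.MaskedArray(values, mask=[False, True, False])

print(bitpattern_missing)          # which elements the sentinel test flags
print(masked.min(), masked.max())  # extremes over the unmasked elements
```

The cost of the mask style is the extra boolean array, which is the single-array simplicity trade-off raised in the next paragraph.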
> Balanced against that, the *key* advantages of the bit-pattern approach
> would seem to be the simplicity of using a single array, particularly
> for IO (including memmapping) and interfacing with extension code.
> Although I am a heavy user of masked arrays, I consider these
> bit-pattern advantages to be substantial and deserving of careful
> consideration--perhaps of more weight and planning than they have gotten
> so far.
>
> Datasets on disk--e.g. climatological data, numerical model output,
> etc.--typically do use reserved values as missing value flags, although
> occasionally one also finds separate mask arrays.
>
> One of the real frustrations of the present masked array is that there
> is no savez/load support.  I could roll my own by using a convention
> like saving the mask of xxx as xxx__mask__, and then reversing the
> process in a modified load; but I haven't gotten around to doing it.
> Regardless of internal implementation, I hope that core support for
> missing values will be included in savez/load.
>
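
The roll-your-own convention described above could be sketched roughly as follows with the current numpy.ma. The helper names savez_masked and load_masked and the MASK_SUFFIX constant are hypothetical, introduced only for this illustration; they are not part of NumPy, and this is not the core support being asked for.

```python
import io
import numpy as np

# Suffix taken from the convention described above; not a NumPy feature.
MASK_SUFFIX = "__mask__"

def savez_masked(file, **arrays):
    """Save arrays with np.savez, storing each mask as <name>__mask__."""
    flat = {}
    for name, arr in arrays.items():
        if isinstance(arr, np.ma.MaskedArray):
            flat[name] = arr.data
            # getmaskarray always returns a full boolean array,
            # even when the array has no masked elements.
            flat[name + MASK_SUFFIX] = np.ma.getmaskarray(arr)
        else:
            flat[name] = np.asarray(arr)
    np.savez(file, **flat)

def load_masked(file):
    """Reverse savez_masked: reattach each saved mask to its data array."""
    out = {}
    with np.load(file) as npz:
        names = set(npz.files)
        for name in names:
            # Skip entries that are the mask of another saved array.
            if name.endswith(MASK_SUFFIX) and name[:-len(MASK_SUFFIX)] in names:
                continue
            data = npz[name]
            mask_name = name + MASK_SUFFIX
            if mask_name in names:
                out[name] = np.ma.MaskedArray(data, mask=npz[mask_name])
            else:
                out[name] = data
    return out

# Round trip through an in-memory buffer.
buf = io.BytesIO()
temps = np.ma.MaskedArray([12.5, -999.0, 14.1], mask=[False, True, False])
savez_masked(buf, temps=temps, depth=np.arange(3))
buf.seek(0)
restored = load_masked(buf)
print(restored["temps"])
```

A name that happens to end in __mask__ on its own would collide with this scheme, which is one reason support in the file format itself would be cleaner.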

This sounds reasonable to me, and it will probably require extending the
file format a bit.

-Mark


>
> Eric
>
>
>
> >
> > -- Nathaniel
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion