[Numpy-discussion] Missing data again

Eric Firing efiring at hawaii.edu
Wed Mar 7 14:57:35 EST 2012


On 03/07/2012 09:26 AM, Nathaniel Smith wrote:
> On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
> <charlesr.harris at gmail.com>  wrote:
>> On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <pierre.haessig at crans.org> wrote:
>>> Coming back to Travis's proposition "bit-pattern approaches to missing
>>> data (*at least* for float64 and int32) need to be implemented.", I
>>> wonder how much extra work it is to go from nafloat64 to
>>> nafloat32/16? Is there hardware support for NaN payloads with these
>>> smaller floats? If not, or if it is too complicated, I feel it is
>>> acceptable to say "it's too complicated" and fall back to mask. One may
>>> have to choose between fancy types and fancy NAs...
>>
>> I'm in agreement here, and that was a major consideration in making a
>> 'masked' implementation first.
>
> When it comes to "missing data", bitpatterns can do everything that
> masks can do, are no more complicated to implement, and have better
> performance characteristics.
>
>> Also, different folks adopt different values
>> for 'missing' data, and distributing one or several masks along with the
>> data is another common practice.
>
> True, but not really relevant to the current debate, because you have
> to handle such issues as part of your general data import workflow
> anyway, and none of these is any more complicated no matter which
> implementations are available.
>
>> One inconvenience I have run into with the current API is that it should be
>> easier to clear the mask from an "ignored" value without taking a new view
>> or assigning known data. So maybe two types of masks (different payloads),
>> or an additional flag could be helpful. The process of assigning masks could
>> also be made a bit easier than using fancy indexing.
>
> So this, uh... this was actually the whole goal of the "alterNEP"
> design for masks -- making all this stuff easy for people (like you,
> apparently?) that want support for ignored values, separately from
> missing data, and want a nice clean API for it. Basically having a
> separate .mask attribute which was an ordinary, assignable array
> broadcastable to the attached array's shape. Nobody seemed interested
> in talking about it much then but maybe there's interest now?

In other words, good low-level support for numpy.ma functionality?  With 
a migration path so that a separate numpy.ma might wither away?  Yes, 
there is interest; this is exactly what I think is needed for my own 
style of applications (which I think are common at least in geoscience), 
and for matplotlib.  The question is how to achieve it as simply and 
cleanly as possible while also satisfying the needs of the R users, and 
while making it easy for matplotlib, for example, to handle *any* 
reasonable input: ma, other masking, nan, or NA-bitpattern.
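
A minimal sketch of what "handling any reasonable input" might look like on
the consumer side, assuming numpy.ma is the common target. The helper name
`as_masked` and its `mask` parameter are hypothetical, chosen only for
illustration; an NA-bitpattern dtype is not covered because no such dtype
is assumed to exist here.

```python
import numpy as np

def as_masked(x, mask=None):
    """Hypothetical helper: fold several 'missing' conventions into numpy.ma."""
    # Raw values as float, whether x is a list, an ndarray, or a masked array.
    data = np.asarray(np.ma.getdata(x), dtype=float)
    # Start from any mask x already carries (all False if it has none),
    # then add NaN entries as missing.
    m = np.ma.getmaskarray(x) | np.isnan(data)
    # Optionally fold in a separately distributed boolean mask,
    # broadcast to the data's shape.
    if mask is not None:
        m = m | np.broadcast_to(np.asarray(mask, dtype=bool), data.shape)
    return np.ma.MaskedArray(data, mask=m)

from_nan = as_masked([1.0, np.nan, 3.0])
from_ma = as_masked(np.ma.masked_array([1.0, 2.0, 3.0], mask=[True, False, False]))
from_side_mask = as_masked([1.0, 2.0, 3.0], mask=[False, False, True])
```

All three inputs end up as the same kind of object, so downstream code only
needs to understand one convention.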

It may be that a rather pragmatic approach to implementation will prove 
better than a highly idealized set of data models.  Or, it may be that a 
dual approach is best, in which the flag-value missing-data
implementation is tightly bound to the R model and the mask
implementation is explicitly designed for the numpy.ma model. In any
case, a reasonable level of agreement on the goals is needed.  I presume 
Travis's involvement will facilitate a clarification of the goals and of 
the implementation; and I expect that much of Mark's work will end up 
serving well, even if much needs to be added and the API evolves 
considerably.
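
The practical difference between the two models can be sketched with what
numpy already offers. This is only an illustration: NaN stands in for a
flag value, and numpy.ma stands in for the mask model, not the proposed
NA implementation.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])

# Flag-value style: the marker replaces the value, so the original
# 2.0 is gone from the array.
flagged = data.copy()
flagged[1] = np.nan

# Mask style: the value stays in the data buffer, hidden by the mask.
masked = np.ma.masked_array(data.copy(), mask=[False, True, False])
hidden_sum = masked.sum()          # the masked entry is skipped

# Assigning a new mask exposes the stored value again.
masked.mask = [False, False, False]
restored = masked[1]
```

In the flag-value case "missing" is destructive; in the mask case "ignored"
is reversible, which is the distinction between the two kinds of use.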

Eric

>
> -- Nathaniel