[Numpy-discussion] Missing data again

Nathaniel Smith njs at pobox.com
Wed Mar 7 14:26:53 EST 2012


On Wed, Mar 7, 2012 at 5:17 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> On Wed, Mar 7, 2012 at 9:35 AM, Pierre Haessig <pierre.haessig at crans.org>
>> Coming back to Travis's proposition "bit-pattern approaches to missing
>> data (*at least* for float64 and int32) need to be implemented.", I
>> wonder how much extra work it is to go from nafloat64 to
>> nafloat32/16? Is there hardware support for NaN payloads with these
>> smaller floats? If not, or if it is too complicated, I feel it is
>> acceptable to say "it's too complicated" and fall back to masks. One may
>> have to choose between fancy types and fancy NAs...
>
> I'm in agreement here, and that was a major consideration in making a
> 'masked' implementation first.

When it comes to "missing data", bitpatterns can do everything that
masks can do, are no more complicated to implement, and have better
performance characteristics.
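
To make the bit-pattern idea concrete, here is a minimal sketch, assuming
we reserve one quiet-NaN payload per float width as the "NA" marker. The
constants and the helper names set_na/is_na are made up for illustration
and are not anything NumPy defines. The sketch only stores and detects
the pattern; it says nothing about how payloads propagate through
arithmetic.

```python
import numpy as np

# Hypothetical NA patterns: a quiet NaN plus an arbitrary nonzero payload.
# These constants are illustrative only; NumPy defines no such values.
_NA = {
    np.dtype(np.float64): (np.uint64, 0x7FF80000000007A2),
    np.dtype(np.float32): (np.uint32, 0x7FC007A2),
    np.dtype(np.float16): (np.uint16, 0x7EA2),
}

def set_na(arr, index):
    """Mark arr[index] as missing by writing the NA bit pattern in place."""
    uint_type, bits = _NA[arr.dtype]
    # Write through an unsigned-integer view so the payload is stored exactly.
    arr.view(uint_type)[index] = bits

def is_na(arr):
    """Boolean array, True where an element holds exactly the NA pattern."""
    uint_type, bits = _NA[arr.dtype]
    return arr.view(uint_type) == uint_type(bits)

for dt in (np.float64, np.float32, np.float16):
    data = np.array([1.0, 2.0, np.nan, 4.0], dtype=dt)
    set_na(data, 1)
    # is_na flags only the reserved pattern; np.isnan flags NA and plain NaN.
    print(dt.__name__, is_na(data), np.isnan(data))
```

Note that is_na(data) is itself an ordinary boolean mask, so anything
written against a mask can be fed from the bit pattern without storing
a second array.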

> Also, different folks adopt different values
> for 'missing' data, and distributing one or several masks along with the
> data is another common practice.

True, but not really relevant to the current debate, because you have
to handle such issues as part of your general data import workflow
anyway, and none of these is any more complicated no matter which
implementations are available.

> One inconvenience I have run into with the current API is that it should be
> easier to clear the mask from an "ignored" value without taking a new view
> or assigning known data. So maybe two types of masks (different payloads),
> or an additional flag could be helpful. The process of assigning masks could
> also be made a bit easier than using fancy indexing.

So this, uh... this was actually the whole goal of the "alterNEP"
design for masks -- making all this stuff easy for people (like you,
apparently?) that want support for ignored values, separately from
missing data, and want a nice clean API for it. The idea was basically
to have a separate .mask attribute that was an ordinary, assignable
array broadcastable to the attached array's shape. Nobody seemed
interested in talking about it much then but maybe there's interest now?
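
For anyone who wants something to poke at, here is a rough sketch of
that idea, assuming a small wrapper class. The name IgnoringArray and
its sum method are hypothetical, and this does not reproduce the
alterNEP text; it only illustrates "an ordinary, assignable mask
broadcastable to the data's shape".

```python
import numpy as np

class IgnoringArray:
    """Hypothetical container: data plus an assignable 'ignore' mask."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self._mask = np.zeros(self.data.shape, dtype=bool)

    @property
    def mask(self):
        # An ordinary boolean array; index and assign into it as usual.
        return self._mask

    @mask.setter
    def mask(self, value):
        # Accept anything broadcastable to data.shape; keep a writable copy.
        value = np.asarray(value, dtype=bool)
        self._mask = np.broadcast_to(value, self.data.shape).copy()

    def sum(self):
        # Ignored elements are skipped; the underlying data is never modified.
        return self.data[~self._mask].sum()

a = IgnoringArray([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
a.mask = [False, True, False]   # broadcast: ignore the middle column
print(a.sum())
a.mask[0, 0] = True             # plain indexing, no fancy indexing needed
print(a.sum())
a.mask = False                  # clear the whole mask; data is untouched
print(a.sum())
```

Clearing an ignored value here is a plain assignment to the mask, with
no new view and no need to overwrite the data underneath.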

-- Nathaniel


