[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Wes McKinney wesmckinn at gmail.com
Fri Jun 24 19:22:11 EDT 2011


On Fri, Jun 24, 2011 at 7:10 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Fri, Jun 24, 2011 at 4:21 PM, Matthew Brett <matthew.brett at gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Fri, Jun 24, 2011 at 10:09 PM, Benjamin Root <ben.root at ou.edu> wrote:
>> ...
>> > Again, there are pros and cons either way, and I see them as very
>> > orthogonal and complementary.
>>
>> That may be true, but I imagine only one of them will be implemented.
>>
>> @Mark - I don't have a clear idea whether you consider the nafloat64
>> option to be still in play as the first thing to be implemented
>> (before array.mask).   If it is, what kind of thing would persuade you
>> either way?
>>
>
> Mark can speak for himself, but I think things are tending towards masks.
> They have the advantage of one implementation for all data types, current
> and future, and they are more flexible since the masked data can be actual
> valid data that you just choose to ignore for experimental reasons.
>
> What might be helpful is a routine to import/export R files, but that
> shouldn't be too difficult to implement.
>
> Chuck

Perhaps we should make a wiki page somewhere summarizing the pros and cons
of the various implementation approaches? I worry very seriously about
adding API functions relating to masks rather than having special NA
values which propagate in algorithms. The question is: will Joe Blow,
former R user, have to understand what the mask is and how to work with
it? If the answer is yes, we have a problem. If it can be completely
hidden as an implementation detail, that's great. In R, NAs are just
sort of inherent: they propagate, and you deal with them when you have
to, via the na.rm flag in functions or via is.na.
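
To make the contrast concrete, here is a rough sketch using what NumPy
already offers today, not the proposed core-ndarray API. It assumes NaN
as a stand-in for an R-style NA (which only works for floating-point
data) and uses the existing numpy.ma module as a stand-in for a mask;
the variable names are purely illustrative.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Approach 1: a special value stored in the array itself.
# NaN is only an approximation of R's NA and exists only for floats.
with_nan = data.copy()
with_nan[2] = np.nan

propagated = with_nan.sum()      # the missing value propagates: result is nan
skipped = np.nansum(with_nan)    # roughly R's sum(x, na.rm=TRUE)
is_na = np.isnan(with_nan)       # roughly R's is.na(x)

# Approach 2: a separate boolean mask alongside the data.
# The underlying value is kept, it is merely ignored by reductions.
masked = np.ma.masked_array(data, mask=[False, False, True, False])

masked_total = masked.sum()      # the masked element is left out of the sum
original_value = masked.data[2]  # the hidden value is still recoverable
```

In the first approach the user never touches anything but the array; in
the second, the mask is a separate object that the user can see and may
have to manage, which is the part I am worried about.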

The other problem I can think of with masks is the extra memory
footprint, though maybe this is no cause for concern.
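
A back-of-the-envelope sketch of that overhead, assuming the mask is
stored as one boolean byte per element (which is what a full numpy.ma
mask uses; the proposed implementation could choose a different layout,
so treat these numbers as illustrative only):

```python
import numpy as np

n = 1_000_000

# Assumption: one bool (1 byte) of mask per array element.
data = np.zeros(n, dtype=np.float64)
mask = np.zeros(n, dtype=bool)

data_bytes = data.nbytes                 # 8 bytes per float64 element
mask_bytes = mask.nbytes                 # 1 byte per element
overhead = mask_bytes / data_bytes       # fraction of extra memory for float64

# For small dtypes the relative cost is much higher.
small = np.zeros(n, dtype=np.int8)
small_overhead = mask.nbytes / small.nbytes
```

Under that assumption the mask adds 12.5% on top of float64 data but
doubles the footprint of int8 data.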

-W


