[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 14:18:07 EDT 2011

Hi,

On Fri, Jun 24, 2011 at 5:45 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 6:59 AM, Matthew Brett <matthew.brett at gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Fri, Jun 24, 2011 at 2:32 AM, Nathaniel Smith <njs at pobox.com> wrote:
...
>> and the fact that 'missing_value' could be any type would make the
>> code more complicated than the current case where the mask is always
>> bools or something?
>
> I'm referring to the underlying C implementations of the dtypes and any
> additional custom dtypes that people create. With the masked approach, you
> implement a new custom data type in C, and it automatically works with
> missing data. With the custom dtype approach, you have to do a lot more
> error-prone work to handle the special values in all the ufuncs.

This is pure ignorance on my part: I can see that the ufuncs need to
handle the missing values, but I can't immediately see why that would
be much more complicated than the 'every array might have a mask'
implementation.  This is what I was trying to say with my silly
sketch:

missing_value = np.dtype.missing_value

for e in one_d_array:
    if e == missing_value:

Well, you get the idea.  Obviously this is what you've been thinking
about; I just wanted to get a grasp of where the extra complexity is
coming from compared to:

for i, e in enumerate(one_d_array):
    if not one_d_array.mask[i]:
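
For concreteness, the two loops can be fleshed out roughly as follows.
This is only a sketch under assumptions, not the proposed
implementation: NaN stands in for the hypothetical per-dtype
'missing_value' (NumPy has no np.dtype.missing_value attribute), the
existing numpy.ma.MaskedArray stands in for an ndarray with a mask, and
the function names are made up for illustration.

```python
import numpy as np

# Assumption: NaN plays the role of the float dtype's 'missing_value'.
MISSING = np.nan


def sum_with_sentinel(values):
    """Sum a 1-d float array, skipping elements equal to the sentinel."""
    total = 0.0
    for e in values:
        # NaN never compares equal to itself, so 'e == MISSING' would not
        # work here; the check has to be specific to this sentinel.
        if np.isnan(e):
            continue
        total += e
    return total


def sum_with_mask(masked):
    """Sum a 1-d masked array, skipping elements whose mask entry is True."""
    data = masked.data
    mask = np.ma.getmaskarray(masked)  # always a full boolean array
    total = 0.0
    for i, e in enumerate(data):
        if not mask[i]:
            total += e
    return total


sentinel_array = np.array([1.0, MISSING, 3.0])
masked_array = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

sentinel_total = sum_with_sentinel(sentinel_array)
mask_total = sum_with_mask(masked_array)
print(sentinel_total, mask_total)
```

At the Python level the two loops look about equally simple.  The
visible difference is that the sentinel test depends on the dtype (NaN
needs isnan, and an integer dtype would need some other reserved
value), while the mask test is the same boolean check whatever the
data type is.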

Cheers,

Matthew