[Numpy-discussion] alterNEP - was: missing data discussion round 2

Thu Jun 30 11:38:55 EDT 2011

Hi,

On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>
> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>> ################################################
>> An alternative-NEP on masking and missing values
>> ################################################
>
> I like the idea of two different special values, np.NA for missing values, np.IGNORE for masked values. np.NA values in an array define what was implemented in numpy.ma as a 'hard mask' (where you can't unmask data), while np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly unambiguous that way.
>
>
>> **************
>> Initialization
>> **************
>>
>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>
>>>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>    array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> As the initialization is not ambiguous, this can be written without the NA
>> dtype::
>>
>>>>> np.array([1.0, 2.0, np.NA, 7.0])
>>    array([1., 2., NA, 7.], dtype='NA[<f8]')
>>
>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>
>>>>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>    array([1., 2., MASKED, 7.], masked=True)
>>
>> As the initialization is not ambiguous, this can be written without
>> ``masked=True``::
>>
>>>>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>    array([1., 2., MASKED, 7.], masked=True)
>
> I'm not happy with this 'masked' parameter, at all. What's the point? Either you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing something here.

If I put np.MASKED (I agree, I prefer np.IGNORE) in the init, then
obviously I mean the array to be masked, so the 'masked=True' here is
completely redundant; yes, I agree.  And in fact:

np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)

should raise an error.  On the other hand, if I make a normal array:

arr = np.array([1.0, 2.0, 7.0])

and then do this:

arr.visible[2] = False

then either this should raise an error (it's not a masked array) or,
more magically, construct a mask on the fly.  The second option
somewhat breaks expectations though, because you might have just
allocated a largish mask array without having any clue that it had
happened.
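
For comparison, here is a small sketch of how the existing numpy.ma
handles this case today.  This is current numpy.ma behaviour, not the
API proposed above, and the variable names are only for illustration.
A masked array built without a mask carries no mask storage; the first
masking assignment quietly allocates a full-size boolean mask:

```python
import numpy as np

# A numpy.ma array created without a mask: no mask storage exists yet,
# the mask is the shared sentinel np.ma.nomask.
arr = np.ma.array([1.0, 2.0, 7.0])
no_mask_before = np.ma.getmask(arr) is np.ma.nomask

# Masking one element makes numpy.ma allocate a boolean mask with the
# same shape as the data, without any explicit request from the caller.
arr[2] = np.ma.masked
mask_after = np.ma.getmaskarray(arr)

print(no_mask_before)
print(mask_after)
print(arr.data)  # the underlying value at index 2 is left in place
```

So the "construct on the fly" option at least has a precedent, although
it has the same hidden-allocation cost described above.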

>
>> ******
>> Ufuncs
>> ******
>
> All fine.
>>
>> **********
>> Assignment
>> **********
>>
>> is obvious in the NA case::
>>
>>>>> arr = np.array([1.0, 2.0, 7.0])
>>>>> arr[2] = np.NA
>>    TypeError('dtype does not support NA')
>>>>> na_arr = np.array([1.0, 2.0, 7.0], dtype='NA[f8]')
>>>>> na_arr[2] = np.NA
>>>>> na_arr
>>    array([1., 2., NA], dtype='NA[<f8]')
>
> OK
>
>
>>
>> Direct assignment in the masked case is magic and confusing, and so happens only
>> via the mask::
>>
>>>>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>>> masked_arr[2] = np.NA
>>    TypeError('dtype does not support NA')
>>>>> masked_arr[2] = np.MASKED
>>    TypeError('float() argument must be a string or a number')
>>>>> masked_arr.visible[2] = False
>>>>> masked_arr
>>    array([1., 2., MASKED], masked=True)
>
> What about the reverse case? When you assign a regular value to a np.NA/np.IGNORE item?

Well, for the np.NA case, this is straightforward:

na_arr[2] = 3

It's just assignment.  For ``masked_arr[2] = 3`` - I don't know, I
guess whatever we are used to.  What do you think?
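
As a reference point for "what we are used to", here is a minimal
sketch of what numpy.ma does at present, assuming that is the relevant
precedent.  It uses only the existing numpy.ma API, nothing from the
proposal, and the names ``soft`` and ``hard`` are just for illustration.
With the default soft mask, assigning a regular value unmasks the
element; with ``hard_mask=True`` the element stays masked and the
stored data is not overwritten:

```python
import numpy as np

# Soft mask (the numpy.ma default): assigning a plain value to a masked
# element stores the value and clears the mask at that position.
soft = np.ma.array([1.0, 2.0, 7.0], mask=[False, False, True])
soft[2] = 3.0

# Hard mask: the same assignment is ignored for the masked element, so
# it stays masked and the underlying data keeps its old value.
hard = np.ma.array([1.0, 2.0, 7.0], mask=[False, False, True],
                   hard_mask=True)
hard[2] = 3.0

print(soft)
print(hard)
```

The hard-mask behaviour is the one closest to the np.NA idea that
Pierre describes, except that here np.NA assignment would simply
overwrite the value.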

Best,

Matthew