[Numpy-discussion] NA masks for NumPy are ready to test

Mark Wiebe mwwiebe at gmail.com
Fri Aug 19 15:15:05 EDT 2011


On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey <bsouthey at gmail.com> wrote:

> Hi,
> Just some immediate minor observations that are really about trying to
> be consistent:
>
> 1) Could you keep the display of the NA dtype the same as the array's?
> For example, NA dtype is displayed as '<f8' but should be displayed as
> 'float64' as that is the array dtype.
> >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]])
> >>> a
> array([[  1.,   2.,   3., NA],
>       [  3.,   4.,  nan,   5.]])
> >>> a.dtype
> dtype('float64')
> >>> a.sum()
> NA(dtype='<f8')
>

I suppose I can do it that way, sure. I think it would be good to change the
'float64' into '<float64' at some point, so it's a more portable repr.
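
For reference, both spellings are already available from a dtype object in released NumPy. The following is only a small sketch of where each one comes from; the variable names are illustrative.

```python
import numpy as np

dt = np.dtype(np.float64)

# Human-readable name, with no byte-order information.
name = dt.name

# Array-protocol typestring, which spells out the byte order explicitly:
# '<' for little-endian, '>' for big-endian.
typestr = dt.str

print(name, typestr)
```

The typestring is the form that survives a round trip between machines of different endianness, which is the portability concern here.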


> 2) Can the 'skipna' flag be added to the methods?
> >>> a.sum(skipna=True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'skipna' is an invalid keyword argument for this function
> >>> np.sum(a,skipna=True)
> nan
>

Yeah, but I think this is low priority compared to a lot of other things
that need doing. The methods are written in C with a particular hardcoded
implementation pattern, whereas with the functions in the numpy namespace I
was able to adjust to call the ufunc reduce methods without much menial
effort.
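
As a rough illustration of why the namespace functions were easier to adapt, a Python-level function can simply forward to a ufunc's reduce method. The sketch below assumes a released NumPy (1.17 or later) where reduce accepts a `where=` argument; `sum_skipping` and the `skip` mask are hypothetical names standing in for the skipna behaviour, not part of the NA branch.

```python
import numpy as np

def sum_skipping(a, skip):
    """Sum `a`, ignoring elements where the boolean array `skip` is True.

    Hypothetical helper: it forwards to the ufunc reduce method and uses
    `where=` to leave the skipped elements out of the reduction.
    """
    a = np.asarray(a)
    keep = ~np.asarray(skip, dtype=bool)
    return np.add.reduce(a, axis=None, where=keep)

a = np.array([[1.0, 2.0, 3.0, 0.0], [3.0, 4.0, 5.0, 5.0]])
skip = np.zeros(a.shape, dtype=bool)
skip[0, 3] = True  # pretend this element is missing

total = sum_skipping(a, skip)
print(total)
```

Doing the same for the array methods would mean touching their C implementation, which is the extra work described above.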

> 3) Can the skipna flag be extended to exclude other non-finite cases like
> NaN?
>

That wasn't really within the scope of the original design, except for one
particular case of the NA-bitpattern dtypes. It's possible to make a new
mask and assign NA to the NaN values like this:

a = [array with NaNs]
aview = a.view(ownmaskna=True)
aview[np.isnan(aview)] = np.NA
np.sum(aview, skipna=True)
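
On an installation where np.NA and ownmaskna are not available, a similar "mask the NaNs, then sum the rest" effect can be sketched with numpy.ma. This is an analogue under that assumption, not the NA-mask mechanism itself, and the sample array is made up.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, np.nan, 5.0]])

# Mask every NaN/inf entry, then reduce over the unmasked values only.
masked = np.ma.masked_invalid(a)
total_masked = masked.sum()

# np.nansum treats NaN as zero, which gives the same total for this array.
total_nansum = np.nansum(a)

print(total_masked, total_nansum)
```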

> 4) Assigning a np.NA needs a better error message but the Integer
> array case is more informative:
> >>> b=np.array([1,2,3,4], dtype=np.float128)
> >>> b[0]=np.NA
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: float() argument must be a string or a number
>
> >>> j=np.array([1,2,3])
> >>> j
> array([1, 2, 3])
> >>> j[0]=ina
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: int() argument must be a string or a number, not 'numpy.NAType'
>

I coded this up the way I did to ease the future transition to NA-bitpattern
dtypes, which would handle this conversion from the NA object. The error
message is being produced by CPython in both of these cases, so it looks
like they didn't make their messages consistent.
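
The difference can be reproduced without NumPy at all, since the messages come from the built-in float() and int() conversions. A minimal sketch, where `Opaque` is a hypothetical stand-in for any object with no numeric conversion (the exact wording of the messages depends on the Python version):

```python
class Opaque:
    """Hypothetical stand-in for an object with no numeric conversion."""

def conversion_error(converter, value):
    # Return the TypeError message produced by the converter, if any.
    try:
        converter(value)
    except TypeError as exc:
        return str(exc)
    return None

float_msg = conversion_error(float, Opaque())
int_msg = conversion_error(int, Opaque())

print(float_msg)
print(int_msg)
```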

This could be changed to match the error message from array assignment, like this:

>>> a = np.array([np.NA, 3])
>>> b = np.array([3,4])
>>> b[...] = a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Cannot assign NA value to an array which does not support NAs


> But it is nice that np.NA 'adjusts' to the insertion array:
> >>> b.flags.maskna = True
> >>> ana
> NA(dtype='<f8')
> >>> b[0]=ana
> >>> b[0]
> NA(dtype='<f16')
>

It should generally follow the NumPy type promotion rules, but may be a bit
more liberal in places.
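
The promotion rules themselves can be queried directly in released NumPy. The sketch below uses np.longdouble as a stand-in for the float128 array in the example, on the assumption that np.float128 may not be defined on every platform.

```python
import numpy as np

# A float64 value going into an extended-precision array promotes upward.
promoted = np.promote_types(np.float64, np.longdouble)

# An int8 value combined with float64 promotes to float64.
int_promoted = np.promote_types(np.int8, np.float64)

print(promoted, int_promoted)
```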


> 5) Different display depending on masked state. That is I think that
> 'maskna=True' should be displayed always when flags.maskna is True :
> >>> j=np.array([1,2,3], dtype=np.int8)
> >>> j
> array([1, 2, 3], dtype=int8)
> >>> j.flags.maskna=True
> >>> j
> array([1, 2, 3], maskna=True, dtype=int8)
> >>> j[0]=np.NA
> >>> j
> array([NA, 2, 3], dtype=int8) # I think it should still display
> 'maskna=True'.
>

This is just like how NumPy hides the dtype in some cases: it hides
maskna=True whenever it would be automatically detected from the input list.

>>> np.array([1.0, 2.0])
array([ 1.,  2.])
>>> np.array([1.0, 2.0], dtype=np.float32)
array([ 1.,  2.], dtype=float32)

Cheers,
Mark


>
> Bruce