[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Pierre GM pgmdevlist at gmail.com
Fri Jun 24 11:02:01 EDT 2011


On Jun 24, 2011, at 4:44 PM, Robert Kern wrote:

> On Fri, Jun 24, 2011 at 09:35, Robert Kern <robert.kern at gmail.com> wrote:
>> On Fri, Jun 24, 2011 at 09:24, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> On Fri, Jun 24, 2011 at 7:06 AM, Robert Kern <robert.kern at gmail.com> wrote:
>>> 
>>>> The alternative proposal would be to add a few new dtypes that are
>>>> NA-aware. E.g. an nafloat64 would reserve a particular NaN value
>>>> (there are lots of different NaN bit patterns, we'd just reserve one)
>>>> that would represent NA. An naint32 would probably reserve the most
>>>> negative int32 value (like R does). Using the NA-aware dtypes signals
>>>> that you are using NA values; there is no need for an additional flag.
>>> 
>>> I don't understand the numpy design and maintainability issues, but from
>>> a user perspective (mine), nafloat64 etc. sounds nice.
>> 
>> It's worth noting that this is not a replacement for masked arrays,
>> nor is it intended to be the be-all, end-all solution to missing data
>> problems. It's mostly just intended to be a focused tool to fill in
>> the gaps where masked arrays are less convenient for whatever reason;
>> e.g. where you're tempted to (ab)use NaNs for the purpose and the
>> limitations on the range of values are acceptable. Not every dtype
>> would have an NA-aware counterpart. I would suggest just nabool,
>> nafloat64, naint32, nastring (a little tricky due to the flexible
>> size, but doable), and naobject. Maybe a couple more, if we get
>> requests, like naint64 and nacomplex128.
> 
> Oh, and nadatetime64 and natimedelta64.
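
The sentinel scheme quoted above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not a NumPy API: `nafloat64` and `naint32` do not exist, so the hypothetical constants `NA_FLOAT64_BITS` (one arbitrarily chosen quiet-NaN bit pattern) and `NA_INT32` stand in for the reserved values, and the hypothetical helpers `set_na_float64`, `isna_float64` and `isna_int32` write and test them. It illustrates how reserving one NaN payload lets NA be told apart from an ordinary NaN.

```python
import numpy as np

# Hypothetical reserved values (assumptions for illustration, not a NumPy API).
# One specific quiet-NaN bit pattern stands for NA in a would-be "nafloat64";
# any other NaN remains an ordinary NaN.
NA_FLOAT64_BITS = np.uint64(0x7FF80000000007A2)
# The most negative int32 stands for NA in a would-be "naint32".
NA_INT32 = np.iinfo(np.int32).min  # -2147483648


def set_na_float64(a, index):
    """Write the reserved NA bit pattern into a float64 array in place."""
    # Writing through a uint64 view copies the exact bits, so the payload survives.
    a.view(np.uint64)[index] = NA_FLOAT64_BITS


def isna_float64(a):
    """True where the stored bits equal the reserved NA pattern."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    return a.view(np.uint64) == NA_FLOAT64_BITS


def isna_int32(a):
    """True where an int32 array holds the reserved minimum value."""
    return np.asarray(a, dtype=np.int32) == NA_INT32


x = np.array([1.0, np.nan, 3.0])
set_na_float64(x, 2)
print(np.isnan(x))      # NA is still a NaN as far as IEEE arithmetic is concerned
print(isna_float64(x))  # only the reserved pattern counts as NA

y = np.array([5, NA_INT32, -1], dtype=np.int32)
print(isna_int32(y))
```

The trade-off mentioned in the thread shows up directly here: with this encoding, -2147483648 can no longer be stored as an ordinary int32 value, and only one of the many NaN bit patterns is given the NA meaning.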

So, if I understand correctly:
if my array has a nafloat type, it's an array that supports missing values and it will always have a mask, right? And just viewing an array as a nafloat-dtyped one would make it an 'array-with-missing-values'? That's pretty elegant. I like that.
Now, how will masked values be represented? Will the masked value differ from one dtype to another? What would be the equivalent of something like `if a[0] is masked` that we have now?
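
For reference, this is how the check mentioned above works in today's `numpy.ma`: indexing a single masked element returns the `numpy.ma.masked` singleton, so an identity test works. A sentinel-based dtype would have no such singleton, so a scalar check would presumably have to inspect the stored value instead. The `NA_FLOAT64_BITS` pattern and the `is_na_at` helper below are hypothetical and only sketch that difference; they are not part of any proposal text.

```python
import numpy as np
import numpy.ma as ma

# Existing numpy.ma behaviour: a masked element is returned as the ma.masked singleton.
a = ma.array([1.0, 2.0, 3.0], mask=[True, False, False])
print(a[0] is ma.masked)  # True
print(a[1] is ma.masked)  # False

# Hypothetical sentinel scheme: no singleton exists, so the test compares bits.
NA_FLOAT64_BITS = np.uint64(0x7FF80000000007A2)  # assumed reserved NaN pattern


def is_na_at(arr, index):
    """Return True if arr[index] holds the (hypothetical) reserved NA bit pattern."""
    return bool(arr.view(np.uint64)[index] == NA_FLOAT64_BITS)


b = np.array([1.0, np.nan, 3.0])
b.view(np.uint64)[2] = NA_FLOAT64_BITS  # store NA without any arithmetic
print(is_na_at(b, 1))  # False: an ordinary NaN
print(is_na_at(b, 2))  # True
```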

