[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 12:05:18 EDT 2011

On Fri, Jun 24, 2011 at 8:14 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 10:07, Laurent Gautier <lgautier at gmail.com> wrote:
>> Maybe there is not so much need for reservation over the string NA, when
>> making the distinction between:
>> a- the internal representation of a "missing string" (what is stored in
>> memory, and that C-level code would need to be aware of)
>> b- the 'external' representation of a missing string (in Python, what would
>> be returned by repr() )
>> c- what is assumed to be a missing string value when reading from a file.
>>
>> a/ is not 'NA', c/ should be a parameter in the relevant functions, b/ can
>> be configured as a module-level, class-level, or instance-level variable.
>
> In R, a/ happens to be 'NA', unfortunately. :-/
>
> I'm not really sure how they handle datasets that use valid 'NA'
> values. Presumably, their input routines allow one to convert such
> values to something else such that it can use 'NA'==NA internally.

No, R can distinguish the string "NA" and the value NA-of-type-string:

> c("NA", NA)
[1] "NA" NA

In R, strings are stored as pointers rather than in place, and
the magic NA value has a special, globally known pointer value. (This
pointer might well point to the characters "NA\0", but all of the code
knows to check for the magic NA pointer before actually
following the pointer.)
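
To make the pointer trick concrete, here is a rough Python analogy. It is
only a sketch and not R's actual implementation: the names _NAString,
NA_STRING, is_na and show are made up for illustration. The sentinel's text
is "NA", so comparing by value cannot tell it apart from the ordinary
string; checking object identity, which plays the role of the pointer
comparison, can.

```python
class _NAString(str):
    """Hypothetical sentinel type: its text is "NA", but only identity matters."""

    def __repr__(self):
        # Print without quotes, mimicking how R shows NA next to "NA".
        return "NA"


# One globally known sentinel object, the analogue of the magic NA pointer.
NA_STRING = _NAString("NA")


def is_na(value):
    # Compare identity, not contents: the analogue of checking for the
    # magic pointer before following it.
    return value is NA_STRING


def show(values):
    # Build the printed form of each element.
    return [repr(v) for v in values]


values = ["NA", NA_STRING]
print(show(values))                # ordinary string keeps its quotes, NA does not
print([is_na(v) for v in values])  # only the sentinel is reported as missing
print("NA" == NA_STRING)           # the contents compare equal, so == is not enough
```

The point of the last line is that an equality test on the characters would
conflate the two values, which is why the check has to be on the pointer
(here, object identity) and has to happen first.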

-- Nathaniel