[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Fri Jun 24 12:07:22 EDT 2011

On Fri, Jun 24, 2011 at 11:05, Nathaniel Smith <njs at pobox.com> wrote:
> On Fri, Jun 24, 2011 at 8:14 AM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Fri, Jun 24, 2011 at 10:07, Laurent Gautier <lgautier at gmail.com> wrote:
>>> Maybe there is not so much need to reserve the string NA, once we
>>> distinguish between:
>>> a- the internal representation of a "missing string" (what is stored in
>>> memory, and that C-level code would need to be aware of)
>>> b- the 'external' representation of a missing string (in Python, what would
>>> be returned by repr() )
>>> c- what is assumed to be a missing string value when reading from a file.
>>>
>>> a/ is not 'NA', c/ should be a parameter in the relevant functions, b/ can
>>> be configured as a module-level, class-level, or instance-level variable.
>>
>> In R, a/ happens to be 'NA', unfortunately. :-/
>>
>> I'm not really sure how they handle datasets that use valid 'NA'
>> values. Presumably, their input routines allow one to convert such
>> values to something else such that it can use 'NA'==NA internally.
>
> No, R can distinguish the string "NA" and the value NA-of-type-string:
>
> > c("NA", NA)
> [1] "NA" NA
>
> In R strings are represented as pointers, rather than in-place, and
> the magic NA value has a special globally known pointer value. (This
> pointer might well point to the characters "NA\0", but all of the code
> knows to check whether it has the magic NA pointer before actually
> following the pointer.)

Ah, okay. Well, then we can pick whatever value we like.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco