[SciPy-Dev] views and mask NA

Benjamin Root ben.root at ou.edu
Sat Jan 21 14:49:56 EST 2012


On Fri, Jan 20, 2012 at 10:21 PM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> Hi All,
>
> I'd like some feedback on how mask NA should interact with views. The
> immediate problem is how to deal with the real and imaginary parts of
> complex numbers. If the original has a masked value, it should show up as
> masked in the real and imaginary parts. But what should happen on
> assignment to one of the masked views? This should probably clear the NA in
> the real/imag part, but not in the complex original.


That's a very sticky question.  If we were to clear the NA on both the
real and imaginary parts, we would risk exposing uninitialized data.
Remember, depending on how we finally decide math should be done with NA,
an operation on masked inputs that creates a new array may not compute any
value for the masked elements.  So, if we assign to the real part and,
therefore, clear that mask, the imaginary part may just be random bits.
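
To make that concrete, here is a minimal sketch in plain NumPy.  Since
np.NA is not available in released NumPy, a separate boolean array named
`mask` stands in for the NA mask; that array and the variable names are
assumptions for illustration, not the NEP's API.

```python
import numpy as np

# Hypothetical stand-in for an NA mask: True means "masked".
z = np.array([1 + 2j, 3 + 4j, 5 + 6j])
mask = np.array([False, True, False])

real_part = z.real       # a view onto z's storage, not a copy
real_part[1] = 10.0      # assign "under the mask" through the view

# Only the real half of z[1] changed.  The imaginary half still holds
# whatever was stored there before (4.0 here, arbitrary bits in general),
# and clearing mask[1] at this point would expose it.
print(z[1])
print(np.shares_memory(real_part, z))
```

The masked element keeps its old imaginary component, which is exactly
the value that would leak out if the mask were cleared for both parts.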

Conversely, if we were to keep the imaginary part masked, does that still
make sense for mathematical operations?  Say, perhaps, magnitudes or
Fourier transforms?  Would it make sense to instead clear the mask on both
real and imaginary parts and merely assume that assigning to the real part
implicitly means a zero assignment to the imaginary part (and vice versa)?
Mathematically, this makes sense to me since it would be equivalent, but as
a programmer, this thought makes me cringe.  Consider making an assignment
first to the real part and then to the imaginary part: the second
assignment would wipe out the first (if we want to be consistent).

Are there use cases for separately making assignments to the real and
imaginary parts? Would we want the zero assignment to happen *only* if
there was a mask, but not if there wasn't a mask?  This gets very icky,
indeed.
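
The two candidate rules can be compared with a small sketch.  The helper
`assign_part` and its `only_if_masked` flag are hypothetical names made up
for this illustration, and a plain boolean array again stands in for the
NA mask.

```python
import numpy as np

def assign_part(z, mask, i, value, part, only_if_masked):
    """Assign `value` to the real or imaginary part of z[i], clear the mask.

    Hypothetical rule: the other part is zeroed either always
    (only_if_masked=False) or only when z[i] was masked
    (only_if_masked=True).
    """
    zero_other = bool(mask[i]) or not only_if_masked
    if part == "real":
        z[i] = complex(value, 0.0 if zero_other else z[i].imag)
    else:
        z[i] = complex(0.0 if zero_other else z[i].real, value)
    mask[i] = False

results = {}
for only_if_masked in (False, True):
    z = np.array([1 + 2j, 3 + 4j])
    mask = np.array([False, True])          # element 1 starts out masked
    assign_part(z, mask, 1, 7.0, "real", only_if_masked)
    assign_part(z, mask, 1, 9.0, "imag", only_if_masked)
    results[only_if_masked] = z[1]
    print(only_if_masked, z[1])
```

With the "always zero" rule the second assignment discards the 7.0 written
by the first; with the "only if masked" rule both values survive, but the
outcome of an assignment now depends on the mask state.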



> However, that does allow touching things under the mask, so to speak.
>
>
Remember, some forms of missingness that we have discussed allow for
"unmasking", while other forms do not.  However, currently, the NEP does
not allow for touching things under the mask, IIRC.



> Things get more complicated if the complex original is viewed as reals. In
> this case the mask needs to be "doubled" up, and there is again the
> possibility of touching things beneath the mask in the original. Viewing
> the original as bytes leads to even greater duplication.
>
>
Let's also think of it in the other direction. Let's say I have an array of
32-bit ints and I view them as 64-bit ints.  This is what currently happens:

>>> a = np.array([1, 2, 3, np.NA, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593,           3, 25769803781, NA, 42949672969], dtype=int64)
>>> a = np.array([1, 2, np.NA, 4, 5, 6, 7, 8, 9, 10], dtype='i4')
>>> a.view('i8')
array([8589934593, 17179869206, NA, 34359738375, 42949672969], dtype=int64)

Depending on the position of the NA, the view may or may not get the NA.  I
would imagine that this is also endian-dependent. I am not entirely certain
of what the correct behavior should be, but I think the answer to this is
also related to the answer to the real/imaginary case.
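
For reference, here is a sketch of both directions in released NumPy, with
a boolean array standing in for the NA mask.  Combining the mask with
`any` over each pair, and doubling it with `np.repeat`, are just one
possible policy chosen for illustration, not something the NEP specifies.

```python
import numpy as np

# Narrow to wide: ten little-endian int32 values viewed as five int64.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='<i4')
mask = np.zeros(a.shape, dtype=bool)
mask[3] = True                                 # "NA" at position 3

wide = a.view('<i8')                           # pairs (1,2), (3,4), ...
wide_mask = mask.reshape(-1, 2).any(axis=1)    # masked if either half is

# The same values stored big-endian combine differently.
big = a.astype('>i4').view('>i8')

# Wide to narrow: a complex array viewed as reals needs a doubled mask.
z = np.array([1 + 2j, 3 + 4j])
z_mask = np.array([False, True])
as_reals = z.view(np.float64)                  # [1., 2., 3., 4.]
doubled_mask = np.repeat(z_mask, 2)

print(wide[0], big[0])
print(wide_mask, doubled_mask)
```

Under this policy the position of the NA no longer decides whether the
view sees it: any wide element that overlaps a masked narrow element is
masked.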


> My thought is that touching the underlying data needs to be allowed in
> these cases, but the original mask can only be cleared by assignment to the
> original. Thoughts?
>
>
Such a restriction would likely prove problematic.  When we create
functions and other libraries, we are not aware of whether we are dealing
with a view of an array or the original.  Heck, most of the time, I am not
paying attention to whether I am using a view or not in my own programs.
The transparency of views has been a major selling point to me for numpy.
Eventually, (my understanding is that) views will become completely
indistinguishable from the original numpy array in all of the remaining
corner cases (boolean assignments and such).

If we decide to make NA-related assignments different for views than
originals, then it only increases the contrast between numpy arrays and
views.  In a language like Python, this would likely be a bad thing.
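
As a small illustration of that transparency in released NumPy, the
function below (a made-up example, `fill_first` is not a NumPy function)
has no need to know whether it was handed a view.  The `base` attribute is
the only thing that tells the two apart.

```python
import numpy as np

def fill_first(arr, value):
    """Write to the first element; works the same for views and originals."""
    arr[0] = value

original = np.arange(6)
view = original[2:]           # basic slicing returns a view

fill_first(view, -1)          # the write lands in original's storage

print(original)
print(view.base is original)  # the view records where its data lives
print(original.base is None)  # the original owns its data
```

A rule that lets only the original clear a mask would force functions
like this one to inspect `base` before assigning.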

Unfortunately, I am not sure what the solution should be.  But I hope
this spurs further discussion.

Cheers,
Ben Root