[Numpy-discussion] NA/Missing Data Conference Call Summary

Wed Jul 6 16:11:37 EDT 2011

On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote:
>
>
> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker 
> <Chris.Barker at noaa.gov> wrote:
>
>     Christopher Jordan-Squire wrote:
>     > If we follow those rules for IGNORE for all computations, we sometimes
>     > get some weird output. For example:
>     > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>     > multiply and not * with broadcasting.) Or should that sort of operation
>     > throw an error?
>
>     That should throw an error -- matrix computation is heavily influenced
>     by the shape and size of matrices, so I think IGNORES really don't make
>     sense there.
>
>
>
> If the IGNORES don't make sense in basic numpy computations then I'm 
> kinda confused why they'd be included at the numpy core level.
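
For concreteness, here is a minimal pure-Python sketch of the matrix multiply example quoted above. It assumes the rule in question is that a binary operation with IGNORE simply returns the other operand; that assumption is not spelled out in this message, but it does reproduce the [ 15, 31 ] result. The IGNORE sentinel and the helper names are hypothetical and are not part of numpy.

```python
# Hypothetical sentinel standing in for the proposed IGNORE value.
IGNORE = object()


def mul(a, b):
    # Assumed rule: IGNORE * x == x.
    if a is IGNORE:
        return b
    if b is IGNORE:
        return a
    return a * b


def add(a, b):
    # Assumed rule: IGNORE + x == x.
    if a is IGNORE:
        return b
    if b is IGNORE:
        return a
    return a + b


def matvec(matrix, vector):
    # Plain row-by-column product built from the IGNORE-aware helpers.
    result = []
    for row in matrix:
        acc = IGNORE
        for m, v in zip(row, vector):
            acc = add(acc, mul(m, v))
        result.append(acc)
    return result


product = matvec([[1, 2], [3, 4]], [IGNORE, 7])
print(product)  # [15, 31]
```

Under this rule the IGNORE entry silently behaves like 1 in the product, which is why the output looks odd for a matrix operation.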
>
>     Nathaniel Smith wrote:
>     > It's exactly this transparency that worries Matthew and me -- we feel
>     > that the alterNEP preserves it, and the NEP attempts to erase it. In
>     > the NEP, there are two totally different underlying data structures,
>     > but this difference is blurred at the Python level. The idea is that
>     > you shouldn't have to think about which you have, but if you work with
>     > C/Fortran, then of course you do have to be constantly aware of the
>     > underlying implementation anyway.
>
>     I don't think this bothers me -- I think it's analogous to things in
>     numpy like Fortran order and non-contiguous arrays -- you can ignore all
>     that when working in pure python when performance isn't critical, but
>     you need a deeper understanding if you want to work with the data in C
>     or Fortran or to tune performance in python.
>
>     So as long as there is an API to query and control how things work, I
>     like that it's hidden from simple python code.
>
>     -Chris
>
>
>
> I'm similarly not too concerned about it. Performance seems finicky 
> when you're dealing with missing data, since a lot of arrays will 
> likely have to be copied over to other arrays containing only complete 
> data before being handed over to BLAS. My primary concern is that the 
> np.NA stuff 'just works'. Especially since I've never run into use 
> cases in statistics where the difference between IGNORE and NA mattered.
>
>
Exactly!
I have not been able to think of a real example where that difference 
matters, as the calculations are only on the 'valid' (i.e. non-missing and 
non-masked) values.
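
As a rough illustration, using the existing numpy.ma masked arrays and NaN placeholders rather than the proposed np.NA (the array contents here are made up), a reduction over the valid values gives the same answer whether the gap is recorded as a mask or as a missing-value marker:

```python
import numpy as np

# Made-up data: the second entry is the one we treat as unavailable.
data = np.array([1.0, 2.0, 3.0, 4.0])
missing = np.array([False, True, False, False])

# Mask-style representation: the value stays in memory but is hidden.
masked = np.ma.masked_array(data, mask=missing)

# Missing-value-style representation: the value is replaced by NaN.
with_nan = np.where(missing, np.nan, data)

# Both reductions use only the three valid entries.
masked_mean = masked.mean()
nan_mean = np.nanmean(with_nan)

# Copy of the complete data only, e.g. before handing it to other code.
valid_only = masked.compressed()

print(masked_mean, nan_mean, valid_only)
```

The last step also shows the kind of copy into an array of complete data that was mentioned above as a likely cost before calling BLAS.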

Bruce

