[Numpy-discussion] NA/Missing Data Conference Call Summary

Wed Jul 6 15:38:40 EDT 2011

On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:

> Christopher Jordan-Squire wrote:
> > If we follow those rules for IGNORE for all computations, we sometimes
> > get some weird output. For example:
> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
> > multiply and not * with broadcasting.) Or should that sort of operation
> > throw an error?
>
> That should throw an error -- matrix computation is heavily influenced
> by the shape and size of matrices, so I think IGNORES really don't make
> sense there.
>
>
>
If the IGNORES don't make sense in basic numpy computations then I'm kinda
confused why they'd be included at the numpy core level.
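
To make the quoted example concrete, here is a minimal pure-Python sketch. It
assumes "those rules" mean that a binary operation with an IGNORE operand
simply returns the other operand; that assumption reproduces the [ 15, 31 ]
figure quoted above. The IGNORE sentinel and the helper functions are
hypothetical and are not part of numpy or of either proposal.

```python
# Hypothetical sentinel standing in for IGNORE; not a numpy object.
IGNORE = object()


def mul(a, b):
    # Assumed rule: an IGNORE operand is skipped, so the other operand
    # passes through unchanged.
    if a is IGNORE:
        return b
    if b is IGNORE:
        return a
    return a * b


def add(a, b):
    # Same assumed rule for addition.
    if a is IGNORE:
        return b
    if b is IGNORE:
        return a
    return a + b


def matvec(matrix, vector):
    # Plain matrix-vector product built from the IGNORE-aware operations.
    result = []
    for row in matrix:
        acc = IGNORE  # an "empty" accumulator under the assumed rule
        for m, v in zip(row, vector):
            acc = add(acc, mul(m, v))
        result.append(acc)
    return result


result = matvec([[1, 2], [3, 4]], [IGNORE, 7])
print(result)  # [15, 31]
```

The first row gives 1 + 2*7 = 15 and the second 3 + 4*7 = 31, so the IGNORE
entry silently behaves like a 1 inside the product, which is the "weird
output" in question.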

> Nathaniel Smith wrote:
> > It's exactly this transparency that worries Matthew and me -- we feel
> > that the alterNEP preserves it, and the NEP attempts to erase it. In
> > the NEP, there are two totally different underlying data structures,
> > but this difference is blurred at the Python level. The idea is that
> > you shouldn't have to think about which you have, but if you work with
> > C/Fortran, then of course you do have to be constantly aware of the
> > underlying implementation anyway.
>
> I don't think this bothers me -- I think it's analogous to things in
> numpy like Fortran order and non-contiguous arrays -- you can ignore all
> that when working in pure python when performance isn't critical, but
> you need a deeper understanding if you want to work with the data in C
> or Fortran or to tune performance in python.
>
> So as long as there is an API to query and control how things work, I
> like that it's hidden from simple python code.
>
> -Chris
>
>
>
I'm similarly not too concerned about it. Performance seems finicky when
you're dealing with missing data, since a lot of arrays will likely have to
be copied over to other arrays containing only complete data before being
handed over to BLAS. My primary concern is that the np.NA stuff 'just
works', especially since I've never run into use cases in statistics where
the difference between IGNORE and NA mattered.
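
As a rough sketch of the copying step described above, the snippet below uses
the existing numpy.ma masked arrays as a stand-in for the proposed np.NA
support (an assumption, since that API was still under discussion). It drops
every row with a missing entry and copies the rest into a dense, contiguous
array before a BLAS-backed product. The variable names are illustrative only.

```python
import numpy as np

# Three observations of two variables; the second row has a missing value.
x = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    mask=[[False, False], [False, True], [False, False]],
)

# Keep only the rows with no masked entries.
complete_rows = ~np.ma.getmaskarray(x).any(axis=1)

# Copy the complete rows into a plain contiguous ndarray, which is the
# kind of input BLAS routines expect.
dense = np.ascontiguousarray(x.data[complete_rows])

# The matrix product on the dense copy.
gram = dense.T @ dense
print(gram)
```

The copy is the cost being pointed at: the product itself runs at full speed,
but only after the complete cases have been gathered into a new array.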

>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov