[Numpy-discussion] NA/Missing Data Conference Call Summary
Bruce Southey
bsouthey at gmail.com
Wed Jul 6 16:11:37 EDT 2011
On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote:
>
>
> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker
> <Chris.Barker at noaa.gov> wrote:
>
> Christopher Jordan-Squire wrote:
> > If we follow those rules for IGNORE for all computations, we sometimes
> > get some weird output. For example:
> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
> > multiply and not * with broadcasting.) Or should that sort of operation
> > throw an error?
>
> That should throw an error -- matrix computation is heavily influenced
> by the shape and size of matrices, so I think IGNORES really don't make
> sense there.
>
>
>
> If the IGNORES don't make sense in basic numpy computations then I'm
> kinda confused why they'd be included at the numpy core level.
>
> Nathaniel Smith wrote:
> > It's exactly this transparency that worries Matthew and me -- we feel
> > that the alterNEP preserves it, and the NEP attempts to erase it. In
> > the NEP, there are two totally different underlying data structures,
> > but this difference is blurred at the Python level. The idea is that
> > you shouldn't have to think about which you have, but if you work with
> > C/Fortran, then of course you do have to be constantly aware of the
> > underlying implementation anyway.
>
> I don't think this bothers me -- I think it's analogous to things in
> numpy like Fortran order and non-contiguous arrays -- you can ignore all
> that when working in pure python when performance isn't critical, but
> you need a deeper understanding if you want to work with the data in C
> or Fortran or to tune performance in python.
>
> So as long as there is an API to query and control how things work, I
> like that it's hidden from simple python code.
>
> -Chris
>
>
>
> I'm similarly not too concerned about it. Performance seems finicky
> when you're dealing with missing data, since a lot of arrays will
> likely have to be copied over to other arrays containing only complete
> data before being handed over to BLAS. My primary concern is that the
> np.NA stuff 'just works'. Especially since I've never run into use
> cases in statistics where the difference between IGNORE and NA mattered.
>
>
Exactly!
I have not been able to think of a real example where that difference
matters, as the calculations are only on the 'valid' (i.e., non-missing
and non-masked) values.
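
As a rough illustration of that point (a sketch only, not code from the
NEP or the alterNEP), the snippet below assumes a small made-up array
with one missing observation. It compares a NaN-based reduction with a
masked-array reduction using facilities NumPy already has (np.nansum and
numpy.ma); the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: the third observation is missing.
raw = np.array([1.0, 2.0, np.nan, 4.0])

# View 1: the missing entry is a NaN, skipped by a NaN-aware reduction.
nan_sum = np.nansum(raw)

# View 2: the same entry is masked, and numpy.ma skips it.
masked = np.ma.masked_invalid(raw)
masked_sum = masked.sum()

# Reference: reduce over the valid values only.
valid = raw[~np.isnan(raw)]
valid_sum = valid.sum()

print(nan_sum, masked_sum, valid_sum)   # all three sums agree
print(masked.count(), valid.size)       # number of valid observations
```

For a simple reduction like this, both representations end up summing
the same three valid values, so the result does not depend on which one
is used.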
Bruce