[Numpy-discussion] [ANN] Nanny, faster NaN functions

Wes McKinney wesmckinn at gmail.com
Sun Nov 21 21:03:22 EST 2010


On Sun, Nov 21, 2010 at 6:37 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Sun, Nov 21, 2010 at 3:16 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>
>> What would you say to a single package that contains:
>>
>> - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.)
>
> I'd say yes.
>
>> - moving window functions (moving_{count, sum, mean, var, std, etc.})
>
> Yes.
>
> BTW, we both do arr=arr.astype(float), I think, before doing the
> moving statistics. So I speeded things up by running the moving window
> backwards and writing the result in place.
>
>> - core subroutines for labeled data
>
> Not sure what this would be. Let's discuss.

Basically I want to produce an indexing vector based on rules, something
to pass to ndarray.take later on. And maybe your generic binary-op
function from a while back?
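
To make the "indexing vector" idea concrete, here is a rough sketch, not
a proposed API. The names get_indexer and take_with_nan are hypothetical,
and the convention that -1 marks a label missing from the source is an
assumption made for this example only.

```python
import numpy as np


def get_indexer(source_labels, target_labels):
    """Hypothetical helper: position of each target label in source_labels.

    Labels that do not occur in source_labels get -1 (assumed convention).
    """
    positions = {label: i for i, label in enumerate(source_labels)}
    return np.array([positions.get(label, -1) for label in target_labels],
                    dtype=np.intp)


def take_with_nan(values, indexer, axis=0):
    """Hypothetical helper: ndarray.take along `axis`, NaN where indexer == -1."""
    values = np.asarray(values, dtype=float)
    # take() treats -1 as "last element"; those slots are overwritten below.
    result = values.take(indexer, axis=axis)
    index = [slice(None)] * result.ndim
    index[axis] = indexer == -1
    result[tuple(index)] = np.nan
    return result


# Reorder data labeled a, b, c to the labels c, x, a ('x' is missing).
indexer = get_indexer(['a', 'b', 'c'], ['c', 'x', 'a'])
aligned = take_with_nan([1.0, 2.0, 3.0], indexer)
```

The rules only have to run once to build the indexer; the same vector
can then be reused on any array that shares the source labels.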

>> - group-by functions
>
> Yes. I have some ideas on function signatures.
>
>> - other things to add to this list?
>
> A no-op function with a really long doc string!
>
>> In other words, basic computational building blocks for making
>> libraries like larry, pandas, etc. and doing time series / statistical
>> / other manipulations on real world (messy) data sets. The focus isn't
>> so much "NaN-awareness" per se but more practical "data wrangling". I
>> would be happy to work on such a package and to move all the Cython
>> code I've written into it. There's a little bit of datarray overlap
>> potentially but I think that's OK
>
> Maybe we should make a list of function signatures along with brief
> doc strings to get a feel for what we (and hopefully others) have in
> mind?

I've personally never been much for writing specs, but it could be
useful. We probably aren't going to get it all right on the first try,
so we'll just do our best and refactor the code later if necessary. We
might be well-served by collecting exemplary data sets and making a
list of things we would like to be able to do easily with that data.

But writing stuff like:

    moving_{funcname}(ndarray data, int window, int axis=0,
                      int min_periods=window) -> ndarray
    group_aggregate(ndarray data, ndarray labels, int axis=0,
                    function agg_function) -> ndarray
    group_transform(...) ...

etc. makes sense
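
As a rough pure-NumPy illustration of how two of these signatures might
behave (a sketch under assumptions, not the Cython code under
discussion): moving_sum stands in for one member of the
moving_{funcname} family, min_periods is taken to mean the minimum
number of non-NaN observations in a window, and group_aggregate here
also returns the sorted unique labels, which the signature above does
not specify.

```python
import numpy as np


def moving_sum(data, window, axis=0, min_periods=None):
    """NaN-aware moving sum; NaN where a window has < min_periods valid values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    if min_periods is None:
        min_periods = window
    data = np.moveaxis(np.asarray(data, dtype=float), axis, 0)
    valid = ~np.isnan(data)
    csum = np.cumsum(np.where(valid, data, 0.0), axis=0)
    ccount = np.cumsum(valid, axis=0)
    # Window total = cumulative total minus the total `window` steps back.
    wsum = csum.copy()
    wsum[window:] -= csum[:-window]
    wcount = ccount.copy()
    wcount[window:] -= ccount[:-window]
    result = np.where(wcount >= min_periods, wsum, np.nan)
    return np.moveaxis(result, 0, axis)


def group_aggregate(data, labels, agg_function, axis=0):
    """Apply agg_function(group, axis=axis) to each label group along `axis`.

    Returns (unique_labels, aggregated); returning the labels is an
    assumption of this sketch. `labels` is expected to be 1-D.
    """
    data = np.asarray(data)
    unique_labels, inverse = np.unique(np.asarray(labels), return_inverse=True)
    pieces = [agg_function(np.compress(inverse == k, data, axis=axis), axis=axis)
              for k in range(len(unique_labels))]
    return unique_labels, np.stack(pieces, axis=axis)


sums = moving_sum([1.0, 2.0, np.nan, 4.0, 5.0], window=2, min_periods=1)
groups, totals = group_aggregate([[1, 2], [3, 4], [5, 6]],
                                 ['a', 'b', 'a'], np.sum)
```

The cumulative-sum trick keeps the sketch short but accumulates
floating-point error on long inputs and does not handle infinities; a
real implementation would more likely update a running sum in a loop.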

> Where should we continue the discussion? The pystatsmodels mailing
> list? By now the numpy list probably thinks of NaN as "Not ANother"
> email from this guy.

Maybe let's have the next thread on SciPy-user-- I think what we're
talking about is general enough to be discussed there.


