[Numpy-discussion] New functions.

Tue May 31 23:41:46 EDT 2011

On Tue, May 31, 2011 at 8:50 PM, Bruce Southey <bsouthey at gmail.com> wrote:

> On Tue, May 31, 2011 at 9:26 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Tue, May 31, 2011 at 8:00 PM, Skipper Seabold <jsseabold at gmail.com>
> > wrote:
> >>
> >> On Tue, May 31, 2011 at 9:53 PM, Warren Weckesser
> >> <warren.weckesser at enthought.com> wrote:
> >> >
> >> >
> >> > On Tue, May 31, 2011 at 8:36 PM, Skipper Seabold <jsseabold at gmail.com>
> >> > wrote:
> >> >> I don't know if it's one pass off the top of my head, but I've used
> >> >> percentile for interpercentile ranges.
> >> >>
> >> >> In [1]: X = np.random.random(1000)
> >> >>
> >> >> In [2]: np.percentile(X,[0,100])
> >> >> Out[2]: [0.00016535235312509222, 0.99961513543316571]
> >> >>
> >> >> In [3]: X.min(),X.max()
> >> >> Out[3]: (0.00016535235312509222, 0.99961513543316571)
> >> >>
> >> >
> >> >
> >> > percentile() isn't one pass; using percentile like that is much
> >> > slower:
> >> >
> >> > In [25]: %timeit np.percentile(X,[0,100])
> >> > 10000 loops, best of 3: 103 us per loop
> >> >
> >> > In [26]: %timeit X.min(),X.max()
> >> > 100000 loops, best of 3: 11.8 us per loop
> >> >
> >>
> >> Probably should've checked that before opening my mouth. Never
> >> actually used it for a minmax, but it is faster than two calls to
> >> scipy.stats.scoreatpercentile. Guess I'm +1 to fast order statistics.
> >>
> >
> > So far the biggest interest seems to be in order statistics of various
> > sorts, so to speak.
> >
> > Order Statistics
> >
> > minmax
> > median
> > k'th element
> > largest/smallest k elements
> >
> > Other Statistics
> >
> > mean/std
> >
> > Nan functions
> >
> > nanadd
> >
> > Chuck
> >
>
> How about including all or some of Keith's Bottleneck package?
> He has tried to include some of the discussed functions and tried to
> make them very fast.
>

I don't think they are sufficiently general, as they are limited to 2
dimensions. However, I think the moving filters should go into scipy, either
in ndimage or maybe signal. Some of the others we can still speed up
significantly, for instance nanmedian, by using the new functionality in
numpy, i.e., numpy sort has handled nans for a while now (they sort to the
end). It looks like call overhead dominates the nanmax times for small
arrays, and this might improve if the ufunc machinery is cleaned up a bit
more. I don't know how far Mark got with that.
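
To make the nanmedian point concrete, here is a rough sketch of how the
sort behaviour could be used. It is only an illustration, not a proposed
implementation: nanmedian_1d is a made-up name, it handles 1-D input only,
and it assumes np.sort places nans at the end of the sorted array.

```python
import numpy as np

def nanmedian_1d(a):
    """Median of a 1-D array ignoring nans (hypothetical helper, sketch only)."""
    a = np.asarray(a, dtype=float).ravel()
    s = np.sort(a)                                 # nans end up at the tail
    n = s.size - np.count_nonzero(np.isnan(s))     # count of non-nan values
    if n == 0:
        return np.nan                              # nothing but nans (or empty)
    mid = n // 2
    if n % 2:
        return s[mid]                              # odd count: middle element
    return 0.5 * (s[mid - 1] + s[mid])             # even count: mean of middle pair
```

Since the nans are all at the tail after the sort, only the first n entries
matter, and no boolean-mask copy of the data is needed. A full sort is still
more work than a selection-based median, so this is about simplicity rather
than the best possible speed.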

One bit of infrastructure that could be generally helpful is low-level
support for masked arrays, but that is a larger topic.

Chuck