[Numpy-discussion] Proposal to add `weights` to `np.percentile` and `np.median`

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Feb 16 14:39:42 EST 2016


On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz <
jfoxrabinovitz at gmail.com> wrote:

> Thanks for pointing me to that. I had something a bit different in
> mind but that definitely looks like a good start.
>
> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee <antony.lee at berkeley.edu>
> wrote:
> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
> > Basically, naïvely sorting may be faster than a not-so-optimized version of
> > quickselect.
> >
> > Antony
> >
> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz <
> jfoxrabinovitz at gmail.com>:
> >>
> >> I would like to add a `weights` keyword to `np.partition`,
> >> `np.percentile` and `np.median`. My reason for doing so is to allow
> >> `np.histogram` to process automatic bin selection with weights.
> >> Currently, weights are not supported for the automatic bin selection
> >> and would be difficult to support in `auto` mode without having
> >> `np.percentile` support a `weights` keyword. I suspect that there are
> >> many other uses for such a feature.
> >>
> >> I have taken a preliminary look at the C implementation of the
> >> partition functions that are the basis for `partition`, `median` and
> >> `percentile`. I think that it would be possible to add versions (or
> >> just extend the functionality of existing ones) that check the ratio
> >> of the weights below the partition point to the total sum of the
> >> weights instead of just counting elements.
> >>
> >> One of the main advantages of such an implementation is that it would
> >> allow any real weights to be handled correctly, not just integers.
> >> Complex weights would not be supported.
> >>
> >> The purpose of this email is to see if anybody objects, has ideas or
> >> cares at all about this proposal before I spend a significant amount
> >> of time working on it. For example, did I miss any functions in my
> >> list?
> >>
> >> Regards,
> >>
> >>     -Joe
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at scipy.org
> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion


statsmodels just got weighted quantiles
https://github.com/statsmodels/statsmodels/pull/2707

I didn't try to figure out its computational efficiency, and we would
gladly delegate to whatever fast algorithm would be in numpy.
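
For reference, the rule described in the quoted proposal (compare the
share of weight below a point to the total weight, instead of counting
elements) can be sketched with a plain sort. This is only an
illustration under stated assumptions: `weighted_percentile` is a
hypothetical helper, not a NumPy or statsmodels function; it sorts
rather than partitions, so it is O(n log n); and it returns an actual
data value with no interpolation, which differs from the default
behavior of `np.percentile`.

```python
import numpy as np


def weighted_percentile(a, q, weights):
    """Hypothetical helper (not a NumPy API).

    Returns the smallest value in `a` whose cumulative weight reaches
    q percent of the total weight. No interpolation is performed.
    """
    a = np.asarray(a, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    if a.shape != w.shape:
        raise ValueError("a and weights must have the same number of elements")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")

    # Assumption of this sketch: zero-weight points are simply ignored.
    keep = w > 0
    a, w = a[keep], w[keep]
    if a.size == 0:
        raise ValueError("at least one weight must be positive")

    # Sort the values and carry the weights along.
    order = np.argsort(a, kind="stable")
    a_sorted = a[order]
    cum = np.cumsum(w[order])

    # First position where the cumulative weight reaches the target share.
    target = q / 100.0 * cum[-1]
    idx = min(np.searchsorted(cum, target, side="left"), a.size - 1)
    return a_sorted[idx]
```

With integer weights this agrees with repeating each element
`weights[i]` times and taking the lower percentile, but nothing in it
requires the weights to be integers, which is the advantage mentioned
in the proposal.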

Josef