[Numpy-discussion] Weighted percentile / quantile

Alex Rogozhnikov alex.rogozhnikov at yandex.ru
Wed Mar 2 07:20:36 EST 2016


Hi, Joe, 
> I am working (slowly) on upgrading the C code for partitioning with
> arbitrary arrays of real weights
Really good to know there is some work in this direction.
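To show why partitioning helps, here is a rough sketch of how a weighted median can be found by selection instead of a full sort. It is a minimal illustration, not the C partitioning code Joe is working on. It uses a randomized quickselect that carries the weights along. `weighted_median_select` is a hypothetical name, and it assumes 1-D float data without NaNs and non-negative weights with a positive sum. It returns the "lower" weighted median: the smallest value whose cumulative weight reaches half of the total.

```python
import numpy as np


def weighted_median_select(values, weights, seed=0):
    """Lower weighted median by randomized quickselect (expected O(n), no full sort).

    Hypothetical helper for illustration. Returns the smallest value v such that
    the weights of all elements <= v sum to at least half of the total weight.
    Assumes 1-D data without NaNs and non-negative weights with a positive sum.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.ndim != 1 or values.shape != weights.shape or values.size == 0:
        raise ValueError("values and weights must be non-empty 1-D arrays of equal length")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    rng = np.random.default_rng(seed)
    target = 0.5 * weights.sum()  # cumulative weight the answer must reach
    below = 0.0                   # weight of discarded elements smaller than the window

    while True:
        pivot = values[rng.integers(values.size)]  # random pivot from the current window
        lower = values < pivot
        # weight of everything (window or discarded) strictly below the pivot
        w_below_pivot = below + weights[lower].sum()
        if w_below_pivot >= target:
            # Enough weight strictly below the pivot: continue in the lower part only.
            values, weights = values[lower], weights[lower]
            continue
        # weight of everything at or below the pivot
        w_upto_pivot = w_below_pivot + weights[values == pivot].sum()
        if w_upto_pivot >= target:
            return pivot
        upper = values > pivot
        if not upper.any():  # only reachable through floating-point rounding
            return pivot
        below = w_upto_pivot
        values, weights = values[upper], weights[upper]
```

Each pass only scans the current window, so the expected cost is linear. For unweighted data, `np.median` already takes the selection route through `np.partition` rather than sorting.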

On 2 March 2016, at 6:27, Joseph Fox-Rabinovitz <jfoxrabinovitz at gmail.com> wrote:

> Alex,
> 
> At the moment, there does not appear to be anything in numpy. However,
> I am working (slowly) on upgrading the C code for partitioning with
> arbitrary arrays of real weights. That will get `partition`, `median`,
> `percentile` to work with weights, as well as enabling weights for the
> automated bin estimators of `histogram`. `mean` already has an
> implementation of weights via `average`.
> 
> You may be interested in my original post to the mailing list here:
> https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075000.html.
> Josef P. mentioned in one of his responses that statsmodels has a
> weighted quantile computation available as of PR 2707:
> https://github.com/statsmodels/statsmodels/pull/2707. That should
> effectively serve your purpose.

It's the same sort+cumsum approach, and even worse, because it relies on aggregating.
Thanks for letting me know, but I'll definitely prefer the implementation from SO (until numpy supports weights).
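For reference, the sort+cumsum approach mentioned above can be sketched as follows. This is a minimal illustration, not the code from the SO answer or from statsmodels. `weighted_percentile` is a hypothetical name. It assumes 1-D data without NaNs and non-negative weights. It uses a step-function definition with no interpolation: the smallest value whose cumulative weight reaches q% of the total.

```python
import numpy as np


def weighted_percentile(values, weights, q):
    """Weighted percentile via sort + cumsum (O(n log n) because of the sort).

    Hypothetical helper for illustration. For each q in [0, 100] it returns the
    smallest value v such that the weights of all elements <= v add up to at
    least q / 100 of the total weight (a step function, no interpolation).
    Assumes 1-D data without NaNs and non-negative weights with a positive sum.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    q = np.asarray(q, dtype=float)
    if values.ndim != 1 or values.shape != weights.shape or values.size == 0:
        raise ValueError("values and weights must be non-empty 1-D arrays of equal length")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if np.any((q < 0) | (q > 100)):
        raise ValueError("q must lie in [0, 100]")

    order = np.argsort(values)               # the full sort a partition-based method avoids
    sorted_values = values[order]
    cum_weights = np.cumsum(weights[order])  # running total of weight, in value order
    # First position whose cumulative weight reaches the requested fraction of the total.
    idx = np.searchsorted(cum_weights, q / 100.0 * cum_weights[-1], side="left")
    return sorted_values[np.minimum(idx, values.size - 1)]  # clip guards against rounding
```

The `np.argsort` call is the O(n log n) step, and everything after it is linear. With all weights equal, this is the "inverted CDF" definition of a percentile, not the linear interpolation that `np.percentile` uses by default.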

Cheers, 
Alex
> 
>    -Joe
> 
> 
> On Tue, Mar 1, 2016 at 6:03 PM, Alex Rogozhnikov
> <alex.rogozhnikov at yandex.ru> wrote:
>> Hi,
>> I know the topic was already raised a long time ago:
>> https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html
>> 
>> There are also several questions on SO:
>> http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median
>> http://stackoverflow.com/questions/13546146/percentile-calculation-with-weighted-data
>> http://stackoverflow.com/questions/26102867/python-weighted-median-algorithm-with-pandas
>> 
>> The only working solution with numpy:
>> http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy
>> uses sorting.
>> 
>> Are there better options at the moment (numpy/scipy/pandas)?
>> 
>> Cheers,
>> Alex.
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> 
