[SciPy-User] bottleneck group_median

Hanno Klemm klemm at phys.ethz.ch
Wed Jan 12 18:19:19 EST 2011


On 12.01.2011 at 16:26, Keith Goodman wrote:

> On Tue, Jan 11, 2011 at 3:28 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Tue, Jan 11, 2011 at 3:02 PM, Hanno Klemm <klemm at phys.ethz.ch> wrote:
>>
>>> I am looking at the bottleneck package and the benchmarks are really
>>> impressive. Would it be possible, or is it planned, to implement a
>>> group_median, analogous to the group_mean?
>>
>> The focus of the next release of Bottleneck (v0.3) is moving window
>> functions. I recently added move_min, move_max, move_nanmin,
>> move_nanmax. (I don't yet know how to efficiently find a moving
>> median. Anyone have suggestions for algorithms?)
>>
>> Bottleneck has a fast median function that can be used to build a
>> group_median function. I plan to work on the group functions in v0.4.
>> Before adding functions (like group_median) I'd like to review the
>> function signature of the group function. Suggestions welcomed.
>>
>> Any changes you'd like to see in the inputs/outputs of the group
>> functions?
>>
>>> I am at the moment working on a project where I have to search
>>> through large arrays and compute medians for certain values, which
>>> is quite slow in conventional numpy/scipy. Therefore a group_median
>>> would be absolutely fantastic for me.
>>
>> I think scipy.ndimage has the ability to do group functions. I
>> haven't looked into it yet. If someone knows how, I'd like to see an
>> example so that I can use it for benchmarking.
>>
>>> I would promise to contribute, but unfortunately my C skills are so
>>> limited that I would really not be of much help in finite time.
>>
>> It's very helpful to have users, especially when they report problems
>> or typos or suggestions.
>
> BTW, bottleneck does have a slow, brute-force, generic group function
> that I use for unit testing. You could combine that with a fast median
> function to get a group median:
>


Thanks for the suggestion. At the moment I am experimenting with the
group_mapper function to get a dictionary of the values that I need
and then passing the sub-selected array to the median function.

I will test which approach is faster.
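
To make the dictionary approach concrete, here is a minimal sketch in
plain NumPy. It does not use Bottleneck's group_mapper (whose exact
signature is not shown in this thread); the helper name group_median
and the label-to-positions dictionary are my own assumptions, and
np.median could be swapped for a faster median function:

```python
import numpy as np


def group_median(a, label):
    """Median of `a` per distinct label (hypothetical helper, not Bottleneck API)."""
    a = np.asarray(a, dtype=float)
    # Build a dictionary mapping each label to the positions where it occurs.
    index = {}
    for i, key in enumerate(label):
        index.setdefault(key, []).append(i)
    keys = sorted(index)
    # Sub-select the values belonging to each label and take their median.
    medians = np.array([np.median(a[index[k]]) for k in keys])
    return medians, keys


a = np.array([1, 2, 3, 4, 5])
label = ['a', 'b', 'a', 'b', 'b']
medians, keys = group_median(a, label)
print(medians, keys)
```

The return value mirrors the (values, labels) pair of the brute-force
group function quoted below, so the two can be compared directly.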

Hanno



> >>> import numpy as np
> >>> import bottleneck as bn
> >>> from bottleneck.slow.group import group_func
> >>> a = np.array([1,2,3,4,5])
> >>> label = ['a', 'b', 'a', 'b', 'b']
> >>> group_func(bn.median, a, label)
>   (array([ 2.,  4.]), ['a', 'b'])