[SciPy-User] bottleneck group_median

Keith Goodman kwgoodman at gmail.com
Wed Jan 12 10:26:39 EST 2011


On Tue, Jan 11, 2011 at 3:28 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Tue, Jan 11, 2011 at 3:02 PM, Hanno Klemm <klemm at phys.ethz.ch> wrote:
>
>> I am looking at the bottleneck package and the benchmarks are really
>> impressive. Would it be possible, or is it planned, to implement a
>> group_median, analogous to the group_mean?
>
> The focus of the next release of Bottleneck (v0.3) is moving window
> functions. I recently added move_min, move_max, move_nanmin,
> move_nanmax. (I don't yet know how to efficiently find a moving
> median. Anyone have suggestions for algorithms?)
>
> Bottleneck has a fast median function that can be used to build a
> group_median function. I plan to work on the group functions in v0.4.
> Before adding functions (like group_median) I'd like to review the
> function signature of the group function. Suggestions welcomed.
>
> Any changes you'd like to see in the inputs/outputs of the group functions?
>
>> I am at the moment working on a project where I have to search through
>> large arrays and compute medians for certain values, which is quite
>> slow in conventional numpy/scipy. Therefore a group_median would be
>> absolutely fantastic for me.
>
> I think scipy.ndimage has the ability to do group functions. I haven't
> looked into it yet. If someone knows how, I'd like to see an example
> so that I can use it for benchmarking.
>
>> I would promise to contribute, but unfortunately my C skills are so
>> limited that I would really not be of much help in finite time.
>
> It's very helpful to have users, especially when they report problems
> or typos or suggestions.

BTW, bottleneck does have a slow, brute-force, generic group function
that I use for unit testing. You could combine that with a fast median
function to get a group median:

>> import numpy as np
>> import bottleneck as bn
>> from bottleneck.slow.group import group_func
>> a = np.array([1,2,3,4,5])
>> label = ['a', 'b', 'a', 'b', 'b']
>> group_func(bn.median, a, label)
   (array([ 2.,  4.]), ['a', 'b'])
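
For comparison, the same grouping idea can be sketched in plain NumPy,
without relying on bottleneck.slow. This is not Bottleneck's
implementation; group_median is a hypothetical helper name. It sorts the
values by group so that each group is contiguous, then takes the median of
each slice. The return value mirrors the (medians, labels) shape of the
output shown above.

```python
import numpy as np

def group_median(a, label):
    """Median of `a` within each group of `label` (hypothetical helper)."""
    a = np.asarray(a)
    # Unique labels (sorted) and an integer group code for each element.
    names, codes = np.unique(label, return_inverse=True)
    # Stable sort by group code so each group's values are contiguous.
    order = np.argsort(codes, kind="stable")
    sorted_a = a[order]
    # Positions where one group ends and the next begins.
    counts = np.bincount(codes, minlength=len(names))
    splits = np.cumsum(counts)[:-1]
    medians = np.array([np.median(g) for g in np.split(sorted_a, splits)])
    return medians, names.tolist()

medians, names = group_median([1, 2, 3, 4, 5], ['a', 'b', 'a', 'b', 'b'])
```

If Bottleneck is installed, np.median inside the loop could presumably be
swapped for bn.median; the Python-level loop over groups would remain.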


