[SciPy-User] Moving median code

Wes McKinney wesmckinn at gmail.com
Mon Apr 25 12:27:48 EDT 2011


On Mon, Apr 25, 2011 at 12:25 PM, J. David Lee <johnl at cs.wisc.edu> wrote:
> On 04/25/2011 11:01 AM, Keith Goodman wrote:
>> On Mon, Apr 25, 2011 at 8:31 AM, J. David Lee<johnl at cs.wisc.edu>  wrote:
>>> In working on a project recently, I wrote a moving median code that is
>>> about 10x faster than scipy.ndimage.median_filter. It utilizes a linked
>>> list structure to store values and track the median value. If anyone is
>>> interested, I've posted the code here (about 150 lines):
>>>
>>> http://pages.cs.wisc.edu/~johnl/median_code/median_code.c
>> I'm looking to add a moving window median function to the bottleneck
>> package [1].
>>
>> So your code takes 1d input where the window size is the length of the
>> input and then you move a step forward by dropping the oldest data
>> point and adding a new one and then update the median? Just making
>> sure I understand.
>>
>> If that's a good fit for bottleneck I could add the ability to take nd
>> input, unit tests, and docstrings---assuming I could figure out how to
>> wrap it in Cython.
>>
>> Does anyone have a feel for how the speed of a linked list approach
>> would compare to a double heap? Code for the double heap moving window
>> median is in [2]. I also wonder which would be easier to wrap in
>> Cython.
>>
>> [1] https://github.com/kwgoodman/bottleneck
>> [2] http://home.tiac.net/~cri_a/source_code/winmedian
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> Your understanding is correct, although I'm reusing the oldest node:
> moving it to the beginning of the list (by index), updating its value,
> and then moving it to the appropriate place in the value list using its
> new value. Any time two nodes are swapped I'm checking if one is the
> median value, and then swapping the median pointer if it is.
>
> Thanks for your interest,
>
> David
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

I wonder how this compares speedwise with the indexable skiplist approach:

http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/

I converted to Cython and put in pandas here last year:

https://github.com/wesm/pandas/blob/master/pandas/lib/src/skiplist.pyx
https://github.com/wesm/pandas/blob/master/pandas/lib/src/moments.pyx#L442

This code definitely has "too much Python", I think, so a pure C
version would be much faster.



More information about the SciPy-User mailing list