[SciPy-User] Moving median code

Wes McKinney wesmckinn at gmail.com
Mon Apr 25 12:41:37 EDT 2011


On Mon, Apr 25, 2011 at 12:27 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Mon, Apr 25, 2011 at 12:25 PM, J. David Lee <johnl at cs.wisc.edu> wrote:
>> On 04/25/2011 11:01 AM, Keith Goodman wrote:
>>> On Mon, Apr 25, 2011 at 8:31 AM, J. David Lee<johnl at cs.wisc.edu>  wrote:
>>>> In working on a project recently, I wrote a moving median code that is
>>>> about 10x faster than scipy.ndimage.median_filter. It utilizes a linked
>>>> list structure to store values and track the median value. If anyone is
>>>> interested, I've posted the code here (about 150 lines):
>>>>
>>>> http://pages.cs.wisc.edu/~johnl/median_code/median_code.c
>>> I'm looking to add a moving window median function to the bottleneck
>>> package [1].
>>>
>>> So your code takes 1d input where the window size is the length of the
>>> input and then you move a step forward by dropping the oldest data
>>> point and adding a new one and then update the median? Just making
>>> sure I understand.
>>>
>>> If that's a good fit for bottleneck I could add the ability to take nd
>>> input, unit tests, and docstrings---assuming I could figure out how to
>>> wrap it in Cython.
>>>
>>> Does anyone have a feel for how the speed of a linked list approach
>>> would compare to a double heap? Code for the double heap moving window
>>> median is in [2]. I also wonder which would be easier to wrap in
>>> Cython.
>>>
>>> [1] https://github.com/kwgoodman/bottleneck
>>> [2] http://home.tiac.net/~cri_a/source_code/winmedian
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>> Your understanding is correct, although I'm reusing the oldest node:
>> moving it to the beginning of the list (by index), updating its value,
>> and then moving it to the appropriate place in the value list using its
>> new value. Any time two nodes are swapped I'm checking if one is the
>> median value, and then swapping the median pointer if it is.
>>
>> Thanks for your interest,
>>
>> David
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
> I wonder how this compares speedwise with the indexable skiplist approach:
>
> http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/
>
> I converted to Cython and put in pandas here last year:
>
> https://github.com/wesm/pandas/blob/master/pandas/lib/src/skiplist.pyx
> https://github.com/wesm/pandas/blob/master/pandas/lib/src/moments.pyx#L442
>
> This code definitely has "too much Python", I think, so a pure C
> version would be much faster.
>

I took a peak at the C code-- looks like it's O(N * W). Maybe we
should invest in a pure C O(N log W) approach using the skiplist?



More information about the SciPy-User mailing list