[SciPy-User] Moving median code

Keith Goodman kwgoodman at gmail.com
Mon Apr 25 13:03:26 EDT 2011


On Mon, Apr 25, 2011 at 9:41 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Mon, Apr 25, 2011 at 12:27 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 12:25 PM, J. David Lee <johnl at cs.wisc.edu> wrote:
>>> On 04/25/2011 11:01 AM, Keith Goodman wrote:
>>>> On Mon, Apr 25, 2011 at 8:31 AM, J. David Lee<johnl at cs.wisc.edu>  wrote:
>>>>> In working on a project recently, I wrote a moving median code that is
>>>>> about 10x faster than scipy.ndimage.median_filter. It utilizes a linked
>>>>> list structure to store values and track the median value. If anyone is
>>>>> interested, I've posted the code here (about 150 lines):
>>>>>
>>>>> http://pages.cs.wisc.edu/~johnl/median_code/median_code.c
>>>>
>>>> So your code takes 1d input where the window size is the length of the
>>>> input and then you move a step forward by dropping the oldest data
>>>> point and adding a new one and then update the median? Just making
>>>> sure I understand.
>>>
>>> Your understanding is correct, although I'm reusing the oldest node:
>>> moving it to the beginning of the list (by index), updating its value,
>>> and then moving it to the appropriate place in the value list using its
>>> new value. Any time two nodes are swapped I'm checking if one is the
>>> median value, and then swapping the median pointer if it is.
>>
>> I wonder how this compares speedwise with the indexable skiplist approach:
>>
>> http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/
>>
>> I converted to Cython and put in pandas here last year:
>>
>> https://github.com/wesm/pandas/blob/master/pandas/lib/src/skiplist.pyx
>> https://github.com/wesm/pandas/blob/master/pandas/lib/src/moments.pyx#L442
>>
>> This code definitely has "too much Python", I think, so a pure C
>> version would be much faster.
>
> I took a peak at the C code-- looks like it's O(N * W). Maybe we
> should invest in a pure C O(N log W) approach using the skiplist?

Seems like there is a fair amount of interest in a moving median of
numpy arrays.

Due to my ignorance of the relative merits of each algorithm, could we
wrap the C code of all three versions in cython and see which is
fastest? Two algorithms are already coded in C. Making a cython
function that only works on 1d, float64 input might serve as a good
test bed.

I can set up a temp github repo for a moving window median prototype
if anyone is interested in working on it together.

I've never cython wrapped C code. If I had, I'd already have a moving
window median.



More information about the SciPy-User mailing list