efficient running median

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue Oct 13 18:03:49 EDT 2009


On Tue, 13 Oct 2009 12:59:47 -0700, Paul Rubin wrote:

> sturlamolden <sturlamolden at yahoo.no> writes:
>> > The obvious way to compute a running median involves a tree structure
>> > so you can quickly insert and delete elements, and find the median.
>> > That would be asymtotically O(n log n) but messy to implement.
>> 
>> QuickSelect will find the median in O(log n) time.
> 
> That makes no sense, you can't even examine the input in O(log n) time.

If that were a relevant argument, no algorithms would ever be described 
as running in O(log n).

Obviously to run in O(log n) you must have already built the tree. If you 
include the time required to build the tree, then O(n) is the lower-bound 
(because you have to see each element to put it in the tree) but that 
applies to any data structure. The point is, once you have that tree, you 
can perform repeated calculations in O(log n) time (or so it has been 
claimed).

For example, if you want the m running medians of m data points, you 
might do the following:

find the median of element 1
find the median of element 1 and 2
find the median of element 1 through 3
find the median of element 1 through 4
...
find the median of element 1 through m

Each median calculation may take O(log n), where n varies from 1 to m, 
giving a total order of:

log 1 + log 2 + log 3 + ... + log m
=> O(log m!)

steps, which is much better than:

1 + 2 + 3 + ... + m = 1/2*m*(m+1)
=> O(m**2)


> Anyway, the problem isn't to find the median of n elements.  It's to
> find n different medians of n different sets.  You might be able to get
> close to linear time though, depending on how the data acts.

But since the data sets aren't independent, a tree structure seems like 
the way to go to me. Inserting and deleting elements should be fast, and 
assuming the claim of calculating the median in O(log n) is correct, 
that's quite fast too.




-- 
Steven



More information about the Python-list mailing list