[Python-Dev] Re: heaps

David Eppstein eppstein@ics.uci.edu
Sun, 04 May 2003 00:54:21 -0700


In article <17342817.1052002018@[10.0.1.2]>,
 David Eppstein <eppstein@ics.uci.edu> wrote:

> On the other hand, if you really want to find the n best items in a data 
> stream large enough that you care about using only space O(n), it might 
> also be preferable to take constant amortized time per item rather than the 
> O(log n) that heapq would use, and it's not very difficult nor does it 
> require any fancy data structures.  Some time back I needed some Java code 
> for this, haven't had an excuse to port it to Python.  In case anyone's 
> interested, it's online at 
> <http://www.ics.uci.edu/~eppstein/161/KBest.java>.

BTW, the central idea here is to use a random quicksort pivot to shrink 
the list, when it grows too large.

In python, this could be done without randomization as simply as

def addToNBest(L,x,N):
    L.append(x)
    if len(L) > 2*N:
        L.sort()
        del L[N:]

It's not constant amortized time due to the sort, but that's probably 
more than made up for due to the speed of compiled sort versus 
interpreted randomized pivot.

-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science