[Python-Dev] Re: heaps

David Eppstein eppstein@ics.uci.edu
Sat, 03 May 2003 22:46:58 -0700


On 5/4/03 1:26 AM -0400 Tim Peters <tim.one@comcast.net> wrote:
> it's normal in N-best applications that N is much smaller than the number
> of items being ranked, and you don't want to consume more than O(N)
> memory (for example, google wants to show you the best-scoring 25
> documents of the 6 million matches it found):

Ok, I think you're right, for this sort of thing heapq is better. One could 
extend my priorityDictionary code to limit memory like this but it would be 
unnecessary work when the extra features it has over heapq are not used for 
this sort of algorithm.

On the other hand, if you really want to find the n best items in a data 
stream large enough that you care about using only space O(n), it might 
also be preferable to take constant amortized time per item rather than the 
O(log n) that heapq would use, and it's not very difficult nor does it 
require any fancy data structures.  Some time back I needed some Java code 
for this, haven't had an excuse to port it to Python.  In case anyone's 
interested, it's online at 
<http://www.ics.uci.edu/~eppstein/161/KBest.java>.  Looking at it now, it 
seems more complicated than it needs to be, but maybe that's just the 
effect of writing in Java instead of Python (I've seen an example of a 
three-page Java implementation of an algorithm in a textbook that could 
easily be done in a dozen Python lines).
-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science