[Python-Dev] Re: heaps
David Eppstein
eppstein@ics.uci.edu
Sat, 03 May 2003 22:46:58 -0700
On 5/4/03 1:26 AM -0400 Tim Peters <tim.one@comcast.net> wrote:
> it's normal in N-best applications that N is much smaller than the number
> of items being ranked, and you don't want to consume more than O(N)
> memory (for example, google wants to show you the best-scoring 25
> documents of the 6 million matches it found):
Ok, I think you're right, for this sort of thing heapq is better. One could
extend my priorityDictionary code to limit memory like this but it would be
unnecessary work when the extra features it has over heapq are not used for
this sort of algorithm.
On the other hand, if you really want to find the n best items in a data
stream large enough that you care about using only space O(n), it might
also be preferable to take constant amortized time per item rather than the
O(log n) that heapq would use, and it's not very difficult nor does it
require any fancy data structures. Some time back I needed some Java code
for this, haven't had an excuse to port it to Python. In case anyone's
interested, it's online at
<http://www.ics.uci.edu/~eppstein/161/KBest.java>. Looking at it now, it
seems more complicated than it needs to be, but maybe that's just the
effect of writing in Java instead of Python (I've seen an example of a
three-page Java implementation of an algorithm in a textbook that could
easily be done in a dozen Python lines).
--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science