[Patches] [ python-Patches-969791 ] Add nlargest() and nsmallest() to heapq.

SourceForge.net noreply at sourceforge.net
Thu Jun 10 00:53:27 EDT 2004


Patches item #969791, was opened at 2004-06-09 12:59
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=969791&group_id=5470

Category: Library (Lib)
Group: Python 2.4
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Tim Peters (tim_one)
Summary: Add nlargest() and nsmallest() to heapq.

Initial Comment:
This patch adds a function that encapsulates a principal 
use case for heaps, finding the n largest from a dataset 
without doing a full sort.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 23:53

Message:
Logged In: YES 
user_id=80475

Taking this one back off Tim's busy plate.

If I deluded myself with the attached tests, he can give me a 
hard time later ;-)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 18:41

Message:
Logged In: YES 
user_id=80475

For many values of n, nsmallest() still makes substantially
fewer comparisons than sort().  Attaching comparison count
test suite.

Here are the comparison counts for a list with a 1000 random
entries and various values of n:

8649 sort

999 1 nlargest
1667 1 nsmallest

1088 5 nlargest
1709 5 nsmallest

1238 10 nlargest
1759 10 nsmallest

1487 20 nlargest
1860 20 nsmallest

2052 40 nlargest
2066 40 nsmallest

2953 80 nlargest
2466 80 nsmallest

4516 160 nlargest
3262 160 nsmallest

6835 320 nlargest
4811 320 nsmallest

9177 640 nlargest
7736 640 nsmallest

9742 1000 nlargest
10315 1000 nsmallest


P.S.  Using itertools, the nsmallest() can be made to run at
C speed:

def nsmallest(iterable, n):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable)[:n]
    """
    h = list(iterable)
    heapify(h)
    return map(heappop, repeat(h, min(n, len(h))))

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-06-09 15:06

Message:
Logged In: YES 
user_id=31435

+1 on nlargest().

-0 on nsmalles(), because it has radically different time and 
space requirements than nlargest() for the typical case (n is 
much less than the total # of elements).  If the user can 
afford all that space, it will probably be faster for the user to 
sort a materialized list then peel off the first n.  That's a 
simple one-liner.  If so, that makes nsmallest an "attractive 
nuisance".  If heapq had a more abstract interface, so that 
we had both min-heaps and max-heaps, this objection would 
of course go away.  But we don't, so ...

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 14:00

Message:
Logged In: YES 
user_id=80475

Revised the patch to also include nsmallest().

Note, the API intentionally does not provide default values of 
n.  This is to encourage use of min() and sorted() for corner 
cases.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=969791&group_id=5470



More information about the Patches mailing list