[Patches] [ python-Patches-969791 ] Add nlargest() and nsmallest()
to heapq.
SourceForge.net
noreply at sourceforge.net
Thu Jun 10 00:53:27 EDT 2004
Patches item #969791, was opened at 2004-06-09 12:59
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=969791&group_id=5470
Category: Library (Lib)
Group: Python 2.4
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Tim Peters (tim_one)
Summary: Add nlargest() and nsmallest() to heapq.
Initial Comment:
This patch adds a function that encapsulates a principal
use case for heaps, finding the n largest from a dataset
without doing a full sort.
----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 23:53
Message:
Logged In: YES
user_id=80475
Taking this one back off Tim's busy plate.
If I deluded myself with the attached tests, he can give me a
hard time later ;-)
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 18:41
Message:
Logged In: YES
user_id=80475
For many values of n, nsmallest() still makes substantially
fewer comparisons than sort(). Attaching comparison count
test suite.
Here are the comparison counts for a list with a 1000 random
entries and various values of n:
8649 sort
999 1 nlargest
1667 1 nsmallest
1088 5 nlargest
1709 5 nsmallest
1238 10 nlargest
1759 10 nsmallest
1487 20 nlargest
1860 20 nsmallest
2052 40 nlargest
2066 40 nsmallest
2953 80 nlargest
2466 80 nsmallest
4516 160 nlargest
3262 160 nsmallest
6835 320 nlargest
4811 320 nsmallest
9177 640 nlargest
7736 640 nsmallest
9742 1000 nlargest
10315 1000 nsmallest
P.S. Using itertools, the nsmallest() can be made to run at
C speed:
def nsmallest(iterable, n):
"""Find the n smallest elements in a dataset.
Equivalent to: sorted(iterable)[:n]
"""
h = list(iterable)
heapify(h)
return map(heappop, repeat(h, min(n, len(h))))
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-06-09 15:06
Message:
Logged In: YES
user_id=31435
+1 on nlargest().
-0 on nsmalles(), because it has radically different time and
space requirements than nlargest() for the typical case (n is
much less than the total # of elements). If the user can
afford all that space, it will probably be faster for the user to
sort a materialized list then peel off the first n. That's a
simple one-liner. If so, that makes nsmallest an "attractive
nuisance". If heapq had a more abstract interface, so that
we had both min-heaps and max-heaps, this objection would
of course go away. But we don't, so ...
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-09 14:00
Message:
Logged In: YES
user_id=80475
Revised the patch to also include nsmallest().
Note, the API intentionally does not provide default values of
n. This is to encourage use of min() and sorted() for corner
cases.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=969791&group_id=5470
More information about the Patches
mailing list