Nasty gotcha/bug in heapq.nlargest/nsmallest
George Sakkis
george.sakkis at gmail.com
Wed May 14 22:47:56 EDT 2008
I spent several hours debugging some bogus data results that turned
out to be caused by the fact that heapq.nlargest doesn't respect rich
comparisons:
import heapq
import random
class X(object):
def __init__(self, x): self.x=x
def __repr__(self): return 'X(%s)' % self.x
if True:
# this succeeds
def __cmp__(self, other): return cmp(self.x , other.x)
else:
# this fails
def __lt__(self, other): return self.x < other.x
s = [X(i) for i in range(10)]
random.shuffle(s)
s1 = heapq.nlargest(5, s)
s2 = sorted(s, reverse=True)[:5]
assert s1 == s2, (s,s1,s2)
s1 = heapq.nsmallest(5, s)
s2 = sorted(s)[:5]
assert s1 == s2, (s,s1,s2)
According to the docs, nlargest is equivalent to: "sorted(iterable,
key=key, reverse=True)[:n]" and similarly for nsmallest. So that must
be at least a documentation bug, if not an implementation one.
George
More information about the Python-list
mailing list