Nasty gotcha/bug in heapq.nlargest/nsmallest

George Sakkis george.sakkis at gmail.com
Wed May 14 22:47:56 EDT 2008


I spent several hours debugging some bogus data results that turned
out to be caused by the fact that heapq.nlargest doesn't respect rich
comparisons:

    import heapq
    import random

    class X(object):
        def __init__(self, x): self.x=x
        def __repr__(self): return 'X(%s)' % self.x
        if True:
            # this succeeds
            def __cmp__(self, other): return cmp(self.x , other.x)
        else:
            # this fails
            def __lt__(self, other): return self.x < other.x

    s = [X(i) for i in range(10)]
    random.shuffle(s)

    s1 = heapq.nlargest(5, s)
    s2 = sorted(s, reverse=True)[:5]
    assert s1 == s2, (s,s1,s2)

    s1 = heapq.nsmallest(5, s)
    s2 = sorted(s)[:5]
    assert s1 == s2, (s,s1,s2)


According to the docs, nlargest is equivalent to: "sorted(iterable,
key=key, reverse=True)[:n]" and similarly for nsmallest. So that must
be at least a documentation bug, if not an implementation one.

George



More information about the Python-list mailing list