Nasty gotcha/bug in heapq.nlargest/nsmallest
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Thu May 15 02:44:42 EDT 2008
En Wed, 14 May 2008 23:47:56 -0300, George Sakkis <george.sakkis at gmail.com> escribió:
> I spent several hours debugging some bogus data results that turned
> out to be caused by the fact that heapq.nlargest doesn't respect rich
> comparisons:
>
> import heapq
> import random
>
> class X(object):
> def __init__(self, x): self.x=x
> def __repr__(self): return 'X(%s)' % self.x
> if True:
> # this succeeds
> def __cmp__(self, other): return cmp(self.x , other.x)
> else:
> # this fails
> def __lt__(self, other): return self.x < other.x
>
> s = [X(i) for i in range(10)]
> random.shuffle(s)
>
> s1 = heapq.nlargest(5, s)
> s2 = sorted(s, reverse=True)[:5]
> assert s1 == s2, (s,s1,s2)
>
> s1 = heapq.nsmallest(5, s)
> s2 = sorted(s)[:5]
> assert s1 == s2, (s,s1,s2)
>
>
> According to the docs, nlargest is equivalent to: "sorted(iterable,
> key=key, reverse=True)[:n]" and similarly for nsmallest. So that must
> be at least a documentation bug, if not an implementation one.
Your class is not properly defined. sorted() only uses < to compare items, but this fact is NOT documented as far as I can tell. So you are abusing this implementation detail by not defining the other comparison operators, and the X class has a rather strange behavior:
py> X(1)<X(2)
True
py> X(1)<=X(2)
False # !!!
py> X(1)>X(2)
False
py> X(1)==X(1)
False # !!!
If you write all the six rich comparison methods (__lt__, __le__, etc) nlargest and nsmallest work fine.
If your objects are fully comparable (like numbers; that is, given A and B, one and only one of these holds: A<B, A==B, A>B) the easiest way is to write a helper function (like the old __cmp__) and implement the rich comparison methods using it. Those six methods could be defined in a mixin class.
--
Gabriel Genellina
More information about the Python-list
mailing list