[issue5669] Extra heapq nlargest/nsmallest option for including ties

George Sakkis report at bugs.python.org
Thu Apr 2 21:01:26 CEST 2009


George Sakkis <george.sakkis at gmail.com> added the comment:

> In that case, I think it best to leave nsmallest/nlargest as-is.   By
> appending ties to the result, it becomes harder to implement policy
> decisions on what to do with those ties (perhaps listing them separately
> or splitting their prizes n-ways).  

I wouldn't worry about the further post-processing too much; it's
optional and up to the client to implement if necessary (in my current
use case it's not). Regardless, that's orthogonal to having a general,
reliable and efficient function in the standard library that reduces an
initial sequence of thousands/millions down to a dozen or two,
preserving the ties.

> A preferred approach is to use
> existing tools such as:  last=result[1]; [last]*s.count(last).
> Alternatively, result=sorted(s) works well with bisect to find the cut
> points.  Also, my instincts say that it is weird ask for n-items and get
> a result with potentially more than n.

That's a documentation issue, it would be explicitly addressed in the
description of `ties`. I'm not proposing to change the current default
behavior; the whole change would be fully backwards compatible.

> Looking at the source code for nsmallest/nlargest, I think this
> build-out would complicate the code quite a bit, so the use cases would
> need to be more compelling.  The way the nsmallest/nlargest work is that
> they maintain an n-length buffer and loop over the entire input while
> remembering only the best-n-so-far.  That approach doesn't easily extend
> to tracking best-n-so-far-and-all-ties.

I haven't thought about the implementation yet but before I do, would a
patch help reopen this ticket ? If so, I'm willing to take a stab at it.

----------
status: closed -> open

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5669>
_______________________________________


More information about the Python-bugs-list mailing list