[issue6614] heapq.nsmallest and nlargest should be smarter/more usable/more consistent

Joshua Bronson report at bugs.python.org
Fri Jul 31 21:22:33 CEST 2009


Joshua Bronson <jabronson at gmail.com> added the comment:

> I prefer the docs the way they are.  They help the reader understand
> the relationship between min, max, nsmallest, nlargest, and sorted.

Except that it's no longer true that "when n==1, it is more efficient to use the 
builtin min() and max() functions." Shouldn't this be updated to say "when n==1, 
it is equivalent to using the builtin min() and max() functions"?

> I'm not sure where you got the n * 10 <= len(iterable) switch-over
> point.

It's right in the file you linked to. Search for "n * 10" in
http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py?revision=73579&view=markup.
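
For anyone following along, the branch in question looks roughly like
this (a condensed paraphrase of the pure-Python path, not the exact
source; nsmallest_sketch is my name for it):

    import bisect
    from itertools import islice

    def nsmallest_sketch(n, iterable):
        # For small n relative to a sized input, keep a sorted list of
        # the n smallest items seen so far; only n items are ever held.
        if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
            it = iter(iterable)
            result = sorted(islice(it, n))
            if not result:
                return result
            los = result[-1]        # los --> largest of the n smallest
            for elem in it:
                if los <= elem:
                    continue
                bisect.insort(result, elem)
                result.pop()
                los = result[-1]
            return result
        # Otherwise the module falls back to a heap-based strategy;
        # sorted() stands in for it here just to keep the sketch runnable.
        return sorted(iterable)[:n]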

> That is arbitrary.  The correct switchover point depends on the
> cost of the comparison function, whether the length of the input is
> known, whether the input data is partially ordered, memory constraints,
> whether a key function is used, and on other factors.   

So should that be removed, then?
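
(If timings help the discussion: a rough harness along these lines
shows the crossover moving around with the cost of the key function,
which seems to support the point that any fixed constant is arbitrary.
The key function here is made up for illustration:)

    import heapq
    import random
    import timeit

    data = [random.random() for _ in range(10**5)]

    def expensive_key(x):
        return (x * 3.1, -x, str(x))    # deliberately costly to compute

    for key in (None, expensive_key):
        for n in (10, 1000, 10000):
            t_heap = timeit.timeit(
                lambda: heapq.nsmallest(n, data, key=key), number=5)
            t_sort = timeit.timeit(
                lambda: sorted(data, key=key)[:n], number=5)
            print(key, n, t_heap, t_sort)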

> FWIW, I also wrote the logic for random.sample().  The switchover logic
> was straight-forward because performance depended on factors that were
> fully known (length of input, sample size, memory utilization, and
> average number of probes for each strategy).
>
> One other thought:  When memory is tight, the programmer needs to be
> able to select the heap algorithm in favor of sorted() even for
> relatively large values of n.  I do not want an automatic switchover
> point that takes away a programmer's choice between speed and space
> efficiency.

Agreed, and thanks for the additional info.
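
For the archive, here is the space argument as I understand it, with a
made-up input stream:

    import heapq

    def stream(count):
        # A large input that never exists in memory all at once.
        for i in range(count):
            yield (i * 2654435761) % 2**32    # arbitrary scrambled values

    # Holds only ~1000 items at a time, regardless of input size:
    smallest = heapq.nsmallest(1000, stream(10**7))

    # sorted(stream(10**7))[:1000] would first materialize all ten
    # million values, so an automatic switchover to sorted() based on n
    # alone would take away exactly this speed/space choice.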

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6614>
_______________________________________

