Help understanding the decisions *behind* python?

Raymond Hettinger python at rcn.com
Fri Jul 31 13:32:46 EDT 2009


On Jul 22, 4:55 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> Steven D'Aprano <ste... at REMOVE.THIS.cybersource.com.au> wrote:
> > But that's the wrong solution to the problem. The OP wants the largest
> > (or smallest) item, which he expects to get by sorting, then grabbing
> > the first element:
>
> > sorted(alist)[0]
>
> > That requires sorting the entire list, only to throw away everything
> > except the first item. A better solution is to use min() and max(),
> > neither of which sort the list, so they are much faster.
>
> And if they want the n smallest or largest items (where n is significantly
> less than the length of the list[*]) they might consider using
> heapq.nsmallest() and heapq.nlargest()
>
> I find it interesting that the heapq functions tell you in the
> documentation that they aren't suitable for use where n==1 or where n is
> near the total size of the sequence whereas random.sample() chooses what it
> thinks is the best algorithm based on the input. At the very least I would
> have thought the heapq functions could check for n==1 and call min/max when
> it is.

The heapq.py code in Py2.7 and Py3.1 already does some automatic
algorithm selection:
http://svn.python.org/view/python/branches/release31-maint/Lib/heapq.py?revision=73579&view=markup

The automatic seletion of alternatives only occurs in clear-cut cases
(such as n==1
or where n==len(iterable) when the iterable has a known length).   For
the
remaining cases, the switchover point from heapq to sorted needs a
programmer's judgment based on whether the input iterable has a known
length, the cost of comparison function, and whether input is already
partially ordered.

The advice in the docs helps the reader understand the
relationships between min, max, nsmallest, nlargest, and sorted.


Raymond



More information about the Python-list mailing list