Help understanding the decisions *behind* python?

Duncan Booth duncan.booth at invalid.invalid
Wed Jul 22 07:55:13 EDT 2009


Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au> wrote:

> But that's the wrong solution to the problem. The OP wants the largest
> (or smallest) item, which he expects to get by sorting, then grabbing
> the first element:
> 
> sorted(alist)[0]
> 
> That requires sorting the entire list, only to throw away everything 
> except the first item. A better solution is to use min() and max(), 
> neither of which sort the list, so they are much faster.
> 
And if they want the n smallest or largest items (where n is significantly 
less than the length of the list[*]) they might consider using 
heapq.nsmallest() and heapq.nlargest()

I find it interesting that the heapq functions tell you in the 
documentation that they aren't suitable for use where n==1 or where n is 
near the total size of the sequence whereas random.sample() chooses what it 
thinks is the best algorithm based on the input. At the very least I would 
have thought the heapq functions could check for n==1 and call min/max when 
it is.

[*] Some quick playing with timeit seems to indicate that with a list of 
200 integers nsmallest is fastest (by a very small margin) when n==2 and 
n==3 but is beaten by sorted[:n] when n==4, so I guess you need a 
reasonably long list to make it worth not sorting it: with list of 20,000 
integers it isn't worth sorting unless you want more than about 700 items.

-- 
Duncan Booth http://kupuguy.blogspot.com



More information about the Python-list mailing list