Selecting k smallest or largest elements from a large list in python; (benchmarking)

Terry Reedy tjreedy at udel.edu
Thu Sep 2 18:14:42 EDT 2010


On 9/1/2010 9:08 PM, Dmitry Chichkov wrote:
> Given: a large list (10,000,000) of floating point numbers;
> Task: fastest python code that finds k (small, e.g. 10) smallest
> items, preferably with item indexes;
> Limitations: in python, using only standard libraries (numpy&  scipy
> is Ok);
>
> I've tried several methods. With N = 10,000,000, K = 10 The fastest so
> far (without item indexes) was pure python implementation
> nsmallest_slott_bisect (using bisect/insert). And with indexes
> nargsmallest_numpy_argmin (argmin() in the numpy array k times).
>
> Anyone up to the challenge beating my code with some clever selection
> algorithm?
>
> Current Table:
> 1.66864395142 mins_heapq(items, n):
> 0.946580886841 nsmallest_slott_bisect(items, n):
> 1.38014793396 nargsmallest(items, n):
> 10.0732769966 sorted(items)[:n]:
> 3.17916202545 nargsmallest_numpy_argsort(items, n):
> 1.31794500351 nargsmallest_numpy_argmin(items, n):
> 2.37499308586 nargsmallest_numpy_array_argsort(items, n):
> 0.524670124054 nargsmallest_numpy_array_argmin(items, n):
>
> 0.0525538921356 numpy argmin(items): 1892997
> 0.364673852921 min(items): 10.0000026786

Your problem is underspecified;-).
Detailed timing comparisons are only valid for a particular Python 
version running under a particular OS on particular hardware. So, to 
actually run a contest, you would have to specify a version and OS and 
offer to run entries on your machine, with as much else as possible 
turned off, or else enlist a neutral judge to do so. And the timing 
method should also be specified.

-- 
Terry Jan Reedy




More information about the Python-list mailing list