Some optimization tale
Tim Peters
tim.one at comcast.net
Sat Dec 27 13:50:44 EST 2003
[Stephan Diehl]
> ...
> Michael Dyck then pointed out that instead of using 'sort', 'min' and
> 'max' should be used. While tests suggest that this is true, I have
> no idea why that should be, since finding a minimum or maximum uses
> some sorting anyway (if we don't have some quantum computer at our
> hands), so, my reasoning would be that sorting once should be faster
> than computing both, maximum and minimum.
Finding a minimal element in a list of length N requires at least N-1
comparisons (you simply can't know its "the smallest" unless it's compared
at least once against each of the N-1 other elements), and can also be done
using no more than N-1 comparisons. This is really just the obvious way:
def min(sequence):
smallest_so_far = sequence[0]
for candidate in sequence[1:]:
if candidate < smallest_so_far:
smallest_so_far = candidate
return smallest_so_far
Finding a maximal element also requires exactly N-1 comparisons.
Finding a min and a max in separate steps then requires exactly 2*(N-1)
comparisons, or roughly 2*N. If you can look for a min and a max in the
same step, it's possible to reduce that. For example, once things get
started, take sequence elements two at a time. Compare them to each other,
then compare the smaller only to the smallest so far, and the larger only to
the largest so far. This requires 3 compares overall per pair of input
elements, so takes about 1.5*N total compares to find a min and a max.
Sorting (with a theoretically best-possible comparison-based algorithm)
requires an average of log_base_2(N!) comparisons, across randomly ordered
sequences of length N. That's (very) roughly N * log_base_2(N) comparisons.
log_base_2(N) is bigger than 2 most of the time <wink>, and that's why doing
min() and max() can beat the pants off of sorting.
More information about the Python-list
mailing list