Some optimization tale

Sat Dec 27 13:50:44 EST 2003

[Stephan Diehl]
> ...
> Michael Dyck then pointed out that instead of using 'sort', 'min' and
> 'max' should be used. While tests suggest that this is true, I have
> no idea why that should be, since finding a minimum or maximum uses
> some sorting anyway (if we don't have some quantum computer at our
> hands), so, my reasoning would be that sorting once should be faster
> than computing both, maximum and minimum.

Finding a minimal element in a list of length N requires at least N-1
comparisons (you simply can't know its "the smallest" unless it's compared
at least once against each of the N-1 other elements), and can also be done
using no more than N-1 comparisons.  This is really just the obvious way:

    def min(sequence):
        smallest_so_far = sequence[0]
        for candidate in sequence[1:]:
            if candidate < smallest_so_far:
                smallest_so_far = candidate
        return smallest_so_far

Finding a maximal element also requires exactly N-1 comparisons.

Finding a min and a max in separate steps then requires exactly 2*(N-1)
comparisons, or roughly 2*N.  If you can look for a min and a max in the
same step, it's possible to reduce that.  For example, once things get
started, take sequence elements two at a time.  Compare them to each other,
then compare the smaller only to the smallest so far, and the larger only to
the largest so far.  This requires 3 compares overall per pair of input
elements, so takes about 1.5*N total compares to find a min and a max.

Sorting (with a theoretically best-possible comparison-based algorithm)
requires an average of log_base_2(N!) comparisons, across randomly ordered
sequences of length N.  That's (very) roughly N * log_base_2(N) comparisons.
log_base_2(N) is bigger than 2 most of the time <wink>, and that's why doing
min() and max() can beat the pants off of sorting.