Comparing lists

Mon Oct 17 04:35:33 EDT 2005

Christian Stapfer <nil at dev.nul> wrote:

>  This is why we would like to have a way of (roughly)
> estimating the reasonableness of the outlines of a
> program's design in "armchair fashion" - i.e. without
> having to write any code and/or test harness.

And we would also like to consume vast amounts of chocolate, while
similarly reclining in comfortable armchairs, without getting all fat
and flabby.  Unfortunately, what we would like and what reality affords
are often pretty uncorrelated.  No matter how much theoreticians may
love big-O because it's (relatively) easy to compute, it still has two
failings which are often sufficient to rule out its sufficiency for any
"estimate [of] the reasonableness" of anything: [a] as we operate on
finite machines with finite wordsize, we may never be able reach
anywhere even remotely close to the "asymptotic" region where big-O has
some relationship to reality; [b] in many important cases, the
theoretical worst-case is almost impossible to characterize and hardly
ever reached in real life, so big-O is of no earthly use (and much
harder to compute measures such as big-Theta should be used for just
about any practical purpose).

Consider, for example, point [b].  Quicksort's big-O is N squared,
suggesting that quicksort's no better than bubblesort or the like.  But
such a characterization is absurd.  A very naive Quicksort, picking its
pivot very systematically (e.g., always the first item), may hit its
worst case just as systematically and in cases of practical importance
(e.g., already-sorted data); but it takes just a little extra care (in
the pivot picking and a few side issues) to make the worst-case
occurrences into ones that will not occur in practice except when the
input data has been deliberately designed to damage by a clever and
determined adversary.

Designing based on worst-case occurrences hardly ever makes sense in any
field of engineering, and blind adherence to worst-case assessments can
be an unmitigated disaster, promoting inferior technology just because,
in the WORST imaginable case, the best available technology would fare
no better than the inferior one (even though in 99.99999% of cases the
best technology would perform better, if you're designing based on
worst-case analyses you may not even NOTICE that -- and NEVER, *NEVER*
forget that big-O is nothing BUT "extreme-worst-case" analysis!).  Why
bother using prestressed concrete, when, should a large asteroid score a
direct hit, the costly concrete will stand up no better than cheap
bricks, or, for that matter, slightly-damp straw?  Why bother doing
(e.g.) random pivot selection in quicksort, when its big-O (i.e.,
worst-case) behavior will remain N-squared, just like naive quicksort,
or, for that matter, bubblesort?

Alex