Comparing lists

Sun Oct 16 21:36:34 EDT 2005

Steven D'Aprano wrote:
> On Sat, 15 Oct 2005 18:17:36 +0200, Christian Stapfer wrote:
> 
> 
>>>>I'd prefer a (however) rough characterization
>>>>of computational complexity in terms of Big-Oh
>>>>(or Big-whatever) *anytime* to marketing-type
>>>>characterizations like this one...
>>>
>>>Oh how naive.
>>
>>Why is it that even computer science undergrads
>>are required to learn the basics of Big-Oh and
>>all that? 
> 
> 
> So that they know how to correctly interpret what Big O notation means,
> instead of misinterpreting it. Big O notation doesn't tell you everything
> you need to know to predict the behaviour of an algorithm. It doesn't even
> tell you most of what you need to know about its behaviour. Only actual
> *measurement* will tell you what you need to know.

In my experience, I need both knowledge of algorithmic
complexity (in some pragmatic sense) and measurements.
Neither alone is sufficient.

The proponents of algorithmic complexity measures don't
make the mistake of thinking that constants don't matter
for real-world performance, but they also don't make the
mistake of thinking that you can always measure enough
to tell you how your code will perform in all situations
in which it might be used.

Measurement is complicated -- very often, it just shows
you that tuning to match cache sizes is greatly important
to keep the constant factors down.  And yes, for small
data sizes often a linear-time algorithm can beat one
whose execution time grows only logarithmically, while
often a logarithmic time is close enough to constant over
the range of interest.  How an operation runs on a heavily
loaded system where it shares resources with other tasks
can also be greatly different from what microbenchmarks
might suggest.

If we don't oversimplify, we'll measure some appropriate
performance numbers and combine that with some knowledge
of the effects of caches, algorithmic complexity and other
factors that might matter in given situations.  And of
course there will be many situations where programmer time
and simplicity are more important than saving a millisecond,
or even a second, and we won't waste excessive resources in
optimising runtime at the expense of other factors.

-- James