stable algorithm with complexity O(n)

Mon Dec 15 03:54:15 EST 2008

On Sun, 14 Dec 2008 21:12:13 -0800, Aaron Brady wrote:

> On Dec 14, 8:18 pm, Roy Smith <r... at panix.com> wrote:
>> Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
>> > All the positive thinking in the world won't help you:
>>
>> > * make a four-sided triangle;
>>
>> > * split a magnet into two individual poles;
>>
>> These two are fundamentally different problems.
>>
>> The first is impossible by definition.  The definition of triangle is,
>> "a three-sided polygon".  Asking for a "four-sided triangle" is akin to
>> asking for "a value of three which is equal to four".
>>
>> The second is only "impossible" because it contradicts our
>> understanding (based on observation) of how the physical universe
>> works.  Our understanding could simply be wrong.  We've certainly been
>> wrong before, and we will undoubtedly be proven wrong again in the
>> future.  When it comes to things like electromagnetic theory, it
>> doesn't take too many steps to get us to the fuzzy edge of quantum
>> physics where we know there are huge questions yet to be answered.
> 
> I agree.  Most of his examples were tautologies.  The magnet one was the
> exception.
> 
> Then, to beat the O( n lg n ) limit, just break an assumption.

No no no, that's solving a different problem! You're welcome to solve a 
different problem if you like, and sometimes doing so is very valuable: 
sometimes the original problem turns out not to be important after all. 
But don't pretend that it's the same problem.

>> > * or design a comparison sort which does fewer than O(n*log n)
>> > two-way comparisons in the worst case, or fewer than O(n) comparisons
>> > in the best case.
> 
> Make a three-way comparison, make a non-comparison sort, or make non-
> random inputs.

Again no. Multi-way comparisons merely change the base of the logarithm, 
which in turn merely changes the constant term. It's still O(n*log n). 
Non-comparison sorts are a useful technique, but it's changing the 
problem, and they are only useful in very limited circumstances. There's 
a good reason that most sort routines are based on O(n*log n) comparison 
sorts instead of O(n) bucket sorts or radix sorts.

And the quip about non-random inputs just shows you don't understand the 
problem you're being asked to solve. O(n*log n) is for the least number 
of comparisons needed for the worst possible input. It's not for the best 
possible input, or random input, or the average number of comparisons, or 
any other measurement. You've misunderstood the question if you think 
"give better input" is the answer.

And frankly, it's *easy* to come up with a comparison sort that has O(n) 
behaviour for the best possible input. Bubble sort has O(n) behaviour for 
input that is already sorted. Even the infamous bogosort has O(n) 
behaviour for the best possible input:

def is_sorted(alist):
    """Return True if alist is sorted, otherwise False."""
    for i in xrange(1, len(alist)):
        if alist[i-1] > alist[i]:
            return False
    return True

def bogosort(alist):
    # Shuffle alist until it happens to be sorted
    while not is_sorted(alist):
        random.shuffle(alist)

I believe bogosort has behaviour O(n!), which is far worse than merely 
exponential O(2**n) complexity. But when given already sorted input, the 
version of bogosort above is O(n). Even the archtypically Awful (not just 
Bad) algorithm is amazingly fast if you happen to give it the best 
possible input.

(By the way, the above code is not suitable for use in production code. 
Due to limitations of the standard Python random number generator, not 
all permutations of lists larger than a certain size can be generated. 
The default random number generator for Python 2.5 is the Mersenne 
Twister, which has a period of 2**19937-1. But the number of permutations 
of n items is n!, and n! is greater than 2**19937-1 for n=2081 or more. 
That means that for any list with 2081 or more items, there is a finite 
probability that the above implementation of bogosort will merely cycle 
repeatedly over 2**19937-1 unsorted permutations and hence never 
terminate.)

-- 
Steven