How to demonstrate big-O cost of algorithms?

Duncan Booth me at privacy.net
Wed Jun 2 12:18:32 EDT 2004


Rusty Shackleford <rs at overlook.homelinux.net> wrote in 
news:slrncbrudt.995.rs at frank.overlook.homelinux.net:

> Here's my clumsy attempt to implement this:
> 
> n = 0   # counter must be initialised, or the counting silently never happens
> 
> def qsort(ll):
>     global n
>     n += len(ll)
>     print "adding %d; n is now %d." % (len(ll), n)
>     if len(ll) == 0:
>         return []
>     p = ll[0]
>     left = [x for x in ll if x < p]
>     mid = [x for x in ll if x == p]
>     right = [x for x in ll if x > p]
>     return qsort(left) + mid + qsort(right)
> 
> I'm not getting the results I hoped for.  I expected that after the
> program finished, n would end up near
> 
>     len(ll) * math.log(len(ll), 2)
> 
> since the big-O cost of quicksort is n log n for all but the worst case.

You don't say what sort of objects you are passing in the list ll. Nor do 
you say how long a list you tried. Nor have you said how you ensured that 
you weren't hitting the worst-case performance.
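
That last point matters here: because your pivot is always ll[0], an 
already-sorted list is guaranteed to be the worst case, since every 
partition puts all the remaining elements on one side and the cost degrades 
to O(n**2). Here is a rough harness to check that (a sketch only: it 
assumes the print statement has been taken out of qsort, since the I/O 
would otherwise swamp everything, and "timed" and "size" are names I have 
made up):

    import random, sys, time

    sys.setrecursionlimit(20000)   # sorted input recurses about once per element

    def timed(data):
        # Run qsort over data; return elapsed seconds and the final counter.
        global n
        n = 0
        start = time.time()
        qsort(data)
        return time.time() - start, n

    size = 2000
    print "random input: %.4fs, n = %d" % timed([random.random() for i in range(size)])
    print "sorted input: %.4fs, n = %d" % timed(range(size))

On the sorted input n should come out near size**2 / 2 rather than 
size * log2(size).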

Python may not be the best language to demonstrate this sort of effect. In 
most languages two things dominate sorting speed: the number of comparisons, 
and the number of times you copy data items. Other factors only come into 
play when they are expensive compared with the comparisons and moves, and 
that may well be what is happening here: your comparisons are cheap 
(especially if you just used integers as the input data), and the time is 
being swamped by the function call overhead and by the creation of all those 
intermediate lists. In that case you may just need to increase the length of 
the list. A lot.
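
Alternatively, you can count the quantity the big-O claim actually talks 
about, the comparisons, instead of measuring time at all. A sketch of one 
way to do it (again with the print removed from qsort; the CountingInt 
wrapper and the comparisons counter are my own invention):

    import math, random

    comparisons = 0

    class CountingInt(object):
        # Wraps a value and counts every comparison made against it.
        def __init__(self, value):
            self.value = value
        def __cmp__(self, other):
            global comparisons
            comparisons += 1
            return cmp(self.value, other.value)

    size = 1000
    data = [CountingInt(random.randrange(1000000)) for i in range(size)]
    qsort(data)
    print "%d comparisons; size * log2(size) is about %d" % (
        comparisons, int(size * math.log(size, 2)))

Note that your three list comprehensions each compare every element against 
the pivot, so expect roughly three times n log2 n rather than n log2 n; the 
shape of the growth is what matters, not the constant.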

Most quicksort implementations sort the data in-place. You are creating new 
lists for left, mid, right, and then again creating new lists by 
concatenating the results of your recursive calls. Both of these involve 
memory allocation, and the lists may need to be moved around in memory as 
they grow. All of this complicates the order calculation.

Also, remember that if your lists get too long you are likely to start 
swapping virtual memory, at which point all your timings go out the window. 
Not to mention garbage collection.

If you want to get a result that matches your expectations, then I think 
you would do best to operate on the data in-place, and probably also to get 
rid of the recursion.
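
For what it's worth, here is a minimal sketch of that shape (mine, not from 
your post): an in-place Lomuto-style partition driven by an explicit stack 
instead of recursion, keeping your element-touch counter:

    n = 0   # the same element-touch counter as in the original qsort

    def qsort_inplace(ll):
        # Sort ll in place: Lomuto partition, explicit stack, no recursion.
        global n
        stack = [(0, len(ll) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            n += hi - lo + 1
            pivot = ll[hi]            # end-of-range pivot: same worst case
            i = lo
            for j in range(lo, hi):
                if ll[j] < pivot:
                    ll[i], ll[j] = ll[j], ll[i]
                    i += 1
            ll[i], ll[hi] = ll[hi], ll[i]   # move pivot to its final slot
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))

No intermediate lists, no per-partition function calls, and no 
recursion-limit worries, so on large random inputs both the timings and the 
counter should track n log2 n much more closely. The caveat about pivot 
choice still applies: an already-sorted input is still O(n**2).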


