Radix sort algorithm in Python question

Jeff Petkau jpet at eskimo.com
Thu Feb 8 01:49:17 EST 2001


Ray Van Dolson <rayvd at nospam.firetail.org> wrote in message
news:g_5g6.12851$1%2.627095 at sjc-read.news.verio.net...
> I'm designing a Radix sort algorithm in Python (for an assignment).  I
have
> the algorithm done, but it's not exactly fast.  Since I'm supposed to be
> comparing this to Quicksort (my Python Quicksort implementation is -
> infinitely- faster).  Here's the code:
>
> def rSort(a,n,maxLen):
>     bins=[[],[],[],[],[],[],[],[],[],[],[]]
>     for x in range(0,maxLen):
>         if x == 0:
>             for y in a:
>                 bins[y%n].append(y)
>         else:
>             for y in a:
>                 origValue=y
>                 if y < 10**x:
>                     bins[0].append(y)
>                 else:
>                     y=y/(10**x)
>                     bins[y%n].append(origValue)
>         del a
>         a=[]
>         for section in bins:
>             if len(section) > 0:
>                 for times in range(0,len(section)):
>                     a.append(section.pop(0))
>
>
> I'm guessing the problem lies when I am taking the elements out of the
> "bins" and putting them back into the array.  I delete the array and then
> step through the bins popping the numbers out.  When we're talking about
> thousands of elements this takes a while.
>
> Anyone have any suggestions on how I could better streamline this?  I
don't
> think radix should be 40 times slower than quicksort, but I could be
wrong.
>
> Thanks,
> Ray Van Dolson

First the minor stuff: all of the complicated if/else logic
in the first loop is unnecessary--you end up doing the same
thing in all cases. So you can change it to:

    for x in range(maxLen):
        bins[(y/10**x)%n].append(y)

But the real problem is the 'section.pop(0)' call. Lists in
Python are stored as linear arrays, and pop(0) removes the
*first* element from the array, copying all the other elements
down one position. Since the size of each bin is proportional
to the size of the input array, this changes the algorithm
from O(N) to O(N^2)!

Instead, you could use 'a.extend(section)' to add the
contents of section to a. This doesn't empty out
section, so you have to do that (say with 'del section[:]')
or just recreate the bins array each time through the
loop.

So the faster version is:

def rSort2(a,n,maxLen):
    for x in range(maxLen):
        bins = [[] for i in range(n)]
        for y in a:
            bins[(y/10**x)%n].append(y)
        a=[]
        for section in bins:
            a.extend(section)
    return a

On a 100,000 element list, rSort(x,10,6) takes about 59 seconds
on my machine; rSort2() takes just under 5 seconds.

On a 1,000,000 element list, rSort2(x,10,6) takes 54 seconds;
I expect rSort() to take about an hour and a half, but I'm
tired of waiting for it so I'll just send this message now.

--Jeff






More information about the Python-list mailing list