Large Dictionaries

Jim Segrave jes at nl.demon.net
Mon Jun 5 16:15:27 EDT 2006


In article <mailman.6567.1149531885.27775.python-list at python.org>,
Tim Peters <tim.peters at gmail.com> wrote:
>[Jim Segrave]
>> Actually, presorted lists are not a bad case for heapsort - it's quite
>> immune to any existing order or lack thereof,
>
>Write a heapsort and time it.  It's not a difference in O() behavior,
>but more memory movement is required for a sorted list because
>transforming the list into a max-heap at the start requires shuffling
>the largest elements from the end of the list to its start.  Initially
>reverse-sorted is a "good case" for heapsort in the same sense
>(because the list is already a max-heap then; initially sorted is as
>far from being a max-heap as is possible).  "good" and "bad" are both
>O(N log N) here, though.

I did implement a crude heapsort before posting this and measured the
number of comparisions and the number of swaps.

==================================================

#!/usr/local/bin/python

import random

class HeapSorter(object):
    def __init__(self, seq):
        self.compares = 0
        self.swaps = 0
        self.length = len(seq)
        self.seq = seq

    def __sift(self, i):
        while True:
            j = 2 * i + 1
            if j + 1 < self.length:
                self.compares += 1
                if self.seq[j + 1] > self.seq[j]:
                    j = j + 1
            if j >= self.length:
                return
        
            self.compares += 1
            if self.seq[i] > self.seq[j]:
                return
        
            self.swaps += 1
            self.seq[i], self.seq[j] = (self.seq[j], self.seq[i])
            i = j

    def __makeheap(self):
        for i in range(self.length / 2, -1, -1):
            self.__sift(i)

    def sort(self):
        self.__makeheap()
        for i in range(self.length - 1, -1, -1):
            self.swaps += 1
            self.seq[i], self.seq[0] = (self.seq[0], self.seq[i])
            self.length -= 1
            self.__sift(0)

if __name__ == '__main__':

    s = list(range(100000))
    random.shuffle(s)
    
    heap = HeapSorter(s)
    heap.sort()
    print "Random list:         comapres %d, swaps %d" % \
          (heap.compares, heap.swaps)
    heap = HeapSorter(s)
    heap.sort()
    print "Sorted list:         comapres %d, swaps %d" % \
          (heap.compares, heap.swaps)

    s.reverse()
    heap = HeapSorter(s)
    heap.sort()
    print "Reverse Sorted list: comapres %d, swaps %d" % \
          (heap.compares, heap.swaps)

==================================================

Which gave the following results:

/usr/home/jes% python ./heapsort
Random list:         comapres 3019969, swaps 1575263
Sorted list:         comapres 3112517, swaps 1650855
Reverse Sorted list: comapres 2926640, swaps 1497435

Assuming compares and swaps dominate other bookkeeping, then heapsort
is quite stable in its performance, although the constant in the O(N log N)
is much larger than for other sorts. 



-- 
Jim Segrave           (jes at jes-2.demon.nl)




More information about the Python-list mailing list