outline-style sorting algorithm

Thorsten Kampe thorsten at thorstenkampe.de
Thu Apr 29 16:34:02 EDT 2004


* Terry Reedy (2004-04-29 20:03 +0100)
> "Thorsten Kampe" <thorsten at thorstenkampe.de> wrote in message
> news:c2bdjg9l06p2.dlg at thorstenkampe.de...
>> If made some tests with lists of one million items:
>>
>> The first test was with a randomized sequence of numbers from 1 -
>> 1000000. Duplicate numbers were allowed. The comparing function was
>> the identity (lambda x: x).
> 
> I do not understand what you did, and therefore cannot judge results.  A
> compare function takes two args (the objects to be compared) and
> returns -1, 0, 1.  (I believe 0, 1 for <= or >t may be sufficient
> currently).

That's the "traditional" approach (and it doesn't make much sense -
see the "Searching and Sorting FAQ" in the Python Cookbook from Tim
Peters: "Why does Python use the three out-come cmp for sorting? Why
doesn't it use a simple less-than comparison instead?". It better to
abstract the function from the comparison.

> How did you use the one-param identity?

I assembled my "pass function" approach and TCD's DSU approach into a
single program[1]

Then I generated two lists[2]:
>>> a = range(1000000); a.reverse()
and
>>> b = randseq(1, 1000000, 1000000, 'repeat')
(meaning: create a list with one million integers 1 <= n <= 1000000;
allow duplicate numbers)

Then I timed [3]
timer(funcsort, a, lambda x: x, 'dsu'))  -> 43 sec
timer(funcsort, a, lambda x: x, 'pass')) -> 86 sec
timer(funcsort, b, lambda x: x, 'dsu'))  -> 24 sec
timer(funcsort, b, lambda x: x, 'pass')) ->  5 sec

[1]
def funcsort(seq, func, modus = 'dsu'):
    """ sort seq by func(item) """
    if   func is  None:
        seq = seq[:]
        seq.sort()
        return seq
    elif modus == 'dsu':
        seq = [(func(item), index, item) for index, item in enumerate(seq)]
        seq.sort()
        return [item[2] for item in seq]
    elif modus == 'pass':
        seq = seq[:]
        seq.sort(lambda x, y: cmp(func(x), func(y)))
        return seq
[2]
def randseq(range_start, range_end = 0, count = 0, modus = 'norepeat'):
    from random import randint

    if modus == 'repeat':
        return [randint(range_start, range_end) for index in range(count)]

    elif isinstance(range_start, int):
        return randseq(range(range_start, range_end + 1))[:count]

    else:  # 'range_start' is a seq, so shuffle
        range_start = range_start[:]
        randomseq =   []

        for i in range(len(range_start)):
            itemindex = randint(0, len(range_start) - 1)
            randomseq.append(range_start[itemindex])
            del range_start[itemindex]
        return randomseq
[3]
def timer(iteration, *func_and_args):
    """
    print the time elapsed (in seconds) evaluating func iteration
times
    (default is '1') """
    import gc, \
           time
    from colour import ltyellow, \
                       reset_color

    if isinstance(iteration, int):
        func, args = func_and_args[0], func_and_args[1:]
    else:
        # if first argument is not a number, set func to iteration and iteration
        # to '1'
        iteration, func, args = 1, iteration, func_and_args

    iteration =        range(iteration)
    gc.collect()  # force garbage collection
    start_time_cpu =   time.clock()
    start_time_total = time.time()

    for i in iteration:
        func(*args)
    print ltyellow, 'cpu: %(cpu).3f,'    % {'cpu': time.clock()  - start_time_cpu}, \
                    'total: %(total).3f' % {'total': time.time() - start_time_total}, reset_color



More information about the Python-list mailing list