outline-style sorting algorithm
Thorsten Kampe
thorsten at thorstenkampe.de
Thu Apr 29 16:34:02 EDT 2004
* Terry Reedy (2004-04-29 20:03 +0100)
> "Thorsten Kampe" <thorsten at thorstenkampe.de> wrote in message
> news:c2bdjg9l06p2.dlg at thorstenkampe.de...
>> If made some tests with lists of one million items:
>>
>> The first test was with a randomized sequence of numbers from 1 -
>> 1000000. Duplicate numbers were allowed. The comparing function was
>> the identity (lambda x: x).
>
> I do not understand what you did, and therefore cannot judge results. A
> compare function takes two args (the objects to be compared) and
> returns -1, 0, 1. (I believe 0, 1 for <= or >t may be sufficient
> currently).
That's the "traditional" approach (and it doesn't make much sense -
see the "Searching and Sorting FAQ" in the Python Cookbook from Tim
Peters: "Why does Python use the three out-come cmp for sorting? Why
doesn't it use a simple less-than comparison instead?". It better to
abstract the function from the comparison.
> How did you use the one-param identity?
I assembled my "pass function" approach and TCD's DSU approach into a
single program[1]
Then I generated two lists[2]:
>>> a = range(1000000); a.reverse()
and
>>> b = randseq(1, 1000000, 1000000, 'repeat')
(meaning: create a list with one million integers 1 <= n <= 1000000;
allow duplicate numbers)
Then I timed [3]
timer(funcsort, a, lambda x: x, 'dsu')) -> 43 sec
timer(funcsort, a, lambda x: x, 'pass')) -> 86 sec
timer(funcsort, b, lambda x: x, 'dsu')) -> 24 sec
timer(funcsort, b, lambda x: x, 'pass')) -> 5 sec
[1]
def funcsort(seq, func, modus = 'dsu'):
""" sort seq by func(item) """
if func is None:
seq = seq[:]
seq.sort()
return seq
elif modus == 'dsu':
seq = [(func(item), index, item) for index, item in enumerate(seq)]
seq.sort()
return [item[2] for item in seq]
elif modus == 'pass':
seq = seq[:]
seq.sort(lambda x, y: cmp(func(x), func(y)))
return seq
[2]
def randseq(range_start, range_end = 0, count = 0, modus = 'norepeat'):
from random import randint
if modus == 'repeat':
return [randint(range_start, range_end) for index in range(count)]
elif isinstance(range_start, int):
return randseq(range(range_start, range_end + 1))[:count]
else: # 'range_start' is a seq, so shuffle
range_start = range_start[:]
randomseq = []
for i in range(len(range_start)):
itemindex = randint(0, len(range_start) - 1)
randomseq.append(range_start[itemindex])
del range_start[itemindex]
return randomseq
[3]
def timer(iteration, *func_and_args):
"""
print the time elapsed (in seconds) evaluating func iteration
times
(default is '1') """
import gc, \
time
from colour import ltyellow, \
reset_color
if isinstance(iteration, int):
func, args = func_and_args[0], func_and_args[1:]
else:
# if first argument is not a number, set func to iteration and iteration
# to '1'
iteration, func, args = 1, iteration, func_and_args
iteration = range(iteration)
gc.collect() # force garbage collection
start_time_cpu = time.clock()
start_time_total = time.time()
for i in iteration:
func(*args)
print ltyellow, 'cpu: %(cpu).3f,' % {'cpu': time.clock() - start_time_cpu}, \
'total: %(total).3f' % {'total': time.time() - start_time_total}, reset_color
More information about the Python-list
mailing list