outline-style sorting algorithm

Thorsten Kampe thorsten at thorstenkampe.de
Thu Apr 29 06:59:03 EDT 2004


* Delaney, Timothy C (Timothy) (2004-04-29 02:52 +0100)
> Thorsten Kampe wrote:
>>>>>> foo
>>> ['1.0', '1.0.1', '1.1.1', '1.2', '1.9', '1.10', '1.11', '1.20',
>>> '1.20.1', '1.30']
>>>>>> foo.sort()
>>>>>> foo
>>> ['1.0', '1.0.1', '1.1.1', '1.10', '1.11', '1.2', '1.20', '1.20.1',
>>> '1.30', '1.9'] 
>>> 
>>> Obviously 1.20.1 should be after 1.9 if we look at this as
>>> dot-delimited integers, not as decimal numbers.
>> 
>> You need some general approach to avoid the DSU thing:
>> 
>> def funcsort(seq, func):
>>     """ sort seq by func(item) """
>>     seq = seq[:]
>>     seq.sort(lambda x, y: cmp(func(x), func(y)))
>>     return seq
>> 
>> funcsort(foo, lambda x: map(int, x.split('.')))
> 
> I've seen you give this advice several times, today, and IMO it's
> completely wrong.
> 
> DSU is *exactly* what you want to do. If you want to wrap it inside a
> general function, that's fine, but DSU is in almost all cases preferred
> to passing a comparison function - it's much faster.

My advice was to use a more general approach like 'funcsort' to avoid
reiventing the wheel for every problem. DSU is okay for funcsort.
 
>     def sorted (seq, cmp=None, key=None):

What is 'cmp=None' for?

>         """ sort seq by func(item) """
>         if key is None:
>             seq = seq[:]
>         else:
>             seq = [(key(e), i, e) for i, e in enumerate(seq)]
> 
>         seq.sort(cmp)

Are you passing a second comparison function? And if, isn't that the
same as I did and to which you said "preferred to passing a comparison
function"? Are you shadowing the builtin cmp function?
 
>         if key is None:
>             return seq
>         else:
>             return [i[-1] for i in seq]

What is 'key=None' for? Is that for speeding up if someone wants to
pass the indentity function (f(x)=x; lambda x: x)?

> Note that Python 2.4 will have DSU as a built-in idiom to `list.sort`
> i.e. `list.sort(key=func)` but will be somewhat more efficient than the
> above. Likewise there will be a new builtin `sorted` which has exactly
> the same semantics as the above - it is stable, and it does not ever
> compare the actual value if a key function is supplied - only the key
> value (the above also compares the position, but that's an
> implementation detail and has no visible effect).

If it has no visible effects than it would be useless. In my opinion
it has the effect that two items that have the same func(x) stay in
the same position as before. Is that desirable? Was that your goal?

>>> sorted([4, 2], None, lambda x: x % 2)
[4, 2], but [2, 4] if the index was omitted


Thorsten



More information about the Python-list mailing list