python 3's adoption

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Wed Jan 27 20:44:48 EST 2010


On Wed, 27 Jan 2010 16:16:59 -0800, Jonathan Gardner wrote:

> On Jan 27, 3:54 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
>> Steven D'Aprano <ste... at REMOVE.THIS.cybersource.com.au> writes:
>> > always much better written with key rather than cmp: key adds an O(N)
>> > overheard to the sorting, while cmp makes sorting O(N**2).
>>
>> Whaaaaaaaaaa ...... ????  No I don't think so.
> 
> You're referring to the O(N**2) bit, right? I am sure he knew it was O
> (N*log(N)), which is still worse than O(N) for key.
> 
> If he didn't, well, Python has some fundamental flaws in its basic sort
> algorithm.


I may have the details wrong, but the performance of sort with cmp is 
very poor.

sort with key calls the key function once for each element in the list, 
which is O(N), and then sorts O(N*log N), giving a total of O(N*log N).

sort with cmp has to compare the comparison function multiple times for 
each element. I *thought* it ended up calling it N times each, giving 
O(N**2), but I may be delusional. Whatever it is though, it's quite slow. 
Here's my test code (for Python 2.5):



from timeit import Timer
from string import letters
from random import shuffle

def mycmp(s, t):
    return cmp(s.upper(), t.upper())

def randstr():
    L = list(letters)
    shuffle(L)
    return ''.join(L)

setup = "from __main__ import mycmp, data"

N = 300
data = [randstr() for _ in xrange(N)]
t1 = Timer("d = data[:]; d.sort()", setup)  # raw sort
t2 = Timer("d = data[:]; d.sort(cmp=mycmp)", setup)
t3 = Timer("d = data[:]; d.sort(key=str.upper)", setup)

for t in (t1, t2, t3):
    print min(t.repeat(number=500))

N *= 10 # Do it again with bigger N.
data = [randstr() for _ in xrange(N)]
for t in (t1, t2, t3):
    print min(t.repeat(number=500))



And here are the results on my PC:

# N = 300
0.0631489753723
2.36756515503
0.226485967636

# N = 3000
1.03457903862
35.3092911243
2.77242517471



This doesn't support my claim of O(N**2), but it does demonstrate that 
sort with cmp should be avoided if at all possible.



-- 
Steven



More information about the Python-list mailing list