Sorting Lists

phawkins at spamnotconnact.com phawkins at spamnotconnact.com
Wed Aug 1 12:44:10 EDT 2001


As expected, decorate-sort-undecorate runs out of memory much sooner
than the function-calling cmp_func version, and slows dramatically.
This test ran on a P 166 with 80 megs of RAM, running Redhat Linux
6.2.  

So... this version of decorate-sort-undecorate, using sort with no
auxiliary function, is a little less than 4 times faster than an
in-place sort using an auxiliary function.  Until the d-s-u algorithm
runs out of memory.

Which leaves us arithmetic afficionados asking: How much memory does a
shallow copy of a list take up, relative to the size of the original
list? 

Script started on Tue Jul 31 22:31:27 2001
python test_decorate.py 

data length: 10
decorate: 0.010000
cmp:      0.000000
cmp/dec:  0.000000

data length: 100
decorate: 0.010000
cmp:      0.010000
cmp/dec:  1.000000

data length: 200
decorate: 0.020000
cmp:      0.040000
cmp/dec:  2.000000

data length: 300
decorate: 0.020000
cmp:      0.060000
cmp/dec:  3.000000

data length: 400
decorate: 0.020000
cmp:      0.090000
cmp/dec:  4.500000

data length: 500
decorate: 0.030000
cmp:      0.110000
cmp/dec:  3.666667

data length: 1000
decorate: 0.070000
cmp:      0.260000
cmp/dec:  3.714286

data length: 5000
decorate: 0.420000
cmp:      1.610000
cmp/dec:  3.833333

data length: 10000
decorate: 0.890000
cmp:      3.540000
cmp/dec:  3.977528

data length: 50000
decorate: 5.440000
cmp:      21.310000
cmp/dec:  3.917279

data length: 75000
decorate: 8.940000
cmp:      34.040000
cmp/dec:  3.807606

data length: 100000
decorate: 12.070000
cmp:      45.710000
cmp/dec:  3.787075

data length: 150000
decorate: 19.420000
cmp:      70.690000
cmp/dec:  3.640062

data length: 200000
decorate: 30.740000
cmp:      97.650000
cmp/dec:  3.176643

data length: 250000
decorate: 111.100000
cmp:      116.010000
cmp/dec:  1.044194

data length: 300000
decorate: 400.050000
cmp:      141.570000
cmp/dec:  0.353881


Script done on Wed Aug  1 10:22:48 2001

>>>>> "p" == phawkins  <phawkins at connact.com> writes:

>>>>> "TB" == Tom Bryan <tbryan at python.net> writes:
TB> phawkins at connact.com wrote:

TB> Of course, I never claimed that it was faster.  One simply avoids 
TB> copying the list and sorts in place rather than returning a sorted 
TB> copy of the list. :-)

TB> By the way, would you care to post the results of your performances 
TB> tests.  Is it faster for lists of all sizes?  How does the speed difference 
TB> change as the size of the list changs?  

p> Looks pretty stable for the sizes of lists that I tested -- running it
p> several different times for all sizes of lists, it looks as if speed
p> differences are based on random data differences -- but I haven't yet
p> seen what it does when it thrashes.  I'll come back and post when I have.

TB> you-can't-get-away-that-easily-ly yours
TB> ---Tom

p> Well, I would have, but I'd done it in IDLE some time ago when I first
p> saw discussion of decorate-sort-undecorate.  But, in the interests of
p> truth and justice, here's a new version.

p> I designed my random-data-generator so as to create data fairly
p> similar to the original example in this thread, with the result that
p> the get-sick&tired-waiting-for-it limiting factor is the data
p> generation.  So if you want to play around with it, so as to see what
p> happens when the system starts thrashing, you might want to simplify
p> the data generation.

p> This is Windows 98, PII 450, 256 meg ram (and with several other programs
p> running). 

p> ------test_decorate.py-----------
p> from whrandom import randint
p> from string import letters
p> from time import clock

p> def make_data(length):
p>     return [[make_name(), make_name(), randint(0, 100)] for i in xrange(length)]

p> def make_name():
p>     word = ""
p>     length = randint(3,10)
p>     while len(word) < length:
p>         word = word + letters[randint(0,25)]
p>     return word.capitalize()

p> def dec_sort_by_field(seq,index):
p>      tmp = [ (item[index],item) for item in seq ]
p>      tmp.sort()
p>      return [ item for (_,item) in tmp ]

p> def cmp_sort_by_field( index ):
p>     def cmpfunc( x, y, i=index ):
p>         return cmp( x[i], y[i] )
p>     return cmpfunc

p> def time_cmp(data):
p>     start = clock()
p>     data.sort(cmp_sort_by_field(1))
p>     end = clock()
p>     return end - start  

p> def time_dec(data):
p>     start = clock()
p>     dec_sort_by_field(data, 1)
p>     end = clock()
p>     return end - start
    
p> for i in [10,100,200,300,400,500,1000,5000,10000,50000,75000,100000,150000,\
p>           200000, 250000, 300000]:
p>     data = make_data(i)
p>     dec = time_dec(data)
p>     cp = time_cmp(data)
p>     print "data length: %d" % i
p>     print "decorate: %f" % dec
p>     print "cmp:      %f" % cp
p>     if dec:
p>         print "cmp/dec:  %f\n" % (cp/dec)
p> -------------------end------------------

p> ran it a few times in emacs.... never did max memory.


p> Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
p> Type "copyright", "credits" or "license" for more information.
>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007F4D...
p> data length: 10
p> decorate: 0.000138
p> cmp:      0.000373
p> cmp/dec:  2.696970

p> data length: 100
p> decorate: 0.001664
p> cmp:      0.004707
p> cmp/dec:  2.829219

p> data length: 200
p> decorate: 0.002791
p> cmp:      0.062291
p> cmp/dec:  22.319520

p> data length: 300
p> decorate: 0.004592
p> cmp:      0.019297
p> cmp/dec:  4.202409

p> data length: 400
p> decorate: 0.006918
p> cmp:      0.028260
p> cmp/dec:  4.084676

p> data length: 500
p> decorate: 0.007772
p> cmp:      0.036634
p> cmp/dec:  4.713793

p> data length: 1000
p> decorate: 0.018267
p> cmp:      0.083722
p> cmp/dec:  4.583226

p> data length: 5000
p> decorate: 0.132274
p> cmp:      0.539986
p> cmp/dec:  4.082318

p> data length: 10000
p> decorate: 0.350746
p> cmp:      1.232022
p> cmp/dec:  3.512577

p> data length: 50000
p> decorate: 1.971823
p> cmp:      7.555448
p> cmp/dec:  3.831707

p> data length: 75000
p> decorate: 3.181551
p> cmp:      11.692751
p> cmp/dec:  3.675173

p> data length: 100000
p> decorate: 4.401343
p> cmp:      16.481489
p> cmp/dec:  3.744650

p> data length: 150000
p> decorate: 7.256650
p> cmp:      25.021690
p> cmp/dec:  3.448105

>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007SCK...
p> data length: 10
p> decorate: 0.000184
p> cmp:      0.000225
p> cmp/dec:  1.218182

p> data length: 100
p> decorate: 0.001300
p> cmp:      0.005864
p> cmp/dec:  4.511283

p> data length: 200
p> decorate: 0.002601
p> cmp:      0.011922
p> cmp/dec:  4.584273

p> data length: 300
p> decorate: 0.004932
p> cmp:      0.019423
p> cmp/dec:  3.937978

p> data length: 400
p> decorate: 0.008885
p> cmp:      0.027430
p> cmp/dec:  3.087059

p> data length: 500
p> decorate: 0.008477
p> cmp:      0.036760
p> cmp/dec:  4.336662

p> data length: 1000
p> decorate: 0.020899
p> cmp:      0.082618
p> cmp/dec:  3.953240

p> data length: 5000
p> decorate: 0.151913
p> cmp:      0.563729
p> cmp/dec:  3.710857

p> data length: 10000
p> decorate: 0.304589
p> cmp:      1.485802
p> cmp/dec:  4.878048

p> data length: 50000
p> decorate: 1.758838
p> cmp:      7.936150
p> cmp/dec:  4.512156

p> data length: 75000
p> decorate: 3.160845
p> cmp:      12.241860
p> cmp/dec:  3.872971

p> data length: 100000
p> decorate: 4.147288
p> cmp:      15.488422
p> cmp/dec:  3.734590

p> data length: 150000
p> decorate: 6.621114
p> cmp:      24.302384
p> cmp/dec:  3.670437

>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007fMQ...
p> data length: 10
p> decorate: 0.000186
p> cmp:      0.000357
p> cmp/dec:  1.918919

p> data length: 100
p> decorate: 0.001405
p> cmp:      0.004687
p> cmp/dec:  3.337112

p> data length: 200
p> decorate: 0.002718
p> cmp:      0.011958
p> cmp/dec:  4.399630

p> data length: 300
p> decorate: 0.005749
p> cmp:      0.019170
p> cmp/dec:  3.334743

p> data length: 400
p> decorate: 0.007392
p> cmp:      0.028275
p> cmp/dec:  3.825057

p> data length: 500
p> decorate: 0.008398
p> cmp:      0.038744
p> cmp/dec:  4.613573

p> data length: 1000
p> decorate: 0.020952
p> cmp:      0.083020
p> cmp/dec:  3.962478

p> data length: 5000
p> decorate: 0.152054
p> cmp:      0.588490
p> cmp/dec:  3.870263

p> data length: 10000
p> decorate: 0.559918
p> cmp:      1.350087
p> cmp/dec:  2.411223

p> data length: 50000
p> decorate: 2.027378
p> cmp:      7.347429
p> cmp/dec:  3.624104

p> data length: 75000
p> decorate: 3.062071
p> cmp:      11.395732
p> cmp/dec:  3.721576

p> data length: 100000
p> decorate: 4.131656
p> cmp:      15.764242
p> cmp/dec:  3.815478

p> data length: 150000
p> decorate: 7.114213
p> cmp:      24.034858
p> cmp/dec:  3.378428

p> data length: 200000
p> decorate: 9.644709
p> cmp:      33.191710
p> cmp/dec:  3.441442

p> data length: 250000
p> decorate: 17.852266
p> cmp:      50.232261
p> cmp/dec:  2.813775

p> data length: 300000
p> decorate: 20.336841
p> cmp:      75.334111
p> cmp/dec:  3.704317

>>>> 




More information about the Python-list mailing list