Sorting Lists
phawkins at spamnotconnact.com
phawkins at spamnotconnact.com
Wed Aug 1 12:44:10 EDT 2001
As expected, decorate-sort-undecorate runs out of memory much sooner
than the function-calling cmp_func version, and slows dramatically.
This test ran on a P 166 with 80 megs of RAM, running Redhat Linux
6.2.
So... this version of decorate-sort-undecorate, using sort with no
auxiliary function, is a little less than 4 times faster than an
in-place sort using an auxiliary function. Until the d-s-u algorithm
runs out of memory.
Which leaves us arithmetic afficionados asking: How much memory does a
shallow copy of a list take up, relative to the size of the original
list?
Script started on Tue Jul 31 22:31:27 2001
python test_decorate.py
data length: 10
decorate: 0.010000
cmp: 0.000000
cmp/dec: 0.000000
data length: 100
decorate: 0.010000
cmp: 0.010000
cmp/dec: 1.000000
data length: 200
decorate: 0.020000
cmp: 0.040000
cmp/dec: 2.000000
data length: 300
decorate: 0.020000
cmp: 0.060000
cmp/dec: 3.000000
data length: 400
decorate: 0.020000
cmp: 0.090000
cmp/dec: 4.500000
data length: 500
decorate: 0.030000
cmp: 0.110000
cmp/dec: 3.666667
data length: 1000
decorate: 0.070000
cmp: 0.260000
cmp/dec: 3.714286
data length: 5000
decorate: 0.420000
cmp: 1.610000
cmp/dec: 3.833333
data length: 10000
decorate: 0.890000
cmp: 3.540000
cmp/dec: 3.977528
data length: 50000
decorate: 5.440000
cmp: 21.310000
cmp/dec: 3.917279
data length: 75000
decorate: 8.940000
cmp: 34.040000
cmp/dec: 3.807606
data length: 100000
decorate: 12.070000
cmp: 45.710000
cmp/dec: 3.787075
data length: 150000
decorate: 19.420000
cmp: 70.690000
cmp/dec: 3.640062
data length: 200000
decorate: 30.740000
cmp: 97.650000
cmp/dec: 3.176643
data length: 250000
decorate: 111.100000
cmp: 116.010000
cmp/dec: 1.044194
data length: 300000
decorate: 400.050000
cmp: 141.570000
cmp/dec: 0.353881
Script done on Wed Aug 1 10:22:48 2001
>>>>> "p" == phawkins <phawkins at connact.com> writes:
>>>>> "TB" == Tom Bryan <tbryan at python.net> writes:
TB> phawkins at connact.com wrote:
TB> Of course, I never claimed that it was faster. One simply avoids
TB> copying the list and sorts in place rather than returning a sorted
TB> copy of the list. :-)
TB> By the way, would you care to post the results of your performances
TB> tests. Is it faster for lists of all sizes? How does the speed difference
TB> change as the size of the list changs?
p> Looks pretty stable for the sizes of lists that I tested -- running it
p> several different times for all sizes of lists, it looks as if speed
p> differences are based on random data differences -- but I haven't yet
p> seen what it does when it thrashes. I'll come back and post when I have.
TB> you-can't-get-away-that-easily-ly yours
TB> ---Tom
p> Well, I would have, but I'd done it in IDLE some time ago when I first
p> saw discussion of decorate-sort-undecorate. But, in the interests of
p> truth and justice, here's a new version.
p> I designed my random-data-generator so as to create data fairly
p> similar to the original example in this thread, with the result that
p> the get-sick&tired-waiting-for-it limiting factor is the data
p> generation. So if you want to play around with it, so as to see what
p> happens when the system starts thrashing, you might want to simplify
p> the data generation.
p> This is Windows 98, PII 450, 256 meg ram (and with several other programs
p> running).
p> ------test_decorate.py-----------
p> from whrandom import randint
p> from string import letters
p> from time import clock
p> def make_data(length):
p> return [[make_name(), make_name(), randint(0, 100)] for i in xrange(length)]
p> def make_name():
p> word = ""
p> length = randint(3,10)
p> while len(word) < length:
p> word = word + letters[randint(0,25)]
p> return word.capitalize()
p> def dec_sort_by_field(seq,index):
p> tmp = [ (item[index],item) for item in seq ]
p> tmp.sort()
p> return [ item for (_,item) in tmp ]
p> def cmp_sort_by_field( index ):
p> def cmpfunc( x, y, i=index ):
p> return cmp( x[i], y[i] )
p> return cmpfunc
p> def time_cmp(data):
p> start = clock()
p> data.sort(cmp_sort_by_field(1))
p> end = clock()
p> return end - start
p> def time_dec(data):
p> start = clock()
p> dec_sort_by_field(data, 1)
p> end = clock()
p> return end - start
p> for i in [10,100,200,300,400,500,1000,5000,10000,50000,75000,100000,150000,\
p> 200000, 250000, 300000]:
p> data = make_data(i)
p> dec = time_dec(data)
p> cp = time_cmp(data)
p> print "data length: %d" % i
p> print "decorate: %f" % dec
p> print "cmp: %f" % cp
p> if dec:
p> print "cmp/dec: %f\n" % (cp/dec)
p> -------------------end------------------
p> ran it a few times in emacs.... never did max memory.
p> Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
p> Type "copyright", "credits" or "license" for more information.
>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007F4D...
p> data length: 10
p> decorate: 0.000138
p> cmp: 0.000373
p> cmp/dec: 2.696970
p> data length: 100
p> decorate: 0.001664
p> cmp: 0.004707
p> cmp/dec: 2.829219
p> data length: 200
p> decorate: 0.002791
p> cmp: 0.062291
p> cmp/dec: 22.319520
p> data length: 300
p> decorate: 0.004592
p> cmp: 0.019297
p> cmp/dec: 4.202409
p> data length: 400
p> decorate: 0.006918
p> cmp: 0.028260
p> cmp/dec: 4.084676
p> data length: 500
p> decorate: 0.007772
p> cmp: 0.036634
p> cmp/dec: 4.713793
p> data length: 1000
p> decorate: 0.018267
p> cmp: 0.083722
p> cmp/dec: 4.583226
p> data length: 5000
p> decorate: 0.132274
p> cmp: 0.539986
p> cmp/dec: 4.082318
p> data length: 10000
p> decorate: 0.350746
p> cmp: 1.232022
p> cmp/dec: 3.512577
p> data length: 50000
p> decorate: 1.971823
p> cmp: 7.555448
p> cmp/dec: 3.831707
p> data length: 75000
p> decorate: 3.181551
p> cmp: 11.692751
p> cmp/dec: 3.675173
p> data length: 100000
p> decorate: 4.401343
p> cmp: 16.481489
p> cmp/dec: 3.744650
p> data length: 150000
p> decorate: 7.256650
p> cmp: 25.021690
p> cmp/dec: 3.448105
>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007SCK...
p> data length: 10
p> decorate: 0.000184
p> cmp: 0.000225
p> cmp/dec: 1.218182
p> data length: 100
p> decorate: 0.001300
p> cmp: 0.005864
p> cmp/dec: 4.511283
p> data length: 200
p> decorate: 0.002601
p> cmp: 0.011922
p> cmp/dec: 4.584273
p> data length: 300
p> decorate: 0.004932
p> cmp: 0.019423
p> cmp/dec: 3.937978
p> data length: 400
p> decorate: 0.008885
p> cmp: 0.027430
p> cmp/dec: 3.087059
p> data length: 500
p> decorate: 0.008477
p> cmp: 0.036760
p> cmp/dec: 4.336662
p> data length: 1000
p> decorate: 0.020899
p> cmp: 0.082618
p> cmp/dec: 3.953240
p> data length: 5000
p> decorate: 0.151913
p> cmp: 0.563729
p> cmp/dec: 3.710857
p> data length: 10000
p> decorate: 0.304589
p> cmp: 1.485802
p> cmp/dec: 4.878048
p> data length: 50000
p> decorate: 1.758838
p> cmp: 7.936150
p> cmp/dec: 4.512156
p> data length: 75000
p> decorate: 3.160845
p> cmp: 12.241860
p> cmp/dec: 3.872971
p> data length: 100000
p> decorate: 4.147288
p> cmp: 15.488422
p> cmp/dec: 3.734590
p> data length: 150000
p> decorate: 6.621114
p> cmp: 24.302384
p> cmp/dec: 3.670437
>>>> ## working on region in file c:/WINDOWS/TEMP/python--722007fMQ...
p> data length: 10
p> decorate: 0.000186
p> cmp: 0.000357
p> cmp/dec: 1.918919
p> data length: 100
p> decorate: 0.001405
p> cmp: 0.004687
p> cmp/dec: 3.337112
p> data length: 200
p> decorate: 0.002718
p> cmp: 0.011958
p> cmp/dec: 4.399630
p> data length: 300
p> decorate: 0.005749
p> cmp: 0.019170
p> cmp/dec: 3.334743
p> data length: 400
p> decorate: 0.007392
p> cmp: 0.028275
p> cmp/dec: 3.825057
p> data length: 500
p> decorate: 0.008398
p> cmp: 0.038744
p> cmp/dec: 4.613573
p> data length: 1000
p> decorate: 0.020952
p> cmp: 0.083020
p> cmp/dec: 3.962478
p> data length: 5000
p> decorate: 0.152054
p> cmp: 0.588490
p> cmp/dec: 3.870263
p> data length: 10000
p> decorate: 0.559918
p> cmp: 1.350087
p> cmp/dec: 2.411223
p> data length: 50000
p> decorate: 2.027378
p> cmp: 7.347429
p> cmp/dec: 3.624104
p> data length: 75000
p> decorate: 3.062071
p> cmp: 11.395732
p> cmp/dec: 3.721576
p> data length: 100000
p> decorate: 4.131656
p> cmp: 15.764242
p> cmp/dec: 3.815478
p> data length: 150000
p> decorate: 7.114213
p> cmp: 24.034858
p> cmp/dec: 3.378428
p> data length: 200000
p> decorate: 9.644709
p> cmp: 33.191710
p> cmp/dec: 3.441442
p> data length: 250000
p> decorate: 17.852266
p> cmp: 50.232261
p> cmp/dec: 2.813775
p> data length: 300000
p> decorate: 20.336841
p> cmp: 75.334111
p> cmp/dec: 3.704317
>>>>
More information about the Python-list
mailing list