[Tutor] Sorting Data

Gregor Lingl glingl@aon.at
Tue, 20 Aug 2002 04:00:19 +0200


Jeff Shannon schrieb:

>Gregor Lingl wrote:
>
>  
>
>>Another way to accomplish this is to provide your own comparison-function
>>as an argument to sort
>>    
>>
>
>This is another possibility, but extensive discussion on comp.lang.python some
>time ago established that, in most cases, it's far more efficient to use the
>decorate-sort-undecorate pattern.  Copying the list is fairly quick (you're only
>copying references), and if you use the default comparison function, then all of
>the comparisons are done in built-in compiled (and optimized) C code.  If you
>provide your own comparison function, OTOH, then not only are the comparisons
>being done in (slower) Python code, but you have an extra (slow) function call
>into Python for each and every compare that you do -- and the number of compares
>increases geometrically with the size of the list (I don't recall the exact order
>of magnitude, but it's definately greater than linear).  So the longer the list
>you're sorting, the more time it takes per item...  For particularly short lists,
>a custom cmp() function may be faster, because it avoids a little bit of memory
>allocation overhead, but for lists of even moderate length,
>decorate-sort-undecorate takes the advantage and it only gets more significant as
>the list gets longer.
>
>Jeff Shannon
>Technician/Programmer
>Credit International
>
>
>  
>
Thanks for these considerations.

To determine what means "far more efficient" I did a quick and dirty
comparison for lists of pairs of random integers up to the length
of 100000. I obtained the following results - of course these
are only rough approximations as one never knows what happens
when on a multitasking - machine ...

length   quotient of time(sorted with comp-fun/sorted with decorate...)
of list       testrun 1         testrun 2           testrun 3

   5000         2.755              1.722               2.579
  10000         1.968              2.015               1.983
  20000         2.351              1.334               2.089
  50000         1.873              1.669               2.233
 100000         1.748              1.857               1.881

So the decorate-sort-undecorate method is clearly faster.
OTOH the quotient doesn't seem to depend significantly on
the length of the list.

Gregor


>_______________________________________________
>Tutor maillist  -  Tutor@python.org
>http://mail.python.org/mailman/listinfo/tutor
>
>
>  
>