no more comparisons

Paul Rubin http
Thu Mar 13 01:57:33 EDT 2008


"Terry Reedy" <tjreedy at udel.edu> writes:
> | I don't see what's so inefficient about it necessarily.
> 
> The key function is called once per list item, for n calls total.  The 
> comparision function is called once per comparision.  There are at least 
> n-1 such calls and typically something on the order of n * lg2(n) calls. 

Right.  Depending on the application, calling the comparison function
lg(n) times may be faster than calling the key function once.

Example: you want to sort a list of strings case-independently, each
string containing a few pages worth of text.  There are 10000 strings,
each maybe 10000 chars long.  Then (untested):

   x.sort(xs, key=lower)

makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
then releases the temporary strings and tuples.  At least 100
megabytes of memory allocated and 100 million chars copied, even
though any give pair of strings will usually differ somewhere in the
first few characters and there's no need to look past those.

   from string import lower
   from operator import eq
   from itertools import *

   def xcmp(a,b):
      for x,y in izip(a,b)):
         c = cmp(x.lower() y.lower())
         if c: return c
      return 0

   x.sort(xcmp)

runs the comparison function about 14000 times, looking at only a few
bytes on most of the calls, and using only a small constant amount of
auxiliary storage, and is likely to be much faster than all that
copying.

Hmm, there is a slight problem with xcmp-- it will fail if a is a
prefix of b.  That's fixable but anyway it still makes its point as
written.



More information about the Python-list mailing list