no more comparisons

Duncan Booth duncan.booth at invalid.invalid
Thu Mar 13 05:50:07 EDT 2008


Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:

> "Terry Reedy" <tjreedy at udel.edu> writes:
>> | I don't see what's so inefficient about it necessarily.
>> 
>> The key function is called once per list item, for n calls total. 
>> The comparision function is called once per comparision.  There are
>> at least n-1 such calls and typically something on the order of n *
>> lg2(n) calls. 
> 
> Right.  Depending on the application, calling the comparison function
> lg(n) times may be faster than calling the key function once.
> 
> Example: you want to sort a list of strings case-independently, each
> string containing a few pages worth of text.  There are 10000 strings,
> each maybe 10000 chars long.  Then (untested):
> 
>    x.sort(xs, key=lower)
> 
> makes a copy of each of the 10000 strings, makes 10000 tuples, sorts,
> then releases the temporary strings and tuples.  At least 100
> megabytes of memory allocated and 100 million chars copied, even
> though any give pair of strings will usually differ somewhere in the
> first few characters and there's no need to look past those.

Not tuples actually, there's a special sortwrapper type which is used. 
This is slightly better as it stops the comparison leaking through to 
the original values without having to store an index value.

As Dan has pointed out, for those rare cases where it actually is better 
to use cmp than key you can easily create a wrapper. The only change is 
that you now have to work at using cmp instead of it being the first 
thing you think of. So here you could use his wrapper and your 
comparison function together:

  x.sort(key=cmp_key(xcmp))

I'd agree though that the comparison wrapper should be reasonably well 
publicised: I'd remembered that there was an easy way to implement cmp 
in terms of key but not what it was.




More information about the Python-list mailing list