list sort question

Alex Martelli aleaxit at yahoo.com
Sat Oct 7 06:42:39 EDT 2000


"Erik Max Francis" <max at alcyone.com> wrote in message
news:39DE9174.9A4F4F93 at alcyone.com...
> Bill Tolbert wrote:
>
> > I was asked today to sort a list of names and email addresses by last
> > name. I ended up with a list of tuples:
> >
> > [('Bill Tolbert', 'bill_tolbert at bigfoot.com'), ...]
> >
> > I couldn't figure out the list.sort(somefunction) syntax and resorted
> > to all sorts of slicing before I gave up and solved it in a query
> > (outside of Python).
>
> The function passed to sort takes two arguments, and must return -1, 0,
> or 1 if the first is less than, equal to, or greater than the second
> (respectively).

Yes, this is true.  As you remark, it only works well for sorting SMALL
sequences.

> To get the last name of one of your tuples, you'd want something like
>
>     import string
>
>     tuple = ('Bill Tolbert', 'bill_tolbert at bigfoot.com')
>     string.split(tuple[0])[-1] # take the first element,
>                                # split it into words,
>                                # take the last word
>
> So to sort the list you could do something like:
>
>     list.sort(lambda x, y: cmp(string.split(x[0])[-1],
>                                string.split(y[0])[-1]))

The problem of course is that, this way, the splits are done O(N log N)
times -- and the same number of times is paid the interpretation
overhead of lambda &c.

> Now this would be pretty unwieldy on anything but tiny lists.  If you
> are expecting to have even relatively sizeable lists, you'd probably
> want to choose another data structure (another poster suggested a
> dictionary whose keys are the last name and whose values are the
> tuples).

That's one possibility.  The alternative design pattern for sorting is:

zip the sortfields to your payload -- sortfields in front, in order;
just sort with no special comparison function;
extract the payloads.

First and third step are O(N), not O(N log N), so for a large enough
list it's not crucial that those be fast -- the middle step, which is
already as fast as can be, will dominate anyway (IT is O(N log N)).

Guttman and Rosler have an excellent paper on that at:
http://www.hpl.hp.com/personal/Larry_Rosler/sort/sorting.html
it's unfortunately in Perl terms, but it also applies to Python.
Maybe somebody should translate it...


Alex








More information about the Python-list mailing list