[Tutor] Sort and compare [why compare when sorting?]

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Oct 22 20:54:10 EDT 2003



On Wed, 22 Oct 2003, Conrad Koziol wrote:

> def compare(a,b):
>          if a[1]>b[1]: return 0
>          else: return -1
>
> x = [('hello', 2), ('goodbye', 3), ('python', 1)]
>
> x.sort(compare(x))


Hi Conrad,

This needs to be corrected to:

    x.sort(compare)


> can you explain what happens to me step by step. That'a were im lost.
>
> Does x sort [-1, 0, 1, -1]????


I think you're getting confused about how comparsion sorting works.  Do
you mind if we make the problem slightly simpler?  It may help make the
concept easier to understand, but this post is going to be a little long.
*grin*



Here's the simplification: let's say that we're trying to write a program
that sorts a list of three elements [a, b, c].  We're going to force this
length restriction just for the sake of understanding what's happening.
So say we have something like:

###
>>> numbers = [3, 1, 4]
###

Pretend that we're not allowed to use sort() directly, but we'd like to
rearrange the numbers here so that they're in sorted order.  We'd like a
function that takes something like [3, 1, 4] and transforms it to
[1, 3, 4].


Here's one way to do it:

###
>>> def sortThree(l):
...     l[0], l[1], l[2] = l[1], l[0], l[2]
...
>>> sortThree(numbers)
>>> numbers
[1, 3, 4]
###

... but of course, this is cheating!  *grin* Particular because if we send
numbers through it one more time, it just scrambles it all over again.

###
>>> numbers
[1, 3, 4]
>>> sortThree(numbers)
>>> numbers
[3, 1, 4]
###

So it's not a real sorter.  We want a more generalized sortThree() that
really works a list of three things.


What we can do is play something of a shell game: we can swap elements if
they're in the wrong order.  We can check the first two, check the second
two, and swing elements around till the list is sorted.  Here's one way to
do it:

###
>>> def sortThree(l):
...     if l[0] > l[1]:
...         l[0], l[1] = l[1], l[0]
...     if l[1] > l[2]:
...         l[1], l[2] = l[2], l[1]
...     if l[0] > l[1]:
...         l[0], l[1] = l[1], l[0]
...
>>> numbers
[3, 1, 4]
>>> sortThree(numbers)
>>> numbers
[1, 3, 4]
>>> sortThree(numbers)
>>> numbers
[1, 3, 4]
###

Yeah, the numbers stay sorted, even if we call sortThree() twice on the
list.  That's a good thing.  *grin*


And it works on lists of three words too:

###
>>> words = ["this", "is", "a"]
>>> sortThree(words)
>>> words
['a', 'is', 'this']
###

Do you understand how sortThree() is working at the moment?  The key is to
see that it's sorting by doing comparisons '<' and swaps.  Let's look at
it again:

###
def sortThree(l):
     if l[0] > l[1]:
         l[0], l[1] = l[1], l[0]
     if l[1] > l[2]:
         l[1], l[2] = l[2], l[1]
     if l[0] > l[1]:
         l[0], l[1] = l[1], l[0]
###


If we wanted to use this to solve your original problem of sorting:

    x = [('hello', 2), ('goodbye', 3), ('python', 1)]

then it sort of works:

###
>>> x = [('hello', 2), ('goodbye', 3), ('python', 1)]
>>> sortThree(x)
>>> x
[('goodbye', 3), ('hello', 2), ('python', 1)]
###

But the problem is that it's sorting by comparing tuples, and
    'goodbye' < 'hello' < python.

You want it so that it compares against the numbers within each tuple.
We want to change sortThree() so that, instead of doing:

     if l[0] > l[1] ...
     if l[1] > l[2] ...
     if l[0] > l[1] ...

it does something more like this:

     if l[0][1] > l[1][1] ...
     if l[1][1] > l[2][1] ...
     if l[0][1] > l[1][1] ...

And that's where comparison functions come in: we can rewrite sortThree()
and "teach" it how to do '>' the way we want it to.

###
def sortThree(l, mycmp):
     if mycmp(l[0], l[1]) > 0:
         l[0], l[1] = l[1], l[0]
     if mycmp(l[1], l[2]) > 0:
         l[1], l[2] = l[2], l[1]
     if mycmp(l[0], l[1]) > 0:
         l[0], l[1] = l[1], l[0]
###


Python comes with its own built-in comparison function called cmp(), so we
can use it:

###
>>> numbers = [3, 1, 4]
>>> sortThree(numbers, cmp)
>>> numbers
[1, 3, 4]
>>> words = ["this", "is", "a"]
>>> sortThree(words, cmp)
>>> words
['a', 'is', 'this']
>>> x = [('hello', 2), ('goodbye', 3), ('python', 1)]
>>> sortThree(x, cmp)
>>> x
[('goodbye', 3), ('hello', 2), ('python', 1)]
###


And we can now feed sortThree a new comparison function that looks at the
second component of each tuple:

###
>>> def compare(a, b):
...     if a[1] > b[1]: return 1
...     if a[1] == b[1]: return 0
...     if a[1] < b[1]: return -1
...
>>> sortThree(x, compare)
>>> x
[('python', 1), ('hello', 2), ('goodbye', 3)]
###


If you understand all of this, then Python's built-in sort() should not be
so mysterious anymore.  *grin*

But I went through this really fast, so please ask questions on the parts
of things that don't make sense yet.  One of us on Tutor will be happy to
talk about it more.


Good luck!




More information about the Tutor mailing list