[Tutor] Best approach to sort data based on several criteria

Magnus Lyckå magnus at thinkware.se
Wed Aug 13 22:24:40 EDT 2003


At 11:44 2003-07-21 -0300, Jorge Godoy wrote:
>I have a huge amount of data that's stored in a file where each record
>is in a line made of 22 columns, and each column is separated from the
>other by a ';'.
>
>I need to show this data (some columns each time) sorted out
>accordingly to some columns (e.g. columns zero, one, five and six,
>each on a different view and in one of these views, I need to sort
>based on multiple keys).

The fastest approach in Python is typically to transform your
data into a list which can be sorted directly with the list.sort()
method.

You can supply a comparison function to the sort method call, but
that makes sorting much slower, since the function is called from
Python for every comparison the sort makes. For big lists with high
performance requirements, it's typically faster to create a new list
that can be sorted as is.

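This is sometimes called a Schwartzian transform, after Perl guru
Randal Schwartz (in Python circles also decorate-sort-undecorate).
What you do in Python is to change a list of rows to a list of
tuples: (sort_criteria, row), then sort it, and finally transform it
back to the list of rows without the preceding sort criteria. Tuples
are compared element by element, so the sort criteria decide the
order, and for multiple keys sort_criteria can itself be a tuple of
several columns. This will also work with dates if they are given as
objects of some date type/class such as mx.DateTime or the new
datetime class in 2.3, or in a sane string format, such as ISO 8601,
where plain string order is also chronological order.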

It could look something like this:

>>> l = [('a', 2),
...      ('c', 1),
...      ('b', 3)]
>>> col = 0
>>> # Decorate: put the sort column in front of each row.
>>> for i, row in enumerate(l):
...     l[i] = (row[col], row)
...
>>> l.sort()
>>> # Undecorate: keep only the original row.
>>> for i, row in enumerate(l):
...     l[i] = row[1]
...
>>> l
[('a', 2), ('b', 3), ('c', 1)]
>>> col = 1
>>> for i, row in enumerate(l):
...     l[i] = (row[col], row)
...
>>> l.sort()
>>> for i, row in enumerate(l):
...     l[i] = row[1]
...
>>> l
[('c', 1), ('a', 2), ('b', 3)]
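
For the multi-key case on ';'-separated records, the same three steps
could be wrapped up roughly like this. This is only a sketch: the
sample lines, the sort_rows helper and its convert argument are made
up for illustration (the real file has 22 columns), and putting the
original position into the tuple is my own addition, so that rows with
equal criteria keep their order and never get compared themselves.

```python
# Hypothetical sample records: name;ISO 8601 date;amount
lines = [
    "smith;2003-07-21;10",
    "jones;2003-08-13;9",
    "smith;2003-01-05;2",
    "jones;2003-08-13;100",
]


def sort_rows(rows, cols, convert=None):
    """Return a new list of rows sorted on the columns in cols.

    cols lists column indexes in order of priority. convert is an
    optional dict {column index: function} for columns that must not
    be compared as strings, e.g. {2: int}.
    """
    convert = convert or {}
    # Decorate: (sort criteria, original position, row)
    decorated = []
    for pos, row in enumerate(rows):
        criteria = tuple([convert.get(c, str)(row[c]) for c in cols])
        decorated.append((criteria, pos, row))
    # Sort: plain tuple comparison, no comparison function needed
    decorated.sort()
    # Undecorate: keep only the rows
    return [row for criteria, pos, row in decorated]


rows = [line.split(";") for line in lines]

# One view sorted on columns 0 and 1, another on columns 1 and 2.
by_name_then_date = sort_rows(rows, [0, 1])
by_date_then_amount = sort_rows(rows, [1, 2], convert={2: int})
```

Note the int conversion for the amount column. Compared as strings,
"100" would sort before "9". The ISO 8601 dates need no conversion.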


--
Magnus Lycka (It's really Lyckå), magnus at thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language 



