[Tutor] Counting & Sorting Instances In File

Jeff Shannon jeff@ccvcorp.com
Wed May 14 21:58:02 2003


Michael Barrett wrote:

>    The hard (?) part is the sorting.  That's the part I need help with, so assume a dictionary of 'Email': count values.  If you can think of a better way of storing the data in memory for my sort, that'd be appreciated as well.  Thanks again.
>

Counting *is* best done with a dictionary, especially if you make use 
of the get() method, which returns a default value when the key is 
not in the dictionary yet:

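# start with an empty dictionary of email:count values
emailcount = {}
for line in logfile:
    email = parse_email_addr(line)
    # get() returns 0 the first time an address is seen
    emailcount[email] = emailcount.get(email, 0) + 1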
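
If you want to try the counting loop without a real log file, here is a 
minimal self-contained sketch.  The parse_email_addr() below is only a 
stand-in (it assumes the address is the first whitespace-separated token 
containing an '@'), count_emails() is just a name I picked for the loop, 
and the sample lines are made up, so your real parser and log format 
will differ.

```python
def parse_email_addr(line):
    # Hypothetical stand-in parser: return the first token containing '@',
    # or None if the line has no such token.
    for token in line.split():
        if "@" in token:
            return token
    return None


def count_emails(lines):
    # Build a dictionary mapping each address to the number of lines
    # it was found on.
    emailcount = {}
    for line in lines:
        email = parse_email_addr(line)
        if email is None:
            continue  # skip lines without an address
        emailcount[email] = emailcount.get(email, 0) + 1
    return emailcount


# Made-up sample data standing in for the log file
sample_lines = [
    "May 14 alice@example.com delivered",
    "May 14 bob@example.com delivered",
    "May 14 alice@example.com bounced",
    "May 14 no address on this line",
]

counts = count_emails(sample_lines)
print(counts)
```

Skipping lines that yield no address keeps a None key out of the 
dictionary; whether that is what you want depends on your log.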

Once you have that dictionary of email:count values, you can convert 
it to a list and use the list's built-in sort() method.  When the 
elements of a list are themselves tuples (or lists), they are compared 
element by element, starting with the first; later elements only 
break ties.  So the best thing to do is to make sure that the first 
element of each pair is your count.

# first, use a list comprehension to create a list of (count, email) pairs
email_list = [ (value, key) for key, value in emailcount.items() ]
# now sort the list
email_list.sort()
# to sort from highest count to lowest count, reverse the list
email_list.reverse()
# now print the first 25
for count, email in email_list[:25]:
    print("%25s     %d" % (email, count))
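
On newer Pythons the same top-N listing can be written without swapping 
the pairs around.  This is only a sketch, assuming Python 2.7 or later; 
the emailcount values are made up and top_emails() is a name I invented 
for illustration.  Note one difference from the sort()/reverse() 
approach: here addresses with equal counts come out in alphabetical 
order rather than reverse alphabetical order.

```python
from collections import Counter

# Made-up counts standing in for the real dictionary
emailcount = {
    "alice@example.com": 3,
    "bob@example.com": 1,
    "carol@example.com": 3,
    "dave@example.com": 2,
}


def top_emails(emailcount, n=25):
    # Sort (email, count) pairs by descending count; ties are broken
    # alphabetically by address.  Return at most n pairs.
    pairs = sorted(emailcount.items(), key=lambda item: (-item[1], item[0]))
    return pairs[:n]


top = top_emails(emailcount, 3)
for email, count in top:
    print("%25s     %d" % (email, count))

# collections.Counter offers a similar shortcut.  The order among
# addresses with equal counts is not alphabetical, so do not rely on it.
common = Counter(emailcount).most_common(3)
```

The key function negates the count so that a single ascending sort 
puts the largest counts first.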

If there's any of this that doesn't make sense, I can explain in a 
little more detail...

Jeff Shannon
Technician/Programmer
Credit International