[Tutor] Sorting a dictionary on a value in a list.

Thu Dec 4 02:42:47 CET 2008

On Wed, Dec 3, 2008 at 7:58 PM, Lawrence Wickline
<lawrence.wickline at gmail.com> wrote:

> how would I sort on bytes sent?

You can't actually sort a dictionary; what you can do is sort the list of items.

In this case each item will be a tuple
  (filename, (bytes, bytes_sent))
and dict.items() will be a list of such tuples.

The best way to sort a list is to make a key function that extracts a
key from a list item, then pass it as the key argument to the list
sort() method or the built-in sorted() function. In your case, you
want to extract the second element of the second element, so you
could use the function
def make_key(item):
  return item[1][1]

Then you can make a sorted list with
sorted(dict.items(), key=make_key)
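
Here is a small worked example of the same idea. The file names and
numbers are made up, and I call the dictionary totals (my name, not
yours) so it doesn't clash with the built-in:

```python
# Hypothetical sample data: filename -> [bytes, total bytes_sent]
totals = {
    "/index.html": ["1024", 5120],
    "/logo.png": ["2048", 40960],
    "/about.html": ["512", 1536],
}

def make_key(item):
    # item is (filename, [bytes, bytes_sent]); sort on bytes_sent
    return item[1][1]

# Smallest bytes_sent first
by_bytes_sent = sorted(totals.items(), key=make_key)

# Largest bytes_sent first
largest_first = sorted(totals.items(), key=make_key, reverse=True)

for filename, (size, sent) in largest_first:
    print("%s\t%d" % (filename, sent))
```

Pass reverse=True if you want the biggest senders at the top.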

> how would I make this more efficient?

It looks pretty good to me. A few minor notes below.

> code:
>
> # Expect as input:
> #      URI,1,return_code,bytes,referer,ip,time_taken,bytes_sent,ref_dom
> # index 0  1       2       3      4    5      6           7        8
>
> import sys
>
>
> dict = {}

Don't use dict as the name of a variable; it shadows the built-in
dict type.
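
To see what the shadowing does, here is a tiny sketch. The function
demo() is just for illustration; it keeps the shadowing local so
nothing else is affected:

```python
def demo():
    dict = {}  # this local name now hides the built-in dict
    try:
        dict(a=1)  # would normally build {'a': 1}
    except TypeError:
        # a plain dictionary instance is not callable
        return "shadowed"
    return "ok"

print(demo())
```

Any later code in the same scope that tries to call dict() fails the
same way.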

> def update_dict(filename, bytes, bytes_sent):
>    # Build and update our dictionary adding total bytes sent.
>    if dict.has_key(filename):
>        bytes_sent += dict[filename][1]
>        dict[filename] = [bytes, bytes_sent]
>    else:
>        dict[filename] = [bytes, bytes_sent]

If you really want to squeeze every bit of speed,
  filename in dict
is probably faster than
  dict.has_key(filename)
and you might also try using a try / except block instead of has_key().
You could also try passing dict as a parameter; that might be faster
than having it as a global.
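
A rough sketch of those variations follows. The names update_totals
and update_totals_try are mine, and both take the dictionary as a
parameter instead of using a global. I have not timed them against
your version:

```python
def update_totals(totals, filename, size, bytes_sent):
    # Membership test with "in" instead of has_key()
    if filename in totals:
        bytes_sent += totals[filename][1]
    totals[filename] = [size, bytes_sent]

def update_totals_try(totals, filename, size, bytes_sent):
    # Same behavior, but let a missing key raise KeyError
    try:
        bytes_sent += totals[filename][1]
    except KeyError:
        pass  # first time we see this filename
    totals[filename] = [size, bytes_sent]

totals = {}
update_totals(totals, "/index.html", "1024", 500)
update_totals(totals, "/index.html", "1024", 700)
print(totals)
```

The try / except form tends to pay off when most lookups succeed,
that is, when the same file names repeat often in the input.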

None of these will matter unless you have many thousands of lines of
input. How many lines do you have? How long does it take to process?
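
If you want to measure rather than guess, the timeit module is one
way to do it. This is only a sketch: the 1000-entry dictionary and
the key 'file500' are invented, and the numbers will vary from
machine to machine:

```python
import timeit

# Hypothetical dictionary shaped like yours: filename -> [bytes, bytes_sent]
setup = "totals = dict(('file%d' % i, ['0', i]) for i in range(1000))"

in_time = timeit.timeit("'file500' in totals", setup=setup, number=100000)

try_time = timeit.timeit(
    "try:\n    totals['file500']\nexcept KeyError:\n    pass",
    setup=setup,
    number=100000,
)

print("in: %.4f s, try/except: %.4f s" % (in_time, try_time))
```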

> # input comes from STDIN
> for line in sys.stdin:
>    # remove leading and trailing whitespace and split on tab
>    words = line.rstrip().split('\t')

rstrip() removes only trailing whitespace, not leading whitespace as
the comment says. It is not needed, since you don't use the last field
anyway.
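
A quick check of both points, using an invented log line in the
format you describe:

```python
# Hypothetical tab-separated line with leading spaces and a trailing newline
line = "  /index.html\t1\t200\t1024\t-\t10.0.0.1\t3\t5120\texample.com\n"

stripped = line.rstrip().split("\t")
unstripped = line.split("\t")

# rstrip() leaves the leading spaces on the first field
print(repr(stripped[0]))

# Without rstrip(), only the last field still carries the newline
print(repr(unstripped[8]))

# Fields 0 through 7, the ones you use, are identical either way
print(stripped[:8] == unstripped[:8])
```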

>    file = words[0]
>    bytes = words[3]
>    bytes_sent = int(words[7])
>    update_dict(file, bytes, bytes_sent)

If you put all this into a function it will run a little faster,
because local variable lookups are faster than global ones.
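
One way that could look, as a sketch only. The function name process
and the sample lines are mine; in the real script you would pass
sys.stdin instead of sample_lines:

```python
def process(lines):
    # All names here are locals, including the dictionary
    totals = {}
    for line in lines:
        words = line.split("\t")
        filename = words[0]
        size = words[3]
        bytes_sent = int(words[7])
        if filename in totals:
            bytes_sent += totals[filename][1]
        totals[filename] = [size, bytes_sent]
    return totals

# Hypothetical input lines in the format described at the top of the script
sample_lines = [
    "/a\t1\t200\t10\t-\t10.0.0.1\t3\t5\tx.com\n",
    "/b\t1\t200\t20\t-\t10.0.0.2\t1\t3\ty.com\n",
    "/a\t1\t200\t10\t-\t10.0.0.3\t2\t7\tx.com\n",
]

totals = process(sample_lines)

# Report sorted on total bytes sent
for filename, (size, sent) in sorted(totals.items(), key=lambda item: item[1][1]):
    print("%s\t%s\t%d" % (filename, size, sent))
```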

Kent