[Tutor] help for dictionary sorting program
Jeff Shannon
jeff@ccvcorp.com
Fri Apr 18 13:22:15 2003
Abdirizak abdi wrote:
> Hi,
> I was working on a function that takes a adictionary as an argument
> and returns the frequency ordering of elements in the dictionary. I
> have dictionary set up as follows:
>
> Value key frequency positions where the word was found
> natural doc1.txt (89, (17, 22, 26))
This doesn't really show how your dictionary is defined. You say that
'doc1.txt' is the key, but I'd presume that you'll have many words from
doc1.txt, and you'll want a dictionary listing all of them. That
indicates that you want to key the dictionary off of the word ('natural'
in this case) instead of the filename. I'd probably implement the
dictionary something like this:
freq_dict = { 'natural' : ('doc1.txt', 89, [17,22,26]), ...]
Each item in the dictionary is keyed off of the word, and the value is a
tuple containing the filename, the frequency number within that file,
and a list of the word positions. If you want to be able to track word
frequency in multiple files, you could fairly easily extend this to be a
list of such tuples, one for each file, or perhaps even a nested dict
keyed off of the filename.
> my intention is to pick-up the items in the dictionary by frequency [...]
So you want to get a list of all of the key:value pairs, sorted by
value[1] ...
> def SortDict(adict):
> """ this function sorts the frequency/key pairs in the
> dictionary by highest to lowest frequency """
> counts = {}
> for word in adict.items():
> if counts.has_key(word):
> counts[word] += 1
> else:
I'm guessing that you didn't include the actual else: clause in order to
save space, since having nothing following the else: will result in a
syntax error. The problem here is that you're asking for adict.items(),
which returns a list of [key, value] pairs, so that word is actually a
2-element list. It looks like you probably want to use adict.keys()
instead, which will give you a list of only the keys in adict. You
could also write it as
for word, value in adict.items():
and take advantage of automatic tuple unpacking. Now, given the example
dictionary I showed above, word will be 'natural' and value will be
"('doc1.txt', 89, [17,22,26])".
However, I'm not sure what you're trying to accomplish here. It looks
like you're trying to get a count of the number of times each word
occurs in adict. But because dictionary keys must be unique, you're
guaranteed that each word in adict occurs only once.
If you're trying to sort adict by the frequency of words, remember that
you've already got that frequency as part of the value. You just need
to reorganize things so that you can use the built-in list sort().
Remember what I said about needing to sort on value[1]? We just need
to create a list where value[1] is the first part of each element. If
we're iterating through adict.items(), then value is the second part of
each item (item[1]). So we build a list that contains (freq, item),
sort that list, then discard freq and keep item. (This procedure is
frequently called decorate-sort-undecorate.)
def sort_by_freq(adict):
templist = []
for item in adict.items():
element = (item[1][1], item)
templist.append( element )
templist.sort()
result = []
for element in templist:
result.append( element[1] )
return result
We can do this a bit more compactly by using list comprehensions:
def sort_by_freq(adict):
templist = [ (item[1][1], item) for item in adict.items() ]
templist.sort()
result = [element[1] for element in templist]
return result
The result that's returned by this will now be a list of (key, value)
pairs, where value is the tuple I described above, with frequency as its
middle element. You can then print the first five elements:
sorted_list = sort_by_freq(freq_dict)
for line in sorted_list[:5]:
print "word: %10s freq: %4d" % (line[0], line[1][1])
Hope this helps...
Jeff Shannon
Technician/Programmer
Credit International