letter frequency counter / your thoughts..
bruno.desthuilliers at gmail.com
bruno.desthuilliers at gmail.com
Wed May 7 17:51:17 EDT 2008
On 7 mai, 18:39, umpsu... at gmail.com wrote:
> Hello,
>
> Here is my code for a letter frequency counter. It seems bloated to
> me and any suggestions of what would be a better way (keep in my mind
> I'm a beginner) would be greatly appreciated..
>
> def valsort(x):
> res = []
> for key, value in x.items():
> res.append((value, key))
> return res
>
> def mostfreq(strng):
> dic = {}
> for letter in strng:
> if letter not in dic:
> dic.setdefault(letter, 1)
You don't need dict.setdefault here - you could more simply use:
dic[letter] = 0
> else:
> dic[letter] += 1
> newd = dic.items()
> getvals = valsort(newd)
There's an error here, see below...
> getvals.sort()
> length = len(getvals)
> return getvals[length - 3 : length]
This is a very convoluted way to get the last (most used) 3 pairs. The
shortest way is:
return getvals[-3:]
Now... Did you actually test your code ?-)
mostfreq("abbcccddddeeeeeffffff")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/tmp/python-1048583f.py", line 15, in mostfreq
File "/usr/tmp/python-1048583f.py", line 3, in valsort
AttributeError: 'list' object has no attribute 'items'
Hint : you're passing valsort a list of pairs when it expects a
dict...
I don't see the point of the valsort function that - besides it
doesn't sort anything (I can only second Arnaud about naming) - seems
like an arbitrary extraction of a piece of code for no obvious reason,
and could be expressed in a simple single line:
getvals = [(v, k) for k, v in dic.items()]
While we're at it, returning only the 3 most frequent items seems a
bit arbitrary too - you could either return the whole thing, or at
least let the user decide how many items he wants.
And finally, I'd personnally hope such a function to return a list of
(letter, frequency) and not a list of (frequency, letter).
Here's a (Python 2.5.x only) possible implementation:
from collections import defaultdict
def get_letters_frequency(source):
letters_count = defaultdict(int)
for letter in source:
letters_count[letter] += 1
sorted_count = sorted(
((freq, letter) for letter, freq in letters_count.iteritems()),
reverse=True
)
return [(letter, freq) for freq, letter in sorted_count]
get_letters_frequency("abbcccddddeeeeeffffff")
=> [('f', 6), ('e', 5), ('d', 4), ('c', 3), ('b', 2), ('a', 1)]
# and if you only want the top 3:
get_letters_frequency("abbcccddddeeeeeffffff")[0:3]
=> [('f', 6), ('e', 5), ('d', 4)]
HTH
More information about the Python-list
mailing list