letter frequency counter / your thoughts..

bruno.desthuilliers at gmail.com bruno.desthuilliers at gmail.com
Wed May 7 17:51:17 EDT 2008


On 7 mai, 18:39, umpsu... at gmail.com wrote:
> Hello,
>
> Here is my code for a letter frequency counter.  It seems bloated to
> me and any suggestions of what would be a better way (keep in my mind
> I'm a beginner) would be greatly appreciated..
>
> def valsort(x):
>         res = []
>         for key, value in x.items():
>                 res.append((value, key))
>         return res
>
> def mostfreq(strng):
>         dic = {}
>         for letter in strng:
>                 if letter not in dic:
>                         dic.setdefault(letter, 1)

You don't need dict.setdefault here - you could more simply use:
                           dic[letter] = 0
>                 else:
>                         dic[letter] += 1
>         newd = dic.items()
>         getvals = valsort(newd)

There's an error here, see below...

>         getvals.sort()
>         length = len(getvals)
>         return getvals[length - 3 : length]

This is a very convoluted way to get the last (most used) 3 pairs. The
shortest way is:
  return getvals[-3:]


Now... Did you actually test your code ?-)

mostfreq("abbcccddddeeeeeffffff")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/tmp/python-1048583f.py", line 15, in mostfreq
  File "/usr/tmp/python-1048583f.py", line 3, in valsort
AttributeError: 'list' object has no attribute 'items'

Hint : you're passing valsort a list of pairs when it expects a
dict...

I don't see the point of the valsort function that  - besides it
doesn't sort anything (I can only second Arnaud about naming) -  seems
like an arbitrary extraction of a piece of code for no obvious reason,
and could be expressed in a simple single line:
  getvals = [(v, k) for k, v in dic.items()]

While we're at it, returning only the 3 most frequent items seems a
bit arbitrary too - you could either return the whole thing, or at
least let the user decide how many items he wants.

And finally, I'd personnally hope such a function to return a list of
(letter, frequency) and not a list of (frequency, letter).

Here's a (Python 2.5.x only) possible implementation:

from collections import defaultdict

def get_letters_frequency(source):
    letters_count = defaultdict(int)
    for letter in source:
        letters_count[letter] += 1
    sorted_count = sorted(
       ((freq, letter) for letter, freq in letters_count.iteritems()),
        reverse=True
        )
    return [(letter, freq) for freq, letter in sorted_count]

get_letters_frequency("abbcccddddeeeeeffffff")
=> [('f', 6), ('e', 5), ('d', 4), ('c', 3), ('b', 2), ('a', 1)]

# and if you only want the top 3:
get_letters_frequency("abbcccddddeeeeeffffff")[0:3]
=> [('f', 6), ('e', 5), ('d', 4)]

HTH



More information about the Python-list mailing list