[Tutor] Counting number of occurrence of each character in a string

Mats Wichmann mats at wichmann.us
Wed Sep 2 10:02:50 EDT 2020


On 9/2/20 4:19 AM, Manprit Singh wrote:
> Dear sir ,
> consider a problem of Counting number of occurrence of each  character in a
> string
> x = "ABRACADABRA"
> in this string x :
> Number of occurrence of   character A = 5
> Number of occurrence of   character B = 2
> Number of occurrence of   character R = 2
> Number of occurrence of   character C = 1
> Number of occurrence of   character D = 1
> 
> The code will be written like this :
> 
> >>> a = "ABRACADABRA"
> >>> d = {}
> >>> for i in a:
> ...     d[i] = d.get(i, 0) + 1
> 
> will result in
> >>> d
> {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1}
> 
> Here keys are characters in the string a, and the values are the counts.


You've done a good job identifying a datatype that's good for counting
(a dictionary) and the problem that comes with it: there are two
"states", key not in the dict yet and key already in the dict. Using the
get() method with a default of 0 keeps the increment from failing in the
former state.  Leave out using update(): that's a useful method if you
want to update one dict with information from another, but it's not
ideal one-at-a-time in a loop like this.
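To make the two "states" concrete, here is a small sketch that spells
them out with an explicit membership test and compares the result with
the get() version. The names "explicit" and "with_get" are just
illustrative choices, not anything from the original post:

```python
a = "ABRACADABRA"

# Explicit version: handle the two states separately.
explicit = {}
for ch in a:
    if ch in explicit:
        explicit[ch] += 1   # key already in the dict: increment it
    else:
        explicit[ch] = 1    # key not in the dict yet: start at 1

# get() version: the default of 0 covers the "not in the dict yet" state.
with_get = {}
for ch in a:
    with_get[ch] = with_get.get(ch, 0) + 1

print(explicit == with_get)
```

Both loops build the same dictionary; get() just folds the two branches
into one line.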



Of course this scenario comes up all the time, so Python provides some
help with this. Don't worry if you find this "advanced"...

There's a variant of the dict, called a defaultdict, which supplies a
default value of the type you give it in case the key is not in the
dict. Here's that in action:

>>> import collections
>>> d = collections.defaultdict(int)
>>> d['A']
0
>>> for i in a:
...     d[i] += 1
...
>>> print(d)
defaultdict(<class 'int'>, {'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1})
>>>

The "int" argument to defaultdict is a "factory function" which is
called to make an object for you when the default value is needed.
Calling int() without an argument gives you the empty/false value, which
for an integer is 0:

>>> print(int())
0
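Any callable that takes no arguments can serve as the factory. As a
rough sketch (the "positions" and "probe" names are made up for
illustration), using list as the factory collects the indexes at which
each character occurs. It also shows a side effect worth knowing: merely
reading a missing key stores the default in the dict.

```python
import collections

a = "ABRACADABRA"

# list() with no arguments returns [], so each new key starts out empty.
positions = collections.defaultdict(list)
for index, ch in enumerate(a):
    positions[ch].append(index)

print(positions['A'])

# Looking up a missing key calls the factory and inserts the result.
probe = collections.defaultdict(int)
value = probe['Z']
print(value, 'Z' in probe)
```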


It turns out that the case of counting is so common there's another
"collections" class which just does it all for you (you might have
noticed this in Peter's timings):

>>> d = collections.Counter(a)
>>> print(d)
Counter({'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1})
>>>

That's useful to know, but probably isn't useful in working a problem
that wants you to learn about using the basic datatypes, as one suspects
this was.
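For later reference, a brief sketch of two Counter conveniences: the
most_common() method, and the fact that asking for a character that
never appeared gives 0 without adding it to the Counter. The variable
names here are only for illustration:

```python
import collections

a = "ABRACADABRA"
counts = collections.Counter(a)

# The single most frequent character and its count.
top = counts.most_common(1)
print(top)

# A missing key reads as 0 and, unlike defaultdict, is not inserted.
missing = counts['Z']
print(missing, 'Z' in counts)
```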

