(Newbie) Counting Instances ("Hits") with Regular Expressions

Sun Jun 23 12:09:15 EDT 2002

Ben Fairbank wrote:
> 
> I am new both to Python and to regular expressions, which may account
> for my difficulty.  I must count the frequencies of certain words in
> files of moderate length (about 150 KB).  I have been reading
> files and then using count(s,sub), which is fast and easy.  I now have
> to allow for punctuation and eliminate words within words, etc, and so
> am trying to use regular expressions instead of simple words as
> targets.  I do not, however, find a similarly easy to use count
> function in the re module.  Yet this is such a common operation that it must
> be there, or easy to implement.  What is the usual way of simply
> counting "hits" in the re module?  (And what have I missed in the
> documentation; where is this to be found?  I have looked through Lutz
> and Ascher.)

I don't think this sort of thing is really as common as you believe.
I'm also not certain using the re module is the typical way to handle
something like this.  I suspect writing some kind of parser/tokenizer
is more common, when you talk about having to take punctuation and
such into account.
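As a rough illustration of the tokenizer route, here is a minimal sketch (Python 3, not from the original post). The function name word_frequencies and the word pattern are assumptions: it treats ASCII letter runs, optionally joined by internal apostrophes, as words, lowercases them, and drops punctuation. One pass over the file then gives counts for every word at once, which suits counting several target words.

```python
import re
from collections import Counter

def word_frequencies(text):
    # Hypothetical tokenizer: a "word" is a run of ASCII letters, optionally
    # joined by internal apostrophes (so "cat's" stays one token).
    # Lowercasing makes the counts case-insensitive; punctuation is discarded
    # because it never matches the pattern.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

freqs = word_frequencies("The cat sat. The cat's bowl, concatenated!")
print(freqs["cat"], freqs["the"])  # "concatenated" and "cat's" are separate tokens
```

Note that "cat's" is counted as its own token here; whether it should count toward "cat" is exactly the kind of rule a real tokenizer has to decide.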

In any case, if you find that regular expressions suit your needs, you
can easily count the hits: re.findall() returns a list of all
non-overlapping matches, and len() of that list gives the count.
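A minimal sketch of the findall/len approach in Python 3 follows. The helper name count_word is hypothetical. It wraps the target in \b word boundaries, so "cat" does not match inside "scatter" or "concatenated", and uses re.escape so the target is matched literally.

```python
import re

def count_word(text, word, ignore_case=False):
    # \b anchors the match at word boundaries, so words inside
    # other words are not counted; re.escape matches the target literally.
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + re.escape(word) + r"\b"
    # findall returns every non-overlapping match; len() is the hit count.
    return len(re.findall(pattern, text, flags))

text = "The cat sat. Cats scatter; the cat's bowl, concatenated, held one cat."
print(count_word(text, "cat"))  # prints 3
```

Here "Cats" is not counted because the "s" means there is no word boundary after "Cat". For very large inputs, sum(1 for _ in re.finditer(pattern, text)) gives the same count without building the list.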

-Peter