Counting occurences of words in a list of strings

Travers Naran tnaran at no-more-virii-please.direct.ca
Tue May 24 22:37:44 EDT 2005


Here's the basic idea.  I have a dictionary of substrings (the substrings 
stored as keys).  I have a list of strings.  I want to find out, for each 
word in the dictionary, how many times the substring occurs non-overlapping 
occurrences there are in the list of strings.  This is the best I could 
come up with:

         for j in self.dict.keys():
             c = 0
             for i in tc.strings:
                 c += i.count(j)
             self.dict[j]['count'] += c

I tried Boyer-Moore-Horspool as someone else suggested in the group, but it 
was even slower (probably because of the time spent setting up the 
pattern).  Even when I split the algorithm in two, doing the set-up before 
the inner loop and only testing the pattern in the inner loop, it was still 
slow.

So, am I doing this the optimal way or am I overlooking something _really_ 
simple to do?

Statistics on the data:
The dictionary has 3000+ entries in it.
There are several thousand strings.
The strings average 10-26 characters in length.



More information about the Python-list mailing list