How to find all the same words in a text?
Samuel Karl Peterson
skpeterson at nospam.please.ucdavis.edu
Sun Feb 11 08:13:51 EST 2007
"Johny" <python at hope.cz> on 10 Feb 2007 05:29:23 -0800 didst step
forth and proclaim thus:
> I need to find all the same words in a text.
> What would be the best idea to do that?
I make no claims of this being the best approach:
====================
import re

def findOccurances(a_string, word):
    """
    Given a string and a word, returns a 2-tuple:
    [0] = count  [1] = list of indexes where word occurs
    Matching is case-insensitive and on whole words only.
    """
    indexes = []
    start = 0  # offset of the current slice within the original string
    # re.escape keeps regex metacharacters in word from being interpreted
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    while True:
        match = pattern.search(a_string)
        if not match:
            break
        indexes.append(match.start() + start)
        start += match.end()
        a_string = a_string[match.end():]  # continue after this match
    return (len(indexes), indexes)
====================
Seems to work for me. No guarantees.
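
The same whole-word, case-insensitive search can probably be written
more compactly with re.finditer, which walks the matches without
re-slicing the string. This is only a sketch: the name find_occurrences
and the sample sentence are made up for illustration, and it assumes
"word" means a run of characters delimited by \b boundaries.

```python
import re

def find_occurrences(text, word):
    """Return (count, indexes) for whole-word, case-insensitive matches."""
    # Escape the word so characters like '.' or '+' are matched literally.
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    # finditer yields non-overlapping matches left to right; start() is
    # already an index into the original text, so no offset bookkeeping.
    indexes = [m.start() for m in pattern.finditer(text)]
    return len(indexes), indexes

# "concatenate" contains "cat" but not as a whole word, so it is skipped.
print(find_occurrences("The cat sat. the Cat ran; concatenate.", "cat"))
# prints (2, [4, 17])
```

Both versions should return the same result for ordinary words; the
finditer form just has less state to get wrong.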
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown