How to find all the same words in a text?

Samuel Karl Peterson skpeterson at nospam.please.ucdavis.edu
Sun Feb 11 08:13:51 EST 2007


"Johny" <python at hope.cz> on 10 Feb 2007 05:29:23 -0800 didst step
forth and proclaim thus:

> I need to find all the same words in a text.
> What would be the best idea to do that?

I make no claims of this being the best approach:

====================
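def findOccurances(a_string, word):
    """
    Given a string and a word, returns a pair (2-tuple):
    [0] = count, [1] = list of indexes where word occurs.
    Matching is case-insensitive and on whole words only.
    """
    import re
    count = 0
    indexes = []
    start = 0     # position at which the next search begins
    # re.escape() keeps any regex metacharacters in word literal
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)

    while True:
        # search from start instead of slicing the string, so match
        # positions are already indexes into the original a_string
        match = pattern.search(a_string, start)
        if not match:
            break
        count += 1
        indexes.append(match.start())
        start = match.end()

    return (count, indexes)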
====================

Seems to work for me.  No guarantees.
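
For what it's worth, the same idea can probably be sketched more
compactly with re.finditer, which walks the original string itself, so
the reported indexes need no offset bookkeeping. The name
find_occurrences and the sample text are made up for illustration, and
re.escape is an assumption on my part about what you want (punctuation
in the word matched literally):

```python
import re

def find_occurrences(text, word):
    """Return (count, indexes) for whole-word, case-insensitive matches.

    Hypothetical helper for illustration; not part of any library.
    """
    # re.escape() makes any regex metacharacters in word match literally.
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    # finditer() scans text once; each match.start() indexes into text.
    indexes = [m.start() for m in pattern.finditer(text)]
    return (len(indexes), indexes)

# Made-up sample: "concatenate" contains "cat" but is not a whole-word hit.
sample = "The cat sat. the CAT ran; concatenate is not a cat."
print(find_occurrences(sample, "cat"))   # prints (3, [4, 17, 47])
```

Because of the \b anchors, this only behaves sensibly for words that
begin and end with a letter, digit or underscore.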

-- 
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown


