How to find all the same words in a text?

attn.steven.kuo at gmail.com
Sun Feb 11 11:16:11 EST 2007


On Feb 11, 5:13 am, Samuel Karl Peterson
<skpeter... at nospam.please.ucdavis.edu> wrote:
> "Johny" <pyt... at hope.cz> on 10 Feb 2007 05:29:23 -0800 didst step
> forth and proclaim thus:
>
> > I need to find all the same words in a text .
> > What would be the best idea  to do that?
>
> I make no claims of this being the best approach:
>
> ====================
> def findOccurances(a_string, word):
>     """
>     Given a string and a word, returns a pair (tuple):
>     [0] = count, [1] = list of indexes where word occurs
>     """
>     import re
>     count = 0
>     indexes = []
>     start = 0     # offset for successive passes
>     pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
>
>     while True:
>         match = pattern.search(a_string)
>         if not match: break
>         count += 1
>         indexes.append(match.start() + start)
>         start += match.end()
>         a_string = a_string[match.end():]
>
>     return (count, indexes)
> ====================
>
> Seems to work for me.  No guarantees.
>



More concisely:

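import re

# target_string is the text to search; this sample value is only an illustration.
target_string = "324 is here, 1324 is not, and 324 again"

# \b anchors the match at word boundaries, so the "324" inside "1324" is skipped.
pattern = re.compile(r'\b324\b')
indices = [match.start() for match in pattern.finditer(target_string)]
print("Indices:", indices)
print("Count:", len(indices))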
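
The finditer approach above can be extended to an arbitrary word, and to the original question of finding every word that occurs more than once. The following is only a sketch: the names find_occurrences and repeated_words are made up for illustration and do not come from this thread, and it assumes that a "word" is a run of \w characters and that case should be ignored.

```python
import re
from collections import defaultdict


def find_occurrences(text, word, ignore_case=True):
    """Return (count, indices) for whole-word matches of `word` in `text`."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as "." or "+" in `word` literal.
    pattern = re.compile(r'\b%s\b' % re.escape(word), flags)
    indices = [m.start() for m in pattern.finditer(text)]
    return len(indices), indices


def repeated_words(text):
    """Map each lower-cased word seen more than once to its start indices."""
    positions = defaultdict(list)
    for m in re.finditer(r'\w+', text):
        positions[m.group().lower()].append(m.start())
    return {w: idx for w, idx in positions.items() if len(idx) > 1}


sample = "The cat saw the other cat. Cat!"
print(find_occurrences(sample, "cat"))
print(repeated_words(sample))
```

Note that \b only behaves as a word boundary when the search word itself starts and ends with a word character, so a word like "c++" would need a different delimiter.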

--
Cheers,
Steven



