How to find all the same words in a text?
attn.steven.kuo at gmail.com
Sun Feb 11 11:16:11 EST 2007
On Feb 11, 5:13 am, Samuel Karl Peterson
<skpeter... at nospam.please.ucdavis.edu> wrote:
> "Johny" <pyt... at hope.cz> on 10 Feb 2007 05:29:23 -0800 didst step
> forth and proclaim thus:
>
> > I need to find all the same words in a text.
> > What would be the best idea to do that?
>
> I make no claims of this being the best approach:
>
> ====================
> def findOccurances(a_string, word):
>     """
>     Given a string and a word, returns a double:
>     [0] = count [1] = list of indexes where word occurs
>     """
>     import re
>     count = 0
>     indexes = []
>     start = 0 # offset for successive passes
>     pattern = re.compile(r'\b%s\b' % word, re.I)
>
>     while True:
>         match = pattern.search(a_string)
>         if not match: break
>         count += 1
>         indexes.append(match.start() + start)
>         start += match.end()
>         a_string = a_string[match.end():]
>
>     return (count, indexes)
> ====================
>
> Seems to work for me. No guarantees.
>
More concisely:
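import re
# Whole-word pattern for the word being searched for (here the literal "324").
pattern = re.compile(r'\b324\b')
# target_string is assumed to already hold the text to search.
indices = [match.start() for match in pattern.finditer(target_string)]
print "Indices", indices
print "Count: ", len(indices)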
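The same finditer idea can be wrapped in a function so the word is not hard-coded. What follows is only a sketch, not something from the thread: it assumes Python 3 (print as a function), and the name find_word_indices, the ignore_case flag and the sample text are illustrative. It also assumes the word starts and ends with a letter, digit or underscore, since \b only matches at the edge of such characters. The call to re.escape keeps characters like '.' in the word from being read as regex syntax.

```python
import re

def find_word_indices(text, word, ignore_case=True):
    """Return the start index of every whole-word occurrence of word in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # Escape the word so regex metacharacters in it are matched literally.
    pattern = re.compile(r'\b%s\b' % re.escape(word), flags)
    return [match.start() for match in pattern.finditer(text)]

# Hypothetical sample text, for illustration only.
target_string = "The cat sat; the CAT ran. Concatenate is not cat."
indices = find_word_indices(target_string, "cat")
print("Indices:", indices)
print("Count:", len(indices))
```

The "cat" inside "Concatenate" is skipped because there is no word boundary on either side of it.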
--
Cheers,
Steven