How to find all the same words in a text?

Thorsten Kampe thorsten at thorstenkampe.de
Sat Feb 10 09:35:37 EST 2007


* Johny (10 Feb 2007 05:29:23 -0800)
> I need to find all the same words in a text.
> What would be the best way to do that?
> I used string.find but it does not work properly for words.
> Let's suppose I want to find the number 324 in the text
> 
> '45  324 45324'
> 
> there is only one occurrence of the word 324, but string.find() finds 2
> occurrences (in 45324 too)
> 
> Must I use regex?

There are two approaches: one is the "solve once and forget" approach 
where you code around this particular problem. Mario showed you one 
solution for this.
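
A minimal sketch of that kind of "code around it" fix, which is not
necessarily what Mario posted: either split the text into words and count
exact matches, or use a regex with word boundaries. The variable names
here are made up for illustration.

```python
import re

text = '45  324 45324'

# split() with no argument splits on any run of whitespace,
# so the double space does not produce an empty "word".
words = text.split()
count_split = words.count('324')

# Regex alternative: \b matches only at a word boundary,
# so the '324' inside '45324' is not counted.
count_regex = len(re.findall(r'\b324\b', text))

print(count_split, count_regex)
```

Both variants treat '45324' as a different word, which string.find()
cannot do because it searches for substrings.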

The other approach would be to realise that your problem is a specific
case of two general problems: partitioning a sequence by a separator
and partitioning a sequence into equivalence classes. The bonus of this
approach is that a /lot/ of problems can be solved with one of these
utils or a combination of them.

>>> a = '45  324 45324'
>>> quotient_set(part(a, [' ', '  '], 'sep'), ident)
{'324': ['324'], '45': ['45'], '45324': ['45324']}
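
The session above relies on utility functions that are not shown in this
post. For readers who do not have them, here is a rough stand-in. The
bodies are assumptions reconstructed from the call and its output, not the
original implementations; in particular, the 'sep' mode is guessed to mean
"split at the separators and drop them", and empty pieces are discarded.

```python
import re

def ident(x):
    # Identity function: every item is its own class key.
    return x

def part(text, separators, mode='sep'):
    # Assumed behaviour: split text at any of the separators and drop them.
    if mode != 'sep':
        raise ValueError("only the 'sep' mode is sketched here")
    # Try longer separators first so that '  ' wins over ' '.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = '|'.join(re.escape(s) for s in ordered)
    return [piece for piece in re.split(pattern, text) if piece]

def quotient_set(seq, func):
    # Group the items of seq into classes keyed by func(item).
    classes = {}
    for item in seq:
        classes.setdefault(func(item), []).append(item)
    return classes

a = '45  324 45324'
result = quotient_set(part(a, [' ', '  '], 'sep'), ident)
print(result)
```

Counting occurrences of a word is then just the length of its class, for
example len(result['324']).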

The latter approach is much more flexible. Just imagine your problem
changes: the string is separated by newlines (instead of spaces) and
you want to find words that start with the same character (instead of
words that are identical).
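
As a rough illustration of that changed problem, the sketch below uses only
the standard library. The sample text and the name first_char are made up
for this example; only the separator (newlines) and the criterion (first
character) differ from the original question.

```python
from collections import defaultdict

# Hypothetical sample input, one word per line.
text = 'apple\navocado\nbanana\nblueberry\ncherry'

def first_char(word):
    # Equivalence criterion: two words are "the same" if they
    # start with the same character.
    return word[0]

groups = defaultdict(list)
for word in text.splitlines():
    if word:  # skip empty lines
        groups[first_char(word)].append(word)

print(dict(groups))
```

Swapping first_char for another function (len, str.lower, and so on)
changes the equivalence classes without touching the rest of the code.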


Thorsten


