How to find all the same words in a text?
Thorsten Kampe
thorsten at thorstenkampe.de
Sat Feb 10 09:35:37 EST 2007
* Johny (10 Feb 2007 05:29:23 -0800)
> I need to find all the same words in a text.
> What would be the best idea to do that?
> I used string.find but it does not work properly for the words.
> Let's suppose I want to find the number 324 in the text
>
> '45 324 45324'
>
> there is only one occurrence of the word 324 but string.find() finds 2
> occurrences (in 45324 too)
>
> Must I use regex?
There are two approaches: one is the "solve once and forget" approach
where you code around this particular problem. Mario showed you one
solution for this.
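
Mario's code is not repeated here. As a rough sketch of what such a
one-off fix could look like (my own illustration, not necessarily what
he posted), you can either compare whole tokens after splitting on
whitespace, or use a regex with word boundaries:

```python
import re

text = '45 324 45324'

# Whole-token comparison: split on whitespace, then count exact matches.
count_split = text.split().count('324')

# Regex with word boundaries: \b keeps '324' from matching inside '45324'.
count_regex = len(re.findall(r'\b324\b', text))

# Plain substring counting, for contrast, also sees the '324' inside '45324'.
count_substring = text.count('324')

print(count_split, count_regex, count_substring)
```

So a regex is one option, but it is not required; splitting first is
enough for whitespace-separated words.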
The other approach would be to realise that your problem is a specific
case of two general problems: partitioning a sequence by a separator
and partitioning a sequence into equivalence classes. The bonus of this
approach is that a /lot/ of other problems can be solved with one of
these utilities or a combination of them.
>>> a = '45 324 45324'
>>> quotient_set(part(a, [' ', ' '], 'sep'), ident)
{'324': ['324'], '45': ['45'], '45324': ['45324']}
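
The utilities themselves are not shown in this message. The following
is only a minimal sketch of the two ideas, under my own assumptions:
split_on, group_by_key and identity are hypothetical stand-ins with
simplified signatures, not the actual part, quotient_set and ident.

```python
def split_on(text, separators):
    """Partition a string at any of the given single-character separators."""
    parts, current = [], []
    for ch in text:
        if ch in separators:
            # A separator closes the current piece; empty pieces are dropped.
            if current:
                parts.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append(''.join(current))
    return parts


def group_by_key(items, key):
    """Group items into equivalence classes: same key(item), same class."""
    classes = {}
    for item in items:
        classes.setdefault(key(item), []).append(item)
    return classes


def identity(x):
    return x


words = split_on('45 324 45324 324', [' '])
classes = group_by_key(words, identity)
occurrences_of_324 = len(classes['324'])
print(classes)
```

With identity as the key, each class holds the occurrences of one
word, so the length of a class is the number of times that word
appears.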
The latter approach is much more flexible. Just imagine your problem
changes to a string that is separated by newlines (instead of spaces)
and you want to find words that start with the same character (instead
of words that are identical).
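
To illustrate that variation, here is a small sketch using only
built-ins. The function name and the sample text are made up for the
example; only the separator and the key change compared to the
original problem.

```python
def classes_by_first_char(text):
    """Split on newlines, then group words by their first character."""
    classes = {}
    for word in text.split('\n'):
        if word:  # skip empty lines
            classes.setdefault(word[0], []).append(word)
    return classes


sample = 'apple\navocado\nbanana\nblueberry\ncherry'
result = classes_by_first_char(sample)
print(result)
```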
Thorsten