How to find all the same words in a text?
Maël Benjamin Mettler
mbm at mediamonger.ch
Sun Feb 11 08:47:31 EST 2007
To find every occurrence of a word in a text, you need to tokenize the
text first. After that, it is just a matter of calling the count()
method on the list of tokens. For tokenization, look here:
http://nltk.sourceforge.net/lite/doc/en/words.html
A word of warning: depending on what exactly you need to do, the
seemingly trivial task of tokenizing a text can become quite complex.
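
As a rough sketch of the idea, assuming that splitting on whitespace is
a good enough tokenizer for your text (count_word is a made-up helper
name for illustration, not something from NLTK):

```python
def count_word(text, word):
    """Count whole-token occurrences of word in text (hypothetical helper)."""
    # str.split() with no arguments splits on runs of whitespace.
    tokens = text.split()
    # list.count() compares whole tokens, so '45324' does not match '324'.
    return tokens.count(word)


print(count_word('45 324 45324', '324'))
```

With real prose you would also have to decide what to do about
punctuation and case ('Word,' vs. 'word'), which is where a proper
tokenizer starts to pay off.
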
Enjoy,
Maël
Neil Cerutti wrote:
> On 2007-02-10, Johny <python at hope.cz> wrote:
>> I need to find all the same words in a text .
>> What would be the best idea to do that?
>> I used string.find but it does not work properly for the words.
>> Let suppose I want to find a number 324 in the text
>>
>> '45 324 45324'
>>
>> there is only one occurrence of 324 word but string.find() finds 2
>> occurrences ( in 45324 too)
>>
>> Must I use regex?
>> Thanks for help
>
> The first thing to do is to answer the question: What is a word?
>
> The second thing to do is to design some code that can find
> words in strings.
>
> The last thing to do is to search those actual words for the word
> you're looking for.
>
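
On the "Must I use regex?" question: not necessarily, but it is one way
to follow the three steps quoted above. The sketch below assumes that a
"word" is whatever the re module treats as delimited by \b word
boundaries; find_word is a made-up name for illustration.

```python
import re


def find_word(text, word):
    """Return start offsets of whole-word matches of word (hypothetical helper)."""
    # \b matches only between a word character and a non-word character
    # (or the string edge), so '324' inside '45324' is skipped.
    # re.escape() keeps characters such as '.' in word from acting as
    # regex syntax.
    pattern = r'\b' + re.escape(word) + r'\b'
    return [m.start() for m in re.finditer(pattern, text)]


print(find_word('45 324 45324', '324'))
```

Note that \b only behaves this way when the search word itself starts
and ends with a letter, digit, or underscore.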