How to find all the same words in a text?

Maël Benjamin Mettler mbm at mediamonger.ch
Sun Feb 11 08:47:31 EST 2007


To find all occurrences of a word in a text, you first need to tokenize it.
The rest is a matter of calling the count method on the list of
tokenized words. For tokenization, look here:
http://nltk.sourceforge.net/lite/doc/en/words.html
A word of warning: depending on exactly what you need to do, the
seemingly trivial task of tokenizing a text can become quite complex.
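Here is a minimal sketch of the tokenize-then-count idea using only the standard library. It assumes that a "word" is a maximal run of word characters as matched by the regular expression `\w+`. That is a deliberately simple stand-in for the NLTK tokenizers linked above, not what NLTK does. The function name `count_word` is hypothetical.

```python
import re

def count_word(text, word):
    # Tokenize: each maximal run of letters, digits or underscores becomes one token.
    # (Assumption: this simple rule is good enough; real tokenizers are more careful.)
    tokens = re.findall(r"\w+", text)
    # Count exact token matches with list.count, so "324" does not match inside "45324".
    return tokens.count(word)

print(count_word("45  324 45324", "324"))  # → 1
```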

Enjoy,

Maël

Neil Cerutti wrote:
> On 2007-02-10, Johny <python at hope.cz> wrote:
>> I need to find all the same words in a text.
>> What would be the best way to do that?
>> I used string.find but it does not work properly for words.
>> Let's suppose I want to find the number 324 in the text
>>
>> '45  324 45324'
>>
>> There is only one occurrence of the word 324, but string.find() finds 2
>> occurrences (in 45324 too).
>>
>> Must I use a regex?
>> Thanks for the help
> 
> The first thing to do is to answer the question: What is a word?
> 
> The second thing to do is to design some code that can find
> words in strings.
> 
> The last thing to do is to search those actual words for the word
> you're looking for.
> 
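The three steps above can be sketched as follows. Here a "word" is assumed to be a maximal run of non-whitespace characters, which is only one possible answer to the first question. The helper names `find_word_positions` and `find_substring_positions` are hypothetical. For contrast, the sketch also repeats the plain `str.find` search, which matches substrings.

```python
import re

def find_word_positions(text, word):
    # Step 1: decide what a word is. Assumption: a maximal run of non-whitespace characters.
    # Step 2: find the words in the string, keeping each word's start offset.
    # Step 3: keep only the words that are equal to the one we are looking for.
    return [m.start() for m in re.finditer(r"\S+", text) if m.group() == word]

def find_substring_positions(text, sub):
    # For contrast: str.find reports every substring match, including ones inside longer words.
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)
    return positions

text = '45  324 45324'
print(find_word_positions(text, '324'))       # → [4]
print(find_substring_positions(text, '324'))  # → [4, 10]
```

With this definition of a word, `text.split()` would serve just as well for step 2. A regex becomes more useful once the definition gets more complicated, for example when punctuation such as the comma in "324," must not count as part of the word.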




More information about the Python-list mailing list