Extracting words from a string : *fast*
Doug Fort
dougfort at downright.com
Tue Jun 19 13:57:46 EDT 2001
Thomas Weholt wrote:
> Hi,
>
> I need to extract words from a string. This method will be used extensively
> in an indexer, so it needs to be as fast as possible.
>
> It needs to split words by case, numbers, spaces and chars like ,.-_/\*'
> etc. It returns a list of lower-case entries of the words found, or a
> dictionary where the words are keys and the numbers of occurrences are values.
>
> Ex.
>
> s = 'This is a.test for ThomasWeholt - magic42'
> print getWords(s)
> -----------------------------------------------------
> ['this','is','a','test','for','thomas','weholt','magic','magic42']
>
> The texts to be processed are mostly small but can also be huge,
> e.g. 1-10MB.
>
> Thomas
>
Have you seen Dr. David Mertz's article on developing a Python indexer? He
addresses the problem of identifying words. Check out
http://gnosis.cx/publish/programming/charming_python_15.txt
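
For what it's worth, here is a minimal sketch of one way to do the splitting
with a single precompiled regular expression. It is not taken from the
article above. The names get_words and count_words are made up for
illustration, and the pattern assumes ASCII letters and digits only. Note
that it treats a run of digits as its own word, so the sample string yields
'42' for the last token rather than 'magic42'; adjust the pattern if you
need the combined form.

```python
import re
from collections import Counter

# One alternation, tried left to right at each position:
#   [A-Z]?[a-z]+      lowercase run with an optional leading capital ("Thomas", "is")
#   [A-Z]+(?![a-z])   run of capitals not followed by lowercase ("HTTP" in "HTTPServer")
#   [0-9]+            run of digits ("42")
# Everything else (spaces, punctuation) acts as a separator.
# Assumption: ASCII input only; accented letters are treated as separators.
_WORD_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def get_words(text):
    """Return the words found in text, lower-cased, in order of appearance."""
    return [word.lower() for word in _WORD_RE.findall(text)]


def count_words(text):
    """Return a dict-like Counter mapping each lower-cased word to its count."""
    return Counter(get_words(text))


if __name__ == "__main__":
    s = 'This is a.test for ThomasWeholt - magic42'
    print(get_words(s))
    print(count_words(s))
```

Compiling the pattern once and letting findall do the scanning keeps the
per-character work inside the regex engine, which is usually the cheapest
option in pure Python. For multi-megabyte inputs it would still be worth
timing this against your real data.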
--
Doug Fort <dougfort at downright.com>
Senior Meat Manager
Downright Software LLC
http://www.downright.com