get word base
Bengt Richter
bokr at oz.net
Fri Jun 28 17:12:58 EDT 2002
On Fri, 28 Jun 2002 15:14:50 -0500, John Hunter <jdhunter at nitace.bsd.uchicago.edu> wrote:
>
>I would like to be able to get the root/base of a word by stripping
>off plurals, gerund endings, participle endings etc... Here is a
>totally naive first attempt that gets it right sometimes:
>
>import re
>
>rgx = re.compile( '(\w+?)(?:ing|ed|es|s)')
>
>def get_base(word):
>
>    m = rgx.match(word)
>    if m:
>        return m.group(1)
>    else:
>        return word
>
>words = ['hello', 'taxes', 'thoughts', 'walked', 'rakes']
>
>for word in words:
>    print word, get_base(word)
>
>Produces the following output
>> python get_baseword.py
>hello hello
>taxes tax
>thoughts thought
>walked walk
>rakes rak
>
>
>I can think of a few things to do to refine this, but before I forge
>ahead, I wanted to solicit advice.
>
Google for python stemmer ;-)
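
As a stopgap before reaching for a full stemmer, here is a minimal sketch of
how the quoted regex approach might be tightened. It is not a real stemming
algorithm. The rule table (_SUFFIX_RULES), the minimum stem lengths, and the
"strip 'es' only after s, x, z, ch, sh" heuristic are all assumptions made
for illustration. The main change is anchoring each pattern at the end of the
word with $, so a suffix is only removed when it really is a suffix.

```python
import re

# Hypothetical rule table, tried in order. Each pattern is anchored with $
# and captures the candidate base in group 1.
_SUFFIX_RULES = [
    re.compile(r'(\w{3,}?)ing$'),              # walking -> walk, but not sing
    re.compile(r'(\w{3,}?)ed$'),               # walked -> walk, but not red
    re.compile(r'(\w+(?:s|x|z|ch|sh))es$'),    # taxes -> tax, classes -> class
    re.compile(r'(\w{2,}[^s])s$'),             # rakes -> rake, but not glass
]

def get_base(word):
    """Return word with the first matching suffix stripped, else word itself."""
    for rule in _SUFFIX_RULES:
        m = rule.match(word)
        if m:
            return m.group(1)
    return word

if __name__ == '__main__':
    for word in ['hello', 'taxes', 'thoughts', 'walked', 'rakes']:
        print("%s %s" % (word, get_base(word)))
```

With these rules 'rakes' comes out as 'rake' rather than 'rak', and short
words such as 'sing' or 'glass' are left alone. It is still naive: 'baked'
becomes 'bak', for instance, which is the kind of case a proper stemmer
handles with further rules.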
Regards,
Bengt Richter