get word base
John Hunter
jdhunter at nitace.bsd.uchicago.edu
Fri Jun 28 16:14:50 EDT 2002
I would like to be able to get the root/base of a word by stripping
off plurals, gerund endings, participle endings, etc. Here is a
totally naive first attempt that gets it right sometimes:
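import re

# lazily match a stem, then one of a few common suffixes
rgx = re.compile(r'(\w+?)(?:ing|ed|es|s)')

def get_base(word):
    m = rgx.match(word)
    if m:
        return m.group(1)
    else:
        return word

words = ['hello', 'taxes', 'thoughts', 'walked', 'rakes']
for word in words:
    # formatted so the same line runs with either print form
    print('%s %s' % (word, get_base(word)))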
This produces the following output:
> python get_baseword.py
hello hello
taxes tax
thoughts thought
walked walk
rakes rak
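One thing the sample words do not show: the pattern is not anchored at the end of the word, so the lazy stem stops at the first suffix-like run it finds anywhere in the word ('testing' comes back as 't', 'sing' as 's'). Here is a minimal sketch of one possible refinement, not a real stemmer. The names SUFFIX_RGX and get_base_anchored and the three-character minimum stem are assumptions made for illustration.

```python
import re

# Hypothetical refinement of the pattern above (names and the minimum
# stem length of 3 are illustrative assumptions):
#   ^ ... $   the suffix must sit at the very end of the word
#   \w{3,}?   the stem keeps at least three characters
SUFFIX_RGX = re.compile(r'^(\w{3,}?)(?:ing|ed|es|s)$')

def get_base_anchored(word):
    """Strip one trailing suffix; return the word unchanged if none fits."""
    m = SUFFIX_RGX.match(word)
    if m:
        return m.group(1)
    return word

words = ['hello', 'taxes', 'thoughts', 'walked', 'rakes', 'testing', 'sing']
for word in words:
    print('%s %s' % (word, get_base_anchored(word)))
```

With this version 'testing' gives 'test' and 'sing' is left alone, but 'rakes' still gives 'rak'. Restoring a dropped 'e' needs rules beyond suffix stripping.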
I can think of a few things to do to refine this, but before I forge
ahead, I wanted to solicit advice.
Thanks,
John Hunter