get word base

John Hunter jdhunter at nitace.bsd.uchicago.edu
Fri Jun 28 16:14:50 EDT 2002


I would like to be able to get the root/base of a word by stripping
off plurals, gerund endings, participle endings, etc.  Here is a
totally naive first attempt that gets it right sometimes:

import re

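# Raw string for the pattern; $ anchors the suffix to the end of the
# word so it cannot match in the middle (e.g. the 's' in 'sister').
rgx = re.compile(r'(\w+?)(?:ing|ed|es|s)$')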

def get_base(word):

    m = rgx.match(word)
    if m:
        return m.group(1)
    else:
        return word

words = ['hello', 'taxes', 'thoughts', 'walked', 'rakes']

for word in words:
    print(word, get_base(word))

This produces the following output:
> python get_baseword.py
hello hello
taxes tax
thoughts thought
walked walk
rakes rak


I can think of a few things to do to refine this, but before I forge
ahead, I wanted to solicit advice.
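
One possible refinement, sketched here only as an illustration: replace
the single pattern with an ordered table of suffix rules, where the first
rule that matches wins. The rule table (RULES) and its contents are
assumptions made up for this sketch, not a complete or linguistically
correct stemmer.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs tried in order.
# The first pattern that matches the whole word wins.
RULES = [
    # -es after a sibilant: taxes -> tax, wishes -> wish
    (re.compile(r'(\w+(?:s|x|z|ch|sh))es$'), r'\1'),
    # consonant + -ies: ponies -> pony
    (re.compile(r'(\w+[^aeiou])ies$'), r'\1y'),
    # -ing / -ed, requiring a stem of at least three characters
    (re.compile(r'(\w{3,})(?:ing|ed)$'), r'\1'),
    # plain plural -s, but not -ss: thoughts -> thought, rakes -> rake
    (re.compile(r'(\w+[^s])s$'), r'\1'),
]

def get_base(word):
    """Return a guessed base form of word, or word itself if no rule applies."""
    for pattern, replacement in RULES:
        if pattern.match(word):
            return pattern.sub(replacement, word)
    return word

for word in ['hello', 'taxes', 'thoughts', 'walked', 'rakes', 'ponies', 'glass']:
    print(word, get_base(word))
```

With these rules 'rakes' comes out as 'rake' and 'glass' is left alone,
but a silent e or a doubled consonant is still handled wrongly: 'raked'
gives 'rak' and 'running' gives 'runn'. Cases like those probably need
either more rules or a word list to check candidates against.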

Thanks,
John Hunter


