Pythonic Porter stemmers (Was: Re: Word frequencies -- Python or Perl for performance?)

Tim Churches tchur at optushome.com.au
Sun Mar 17 03:11:34 EST 2002


Bengt Richter wrote:
> 
> On Sat, 16 Mar 2002 09:50:32 +1100, Tim Churches <tchur at optushome.com.au> wrote:
> [...]
> >
> >I concur with Bengt's suggested approach, plus you might want to use
> >something like the Porter Stemmer algorithm to convert words to their
> >"base" forms, e.g. stepped -> step
> >
> >See http://www.tartarus.org/~martin/PorterStemmer/python.txt for a
> >Python implementation.
> >
> Any idea on the license status of that? I saw nothing mentioned
> in the text itself except an apparent reference to a book.

No idea re that code, but the Porter stemming algorithm was described in
the computer science literature about 20 years ago, and AFAIK, the
algorithm is not encumbered by any patents (but you probably should do a
patent search anyway). Also, there are a number of other algorithms
which have been developed since, but I have no idea whether any of them
are generally better, or better for particular purposes. To my lay eyes,
computational linguistics seems like a fascinating but sprawling field
of endeavour and one really needs an expert guide to find one's way
around.
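
For anyone wondering what stemming buys you in a word-frequency count,
here is a toy sketch. Please note that it is NOT the Porter algorithm:
it only strips "ed" and "ing" and undoubles a trailing consonant,
loosely in the spirit of one of Porter's steps. The names toy_stem and
stemmed_frequencies are made up for this illustration and do not come
from either of the implementations mentioned in this thread.

```python
from collections import Counter

VOWELS = set("aeiou")


def toy_stem(word):
    """Toy suffix stripper for illustration only; NOT the Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed"):
        if not word.endswith(suffix):
            continue
        stem = word[:-len(suffix)]
        # Only strip when what remains contains a vowel, so that
        # short words such as "sing" or "red" are left alone.
        if not any(ch in VOWELS for ch in stem):
            continue
        # Undouble a trailing consonant pair ("stepp" -> "step"),
        # except for l, s and z ("fall", "miss", "buzz" stay as they are).
        if (len(stem) >= 2 and stem[-1] == stem[-2]
                and stem[-1] not in VOWELS and stem[-1] not in "lsz"):
            stem = stem[:-1]
        return stem
    return word


def stemmed_frequencies(text):
    """Count whitespace-separated words after reducing them with toy_stem."""
    return Counter(toy_stem(w) for w in text.split())


counts = stemmed_frequencies("step stepped stepping steps")
print(counts)  # "stepped" and "stepping" are counted together with "step"
```

Note that "steps" is still counted separately, because this sketch does
not handle plurals at all. A real Porter stemmer applies several more
rule steps, which is why you would use a proper implementation for
anything beyond a demonstration.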

> 
> E.g., if I rewrote it to my taste, could I put the result under PSF
> if I wanted to?

If you worked from a description of the Porter stemmer algorithm rather
than the existing code, you would be free to license the result any way
you please.

Now, I have just discovered another Python implementation of the Porter
stemmer sitting on my hard disk, but I have no idea where I downloaded
it from, and alas, there is no clue to its source or the licensing in
the file, so I sincerely hope I am not violating the terms under which I
originally downloaded it by placing a copy at
http://gestalt-system.sourceforge.net/Porter.py in case anyone is
interested in it or can identify its provenance.

Cheers,

Tim C



