[Spambayes] Stemming and stopword elimination

Alexander Leidinger Alexander at Leidinger.net
Fri Jan 17 13:47:41 EST 2003


Hi,

Has anyone already experimented with Information Retrieval techniques
like stopword elimination (stopwords: the, a, an, or, and, ...) and word
stemming?

See http://www.tartarus.org/~martin/PorterStemmer for a description of
the algorithm for English text and a Python implementation, or
http://snowball.tartarus.org/ for non-English stemmers.
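
A minimal sketch of both techniques, to make the idea concrete. It does
not implement the Porter algorithm: the stopword list, the suffix rules
and the names naive_stem and tokenize are all assumptions made up for
this illustration, and a real experiment would plug in one of the
stemmers linked above.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "or", "and"}

# Crude suffix stripping as a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOPWORDS]
```

With these toy rules, "filters" and "filtering" both map to the single
token "filter", and "the" and "and" are dropped entirely.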

I don't think this will change the failure rate significantly (maybe
better results with little training data, maybe worse; I don't expect
much change with large training data), but it should reduce the size of
the required database.
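
The size argument can be illustrated with a toy token-count table. This
is only a sketch under stated assumptions: the counts are invented,
toy_stem is a hypothetical stand-in for a real stemmer, and merge_counts
is not part of Spambayes.

```python
from collections import Counter

# Hypothetical stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "or", "and"}


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip 'ed' or 's'."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def merge_counts(counts, stem, stopwords):
    """Drop stopwords and add up the counts of tokens sharing a stem."""
    merged = Counter()
    for token, n in counts.items():
        if token in stopwords:
            continue
        merged[stem(token)] += n
    return merged


# Invented counts, standing in for the per-token training database.
counts = Counter({"the": 50, "and": 40, "offer": 3, "offers": 2, "offered": 1})
merged = merge_counts(counts, toy_stem, STOPWORDS)
print(len(counts), "entries before,", len(merged), "after")
```

Here five entries collapse into one ("offer", with a combined count of
6). Whether the merged counts classify as well as the separate ones is
exactly the open question.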

Bye,
Alexander.

-- 
               I believe the technical term is "Oops!"

http://www.Leidinger.net                       Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7


