[spambayes-dev] correlated clues

Kenny Pitt kennypitt at hotmail.com
Fri Jul 2 09:48:19 EDT 2004


Tim Peters wrote:
> Maybe another pure but personalized hack would be to add a list of
> specific tokens you want the classifier to pretend didn't exist.

POPFile does exactly this.  It has a default ignore list that includes
common words that often appear in all types of messages (I won't say ham and
spam since POPFile is a generalized, multi-bucket classifier), and it also
uses the list to remove many of the common HTML tags.  The user can then add
and remove words to personalize the list.
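For illustration only, here is a minimal sketch of a user-editable ignore list applied to tokens before they reach the classifier. The class name IgnoreList, its methods, and the contents of DEFAULT_IGNORE are hypothetical. They are not POPFile's actual stopword list or any existing SpamBayes API.

```python
# Hypothetical sketch: a personalized ignore list filtering tokens before
# classification. DEFAULT_IGNORE is illustrative, not POPFile's real list.
from typing import Iterable, Iterator, Set

DEFAULT_IGNORE: Set[str] = {"the", "and", "of", "to", "a", "html", "body", "font"}


class IgnoreList:
    """A set of tokens the classifier should pretend don't exist."""

    def __init__(self, defaults: Iterable[str] = DEFAULT_IGNORE) -> None:
        # Copy the defaults so per-user edits never mutate the shared set.
        self._words: Set[str] = {w.lower() for w in defaults}

    def add(self, word: str) -> None:
        self._words.add(word.lower())

    def remove(self, word: str) -> None:
        # discard() avoids a KeyError if the word isn't on the list.
        self._words.discard(word.lower())

    def __contains__(self, word: str) -> bool:
        return word.lower() in self._words

    def filter(self, tokens: Iterable[str]) -> Iterator[str]:
        """Yield only the tokens that are not on the ignore list."""
        for tok in tokens:
            if tok not in self:
                yield tok
```

A user who adds "click" and removes "font" would see list(ignore.filter(["The", "font", "click", "money"])) return ["font", "money"]. The default set is copied per instance so that one user's personalization doesn't leak into another's.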

As an unrelated aside, the latest version of POPFile has switched from
BerkeleyDB to SQLite for its default database because of reliability
problems with BerkeleyDB.  Does anyone have any experience with SQLite?  Would it
be worth implementing a SpamBayes storage option for it to test it out?
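To give a rough idea of what such an option might involve, here is a sketch of an SQLite-backed token store using Python's standard sqlite3 module. It assumes the store only needs per-token spam/ham counts plus message totals. The class, table, and method names are hypothetical and are not SpamBayes' actual storage interface.

```python
# Hypothetical sketch of an SQLite-backed token store. Names are
# illustrative only; this is not SpamBayes' storage API.
import sqlite3
from typing import Iterable, Optional, Tuple


class SQLiteTokenStore:
    """Keeps per-token spam/ham counts plus message totals in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS tokens (
                word      TEXT PRIMARY KEY,
                spamcount INTEGER NOT NULL DEFAULT 0,
                hamcount  INTEGER NOT NULL DEFAULT 0
            );
            CREATE TABLE IF NOT EXISTS totals (
                id    INTEGER PRIMARY KEY CHECK (id = 0),
                nspam INTEGER NOT NULL,
                nham  INTEGER NOT NULL
            );
            INSERT OR IGNORE INTO totals (id, nspam, nham) VALUES (0, 0, 0);
            """
        )
        self.conn.commit()

    def learn(self, words: Iterable[str], is_spam: bool) -> None:
        """Record one trained message, counting each distinct word once."""
        # Column names come from fixed choices, so the f-strings are safe.
        col = "spamcount" if is_spam else "hamcount"
        total = "nspam" if is_spam else "nham"
        with self.conn:  # commits on success, rolls back on an exception
            for word in set(words):
                self.conn.execute(
                    "INSERT OR IGNORE INTO tokens (word) VALUES (?)", (word,)
                )
                self.conn.execute(
                    f"UPDATE tokens SET {col} = {col} + 1 WHERE word = ?", (word,)
                )
            self.conn.execute(f"UPDATE totals SET {total} = {total} + 1 WHERE id = 0")

    def counts(self, word: str) -> Optional[Tuple[int, int]]:
        """Return (spamcount, hamcount) for a word, or None if unseen."""
        row = self.conn.execute(
            "SELECT spamcount, hamcount FROM tokens WHERE word = ?", (word,)
        ).fetchone()
        return None if row is None else (row[0], row[1])

    def totals(self) -> Tuple[int, int]:
        """Return (nspam, nham), the number of trained messages of each kind."""
        nspam, nham = self.conn.execute(
            "SELECT nspam, nham FROM totals WHERE id = 0"
        ).fetchone()
        return nspam, nham
```

Wrapping each trained message in a single transaction is the part most relevant to the reliability question: if training is interrupted, SQLite rolls back the partial update instead of leaving the counts half-written.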

-- 
Kenny Pitt
