[Spambayes] SpamBayes compared to POPFile

Tim Peters tim.one at comcast.net
Sun Mar 14 01:35:23 EST 2004


[Ian Dalton]
> Is SpamBayes better than POPFile?

They're both free -- try them and see which works better for you.

> If so, what makes it so? I hear why yours is better than others
> based on Paul Graham's work, but POPFile claims not to be based
> on that.

POPFile is a classical N-way "naive Bayesian classifier", and the detailed
theory for doing that is more than 40 years old.  Paul Graham didn't invent
it, but did popularize (a simplified version of) it for spam identification.
SpamBayes is a 2-way classifier (not N-way), with a third "Unsure" category
for messages it's not confident about, and is Bayesian in a different way
than either POPFile or what Paul Graham wrote about.
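To make the N-way vs. 2-way-plus-Unsure distinction concrete, here is a minimal sketch. It is not the code of either project. The N-way half assumes a textbook multinomial naive Bayes with add-one smoothing. The 2-way half assumes Robinson-style smoothed per-token probabilities, combined with Fisher's chi-squared method and mapped to ham/unsure/spam by two cutoffs. The constants (s=0.45, x=0.5, the 0.1 clue-strength filter, cutoffs 0.20 and 0.90), the function names, and the toy training data are all illustrative assumptions.

```python
import math
from collections import Counter

# Toy training data (hypothetical, for illustration only).
TRAIN = {
    "spam": [["cheap", "viagra", "offer"], ["cheap", "offer", "win"]],
    "ham": [["meeting", "tomorrow", "python"], ["python", "code", "review"]],
    "newsletter": [["weekly", "digest", "python"], ["weekly", "news", "digest"]],
}

# ---- N-way: classical multinomial naive Bayes, add-one smoothing (assumed) ----
def nway_classify(tokens, train):
    """Return the most probable of N classes for a list of tokens."""
    token_counts = {cls: Counter(t for doc in docs for t in doc)
                    for cls, docs in train.items()}
    vocab = set().union(*token_counts.values())
    total_docs = sum(len(docs) for docs in train.values())
    best, best_score = None, -math.inf
    for cls, docs in train.items():
        counts = token_counts[cls]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(len(docs) / total_docs)  # log prior
        for t in tokens:
            score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
        if score > best_score:
            best, best_score = cls, score
    return best

# ---- 2-way with an "Unsure" band: Robinson-style, simplified (assumed) ----
def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), valid for even v."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def token_prob(t, spam_counts, ham_counts, nspam, nham, s=0.45, x=0.5):
    """Smoothed spam probability for one token; s and x are assumed constants."""
    sc, hc = spam_counts[t], ham_counts[t]
    n = sc + hc
    if n == 0:
        return x  # never-seen token: fall back to the prior x
    spam_ratio, ham_ratio = sc / nspam, hc / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    return (s * x + n * p) / (s + n)

def two_way_classify(tokens, spam_docs, ham_docs, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return ("ham" | "unsure" | "spam", score) via chi-squared combining."""
    # Count how many documents each token appears in (not raw occurrences).
    spam_counts = Counter(t for doc in spam_docs for t in set(doc))
    ham_counts = Counter(t for doc in ham_docs for t in set(doc))
    probs = [token_prob(t, spam_counts, ham_counts, len(spam_docs), len(ham_docs))
             for t in set(tokens)]
    probs = [p for p in probs if abs(p - 0.5) >= 0.1]  # keep only strong clues
    if not probs:
        score = 0.5
    else:
        n = len(probs)
        s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
        h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
        score = (s - h + 1.0) / 2.0  # 0 = very hammy, 1 = very spammy
    if score < ham_cutoff:
        return "ham", score
    if score > spam_cutoff:
        return "spam", score
    return "unsure", score

print(nway_classify(["weekly", "digest"], TRAIN))  # → newsletter
print(two_way_classify(["cheap", "python"], TRAIN["spam"], TRAIN["ham"])[0])  # → unsure
```

The design difference shows up in the last example. The N-way scorer always picks some class. The 2-way scorer measures spam and ham evidence separately, so a message with strong evidence both ways lands near 0.5 and is reported as "unsure" rather than forced into a bin.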

All the details of what SpamBayes does were driven by statistical testing,
performed by multiple people on multiple real-life email mixes.  Amazingly
enough, no detail of Graham's original scheme (which we started with)
survived that process.  In the end, this project owes much more to Gary
Robinson's ideas.

It's rare for one of two statistical inference algorithms to always beat the
other.  That's why testing on multiple email mixes is important, and also
why, if you care about getting the best possible results, you have to try
the available alternatives in real life, on your own input.  Note that if
you want N-way classification, SpamBayes isn't a contender.
