[Spambayes] RE: Routine training on correctly classified email?

Robert K. Coe bob at 1776.com
Sun Dec 7 18:14:16 EST 2003


The problem with mistake-based training is that almost all mistakes are false negatives, and most of the messages that land in the "Indefinite" folder turn out to be spam. Since those are the only messages that get trained on, the database grows increasingly spam-heavy over time, which, according to the accepted wisdom, degrades the reliability of the algorithm. Obviously this doesn't constitute "definitive proof" that automatic training would be better, but it does argue for giving it a try.
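
To put rough numbers on the imbalance, here is a minimal sketch in Python. The counts are invented purely for illustration, not measured from any real mail stream, and the names (Outcome, mistake_based, train_on_everything) are hypothetical; none of this is SpamBayes code. It only tallies how many ham and spam messages each training regime would feed to the database.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """How a batch of mail was classified (all counts are made-up examples)."""
    ham_as_ham: int      # ham correctly classified
    ham_as_unsure: int   # ham that landed in the unsure folder
    ham_as_spam: int     # false positives
    spam_as_spam: int    # spam correctly classified
    spam_as_unsure: int  # spam that landed in the unsure folder
    spam_as_ham: int     # false negatives


def mistake_based(o: Outcome) -> tuple[int, int]:
    """(ham, spam) trained when only mistakes and unsures are trained on."""
    return (o.ham_as_unsure + o.ham_as_spam,
            o.spam_as_unsure + o.spam_as_ham)


def train_on_everything(o: Outcome) -> tuple[int, int]:
    """(ham, spam) trained when every message is trained on."""
    return (o.ham_as_ham + o.ham_as_unsure + o.ham_as_spam,
            o.spam_as_spam + o.spam_as_unsure + o.spam_as_ham)


# Assumed scenario: 500 ham and 500 spam, with mistakes mostly on the spam side.
example = Outcome(ham_as_ham=495, ham_as_unsure=4, ham_as_spam=1,
                  spam_as_spam=450, spam_as_unsure=35, spam_as_ham=15)

print("mistake-based (ham, spam):", mistake_based(example))
print("train on everything (ham, spam):", train_on_everything(example))
```

With these assumed counts, mistake-based training adds 5 ham and 50 spam to the database, while training on everything adds 500 of each. Whether that skew actually hurts accuracy is exactly the open question; the sketch only shows where the skew comes from.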

Bob

MIS Department, City of Cambridge
831 Massachusetts Ave, Cambridge MA 02139  ·  617-349-4217  ·  fax 617-349-6165


> -----Original Message-----
> From: Kenny Pitt [mailto:kennypitt at hotmail.com]
> Sent: Friday, December 05, 2003 3:55 PM
> To: 'Eamon Egan'; spambayes at python.org
> Subject: RE: [Spambayes] Routine training on correctly classified email?
> 
> 
> The Unix SpamBayes filter has an all-or-nothing option to train on all
> messages that are classified as certain ham or certain spam, but this
> is not currently supported for Outlook.
> ...
> 
> So far, we have no definitive proof that automatic training is any
> better or worse than mistake-based training.  I'm sure it depends a
> lot on your particular mix of ham and spam.  There is still a lot of
> work to be done in determining if there is a "best" method of training.



