[Spambayes] Training Spambayes

Kenny Pitt kennypitt at hotmail.com
Fri Dec 17 18:21:18 CET 2004


SpamBayes moves messages as it classifies them in order to remove spam from
your Inbox. If you manually move a message into the Junk Mail folder,
SpamBayes assumes that it got the classification wrong, so it trains on that
message in the hope that it will get it right the next time.

What you are asking about is a training strategy that we refer to as
"train-on-everything". It is not supported in the Outlook addin, and it is
generally not recommended for most users anyway. One reason is that your
training data will quickly become imbalanced if you receive more spam than
ham (or vice versa, although that seems rare these days <0.5 wink>). Another
reason is that you have to be very diligent about checking for false
positives. If one good message is incorrectly classified as spam and
automatically trained as such, it can negatively affect SpamBayes's ability
to properly identify other good messages later.

The "train-on-mistakes-and-unsures" strategy implemented in the Outlook
addin is believed to be the most effective strategy for most general users.
This basically means that you train only on messages that SpamBayes got wrong
or marked as unsure. It keeps your training database as small as possible,
which means that SpamBayes will generally run a little faster and will adapt
more quickly to changes in your e-mail patterns.
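
To make the difference between the two strategies concrete, here is a toy
sketch in Python. It is not the SpamBayes API or the addin's code; the
function name, the strategy strings, and the sample stream are all made up
for illustration. It only counts which messages would end up in the training
database, and it assumes that under train-on-everything nobody corrects the
classifier's verdict.

```python
from collections import Counter

# Hypothetical illustration only; none of these names exist in SpamBayes.
def messages_to_train(stream, strategy):
    """Count how many messages get trained as ham and as spam.

    stream: iterable of (predicted, actual) pairs, where predicted is
    "ham", "spam" or "unsure" and actual is "ham" or "spam".
    """
    trained = Counter()
    for predicted, actual in stream:
        if strategy == "train-on-everything":
            # Train every message on the classifier's own verdict.
            # An unsure has no verdict, so assume the user supplies the label.
            label = actual if predicted == "unsure" else predicted
            trained[label] += 1
        elif strategy == "mistakes-and-unsures":
            # Train only when the verdict was wrong or unsure,
            # always using the label the user corrected it to.
            if predicted != actual:
                trained[actual] += 1
        else:
            raise ValueError("unknown strategy: %r" % (strategy,))
    return trained

# Made-up mailbox: mostly spam, one false positive, one unsure.
stream = (
    [("spam", "spam")] * 8      # spam caught correctly
    + [("ham", "ham")]          # good mail left alone
    + [("spam", "ham")]         # false positive
    + [("unsure", "spam")]      # unsure that was really spam
)

everything = messages_to_train(stream, "train-on-everything")
mistakes = messages_to_train(stream, "mistakes-and-unsures")
print(everything["spam"], everything["ham"])   # 10 1
print(mistakes["spam"], mistakes["ham"])       # 1 1
```

With this made-up stream, train-on-everything ends up with ten spam to one
ham, and one of those "spam" is actually the good message that was
misclassified. Training only on mistakes and unsures stores two messages,
one of each kind.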

You can read lots more about the various training strategies on the
SpamBayes wiki:
http://entrian.com/sbwiki/TrainingIdeas

-- 
Kenny Pitt


> _____________________________________________ 
> From: 	spambayes-bounces at python.org
> [mailto:spambayes-bounces at python.org]  On Behalf Of Howard A. Mergler
> Sent:	Friday, December 17, 2004 11:39 AM
> To:	spambayes at python.org
> Subject:	[Spambayes] Training Spambayes
> 
> I guess this might be a request for a feature as opposed to a question,
> but I'll put it out there anyways. When mail is downloaded into Outlook
> and Spambayes evaluates it, why doesn't it consider that to be training or
> is there a way to tell it to train as it evaluates? What I'm getting at is
> that if you move e-mail from the Inbox to the Junk Mail folder, it
> considers that to be training, yet if Spambayes does that (because it has
> evaluated an e-mail to be junk mail) it does not seem to learn from that.
> Wouldn't Spambayes become effective quicker if it trained itself on any
> message it evaluates as well?
> 