[Spambayes] cutoff settings

Peter Bishop bishop at aeroprise.com
Tue Nov 14 19:03:11 CET 2006


The whole question of how to set the filtering parameters for Certain Spam
and Possible Spam is interesting.  I believe the real trade-off comes from
the fact that SpamBayes needs a certain amount of training.  Also, we are
living in an environment in which the generators of spam are trying to get
through SpamBayes (among other filters), with some success.  The good news
is that SpamBayes automatically adapts to new attempts to get through it, as
long as you keep training it on the new spam (and on any real email that it
fails to recognize as real).
 
The whole point is to make the Certain Spam folder really be CERTAIN.  That
way a cursory look is enough to confirm that everything in it really is
spam.  The Possible Spam folder is really used to identify which emails are
sufficiently questionable that SpamBayes needs further training.  Even so,
having a Possible Spam folder that holds 90-99% spam is still a lot more
productive than having that many emails in your regular inbox, because you
are in "spam-detection" mode when looking at the Possible Spam folder rather
than in "email-reading" mode as when you look at your inbox.
 
The default parameter for Possible Spam is 15% and the default parameter for
Certain Spam is 90%.  You can play with these, but you need a significant
window between the two so that enough emails land in Possible Spam to let
SpamBayes adapt to changing spam attacks.  I found it was not difficult to
get most of my good emails to have very low spam scores, so a very low
number for Possible Spam is good.
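
To make the two-threshold idea concrete, here is a minimal sketch in Python.
It is not SpamBayes code: the function name folder_for and the choice to
treat a score exactly equal to a threshold as belonging to the higher folder
are assumptions made for illustration.  Only the 15 and 90 values come from
the text above.

```python
# Illustrative sketch only; not taken from the SpamBayes source.
# Scores and thresholds are percentages (0-100).
POSSIBLE_SPAM_THRESHOLD = 15.0  # default quoted in this message
CERTAIN_SPAM_THRESHOLD = 90.0   # default quoted in this message


def folder_for(score, possible=POSSIBLE_SPAM_THRESHOLD,
               certain=CERTAIN_SPAM_THRESHOLD):
    """Return the folder a message with this spam score would land in.

    Assumption: a score equal to a threshold counts as reaching it.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    if possible >= certain:
        raise ValueError("Possible Spam threshold must be below Certain Spam")
    if score >= certain:
        return "Certain Spam"
    if score >= possible:
        return "Possible Spam"
    return "Inbox"


if __name__ == "__main__":
    for s in (2.0, 15.0, 60.0, 97.5):
        print(s, "->", folder_for(s))
```

Everything between the two thresholds is the "window" discussed above; the
wider it is, the more mail ends up in Possible Spam for you to train on.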
 
The best way to learn how to set these values is to display the spam scores
in Outlook.  You can add the spam score column to your Outlook display of
your inbox, your Possible Spam folder, and your Certain Spam folder.  This
way you can quickly assess how to set these parameters to minimize the
chance of a good email landing in Certain Spam or of spam landing in your
inbox.  Don't try to minimize the number of spams in the Possible Spam
folder; just keep the amount of spam there at a reasonably large percentage
of total spam so SpamBayes will be trained on new spam attack methods.
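
If you copy the scores out of that column and mark each one by hand as spam
or not, the assessment described above can be sketched in a few lines of
Python.  This is a hypothetical helper, not part of SpamBayes: the name
evaluate_thresholds, the result keys, and the sample scores are all made up
for illustration.

```python
# Illustrative sketch only; names and sample data are hypothetical.
def evaluate_thresholds(scored, possible, certain):
    """Count the outcomes that matter for a pair of thresholds.

    scored is an iterable of (score, is_spam) pairs, score in 0-100.
    Assumption: a score equal to a threshold counts as reaching it.
    """
    good_in_certain = 0   # good email filed as Certain Spam (worst case)
    spam_in_inbox = 0     # spam that slipped under the Possible threshold
    spam_in_possible = 0  # spam left available for training
    total_spam = 0
    for score, is_spam in scored:
        if is_spam:
            total_spam += 1
            if score < possible:
                spam_in_inbox += 1
            elif score < certain:
                spam_in_possible += 1
        elif score >= certain:
            good_in_certain += 1
    share = spam_in_possible / total_spam if total_spam else 0.0
    return {
        "good_in_certain": good_in_certain,
        "spam_in_inbox": spam_in_inbox,
        "spam_in_possible_share": share,
    }


# Made-up scores: (spam score, hand-assigned "is this spam?" label).
sample = [(1.0, False), (4.0, False), (92.0, False), (12.0, True),
          (40.0, True), (85.0, True), (95.0, True), (99.0, True)]

if __name__ == "__main__":
    print(evaluate_thresholds(sample, possible=15.0, certain=90.0))
    print(evaluate_thresholds(sample, possible=10.0, certain=93.0))
```

The first two counts are the ones to push toward zero; the share of spam in
Possible Spam is the one to keep reasonably large rather than minimize.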
 
Peter Bishop 
Aeroprise, Inc. 
  _____  

On Behalf Of Matt Fischer
Subject: [Spambayes] cutoff settings


What are the default ham/spam cutoff settings?  (And where are they?)

I want to change my cut-offs so that I have less Unsure and more Spam, as I
get 10-20 Unsures per day and 99.999999% are Spam.  
