[Spambayes] question regarding training

Tony Meyer tameyer at ihug.co.nz
Tue Aug 10 08:32:00 CEST 2004


> I have noticed that on my Spambayes manager, it
> has way more spam than ham.  It also states that it
> works best when there are equal amounts of both.
> What can I do to make it work more efficiently?

This is getting to be a FAQ!

Firstly, if you are not doing so already, "train on mistakes" is a good
idea.  Basically, the only training you do is on mail that ends up in the
'unsure' folder, plus any false positives (good mail in the spam folder) and
false negatives (spam left in your inbox), if there are any.  This should
reduce the imbalance and make it grow less quickly.
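
The "train on mistakes" policy can be sketched roughly as below.  This is
only an illustration, not SpamBayes code.  The `classify` and `train`
callbacks and the 'ham'/'unsure'/'spam' labels are hypothetical stand-ins for
what the add-in does when you correct it by hand.  A message is trained on
only when the filter was unsure or got it wrong.

```python
from typing import Callable, List, Tuple


def train_on_mistakes(
    messages: List[Tuple[str, bool]],
    classify: Callable[[str], str],
    train: Callable[[str, bool], None],
) -> int:
    """Train only on messages the filter was unsure about or got wrong.

    messages: (text, is_spam) pairs, where is_spam is the user's verdict.
    classify: hypothetical callback returning 'ham', 'unsure' or 'spam'.
    train:    hypothetical callback that adds one message to the training data.
    Returns the number of messages trained on.
    """
    trained = 0
    for text, is_spam in messages:
        verdict = classify(text)
        correct = (verdict == "spam" and is_spam) or (verdict == "ham" and not is_spam)
        if not correct:  # unsure, false positive, or false negative
            train(text, is_spam)
            trained += 1
    return trained


# Usage: only the unsure message and the false positive get trained on.
verdicts = {"buy now": "spam", "lunch?": "ham",
            "cheap meds": "unsure", "meeting notes": "spam"}
seen = []
n = train_on_mistakes(
    [("buy now", True), ("lunch?", False),
     ("cheap meds", True), ("meeting notes", False)],
    classify=verdicts.get,
    train=lambda text, is_spam: seen.append((text, is_spam)),
)
print(n, seen)  # 2 [('cheap meds', True), ('meeting notes', False)]
```

Correctly classified mail is never added to the training data, so a steady
stream of obvious spam no longer inflates the spam count.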

If you get a lot of mail in the 'unsure' folder, you can adjust the
thresholds on the Filtering tab to try to reduce it.
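
The effect of the two thresholds can be illustrated with a small sketch.  It
assumes the filter gives each message a spam score between 0.0 and 1.0 and
compares it against a ham cutoff and a spam cutoff.  The function name and
the default cutoffs (0.20 and 0.90) are illustrative assumptions, not
necessarily what your Filtering tab is set to.

```python
def bucket(score: float, ham_cutoff: float = 0.20, spam_cutoff: float = 0.90) -> str:
    """Map a spam score in [0.0, 1.0] to a folder name.

    The default cutoffs are illustrative assumptions only.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score >= spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "ham"
    return "unsure"  # everything between the two cutoffs


scores = [0.05, 0.35, 0.60, 0.95]
print([bucket(s) for s in scores])              # ['ham', 'unsure', 'unsure', 'spam']
print([bucket(s, 0.40, 0.55) for s in scores])  # ['ham', 'ham', 'spam', 'spam']
```

Narrowing the gap between the cutoffs shrinks the 'unsure' band.  The cost
is that more messages get filed with less confidence, so false positives and
false negatives become more likely.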

If you get multiple copies of a spam message, don't "Delete as spam" all of
them, just one, and move the rest to the spam folder (or Deleted Items)
manually.
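
Training on only one copy matters because identical copies add the same
evidence again, which pushes the spam count up without teaching the filter
anything new.  Below is a minimal sketch of de-duplicating before training.
It assumes copies are compared by a SHA-256 hash of their text, which is a
simplification: real duplicates often differ slightly in their headers.  The
function name is hypothetical.

```python
import hashlib
from typing import Iterable, List


def unique_for_training(messages: Iterable[str]) -> List[str]:
    """Keep only the first copy of each message, compared by content hash."""
    seen = set()
    unique = []
    for text in messages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


spam_batch = ["WIN A PRIZE", "WIN A PRIZE", "WIN A PRIZE", "cheap meds"]
print(unique_for_training(spam_batch))  # ['WIN A PRIZE', 'cheap meds']
```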

Don't worry too much about the imbalance as long as things are working well
enough, particularly if it's a small imbalance (like 3:1) rather than a
large one (like 100:1).

(In the longer term, the developers are trying to find ways to help people
with this problem, but that's still a way off.)

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



More information about the Spambayes mailing list