[Spambayes] Still getting tons of false positives...

Gregory Gulik greg at gulik.org
Thu Apr 7 20:41:11 CEST 2005

What I did last night was construct two nearly equal-size mailboxes, one 
of spam and one of non-spam.  Each one had about 1600 messages in it.

I was extra careful to make sure that my non-spam folder contained a 
number of E-mails that might at first glance look like spam but really 
aren't, such as various automated notifications.  That was a very 
tedious process, but then checking my spam folder several times a day 
was getting tedious as well.

I then moved the old database file aside and retrained.  I've let it run 
that way for about 12 hours, and I'm happy to report that only 3 spams 
have gotten through and so far I've found only one false positive.

That's much more acceptable.  I'll continue to train on errors as I 
have been, so hopefully it can only get better from here.
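For readers unfamiliar with the term, "training on errors" means the 
classifier is only updated with messages it currently misclassifies, 
rather than with every message it sees.  The sketch below illustrates 
the idea with a hypothetical toy word-counting scorer (ToyClassifier and 
train_on_errors are made-up names for illustration).  It is NOT the 
SpamBayes API and does not use SpamBayes's own scoring scheme; it only 
shows the shape of the training loop.

```python
import math
from collections import Counter


class ToyClassifier:
    """Hypothetical naive-Bayes-style scorer; NOT the SpamBayes API."""

    def __init__(self):
        # Per-class count of messages containing each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of messages trained per class.
        self.totals = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(tokens))
        self.totals[label] += 1

    def score(self, tokens):
        # Sum per-token log ratios with add-one smoothing, then map the
        # log-odds to a spam probability; 0.5 means "no evidence".
        log_odds = 0.0
        for tok in set(tokens):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(min(log_odds, 50.0), -50.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_errors(clf, messages, threshold=0.5):
    """Train only on (tokens, label) pairs the classifier gets wrong."""
    trained = 0
    for tokens, label in messages:
        predicted = "spam" if clf.score(tokens) > threshold else "ham"
        if predicted != label:
            clf.train(tokens, label)
            trained += 1
    return trained


clf = ToyClassifier()
messages = [
    ("cheap pills now".split(), "spam"),
    ("meeting notes attached".split(), "ham"),
    ("cheap pills online".split(), "spam"),
    ("notes from the meeting".split(), "ham"),
]
print(train_on_errors(clf, messages))  # only the first message is a mistake
```

Starting from an empty database (the equivalent of moving the old one 
aside), only the first spam is misclassified, so it is the only message 
trained; the later messages are already scored correctly and are skipped.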


Tony Meyer wrote:
> I would say that retraining was the best bet, yes.  Wiping (or moving aside)
> the existing databases and then following a train-on-errors regime would
> probably work best (unless you want to use the tte.py script, which would
> probably provide even better results).
> There's a lot of information about training at:
>   http://entrian.com/sbwiki/TrainingIdeas

Greg Gulik                                 http://www.gulik.org/greg/
greg @ gulik.org
