[Spambayes] Still getting tons of false positives...
Gregory Gulik
greg at gulik.org
Thu Apr 7 20:41:11 CEST 2005
Last night I built two mailboxes of nearly equal size, one of spam and
one of non-spam, with about 1600 messages in each. I was extra careful to
make sure the non-spam folder included a number of e-mails that might look
like spam at first glance but really aren't, such as various automated
notifications. That was a very tedious process, but then so was checking
my spam folder several times a day.
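For anyone wanting to check how balanced their own training mailboxes are before retraining, here is a minimal sketch. It assumes the folders are stored in mbox format and uses only Python's standard `mailbox` module; the helper names `count_messages` and `balance_ratio` are made up for illustration and are not part of SpamBayes.

```python
import mailbox


def count_messages(path):
    """Return the number of messages in the mbox file at `path`."""
    # create=False: raise an error instead of silently creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def balance_ratio(ham_count, spam_count):
    """Return larger count divided by smaller count; 1.0 means perfectly balanced."""
    smaller = min(ham_count, spam_count)
    if smaller == 0:
        raise ValueError("both mailboxes need at least one message")
    return max(ham_count, spam_count) / smaller
```

Calling `balance_ratio(count_messages(ham_path), count_messages(spam_path))` with two hypothetical paths gives a value close to 1.0 when the mailboxes are about the same size, as in the setup described above.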
I then moved the old database file aside and retrained on those mailboxes.
After running that way for about 12 hours, I'm happy to report that only
3 spams have gotten through and I have found only one false positive so far.
That's much more acceptable. I'll continue to train on errors as I have
been, so hopefully it will only get better from here.
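To make the train-on-errors idea concrete, here is a rough sketch of the loop: classify each message and train only when the classifier gets it wrong. The `ToyClassifier` below is a deliberately tiny naive Bayes stand-in written for this illustration; it is not the SpamBayes classifier or its API, and the 0.5 threshold is an arbitrary assumption.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny naive Bayes stand-in for illustration; NOT the SpamBayes classifier."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.trained = {"ham": 0, "spam": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(text.lower().split()))
        self.trained[label] += 1

    def spam_probability(self, text):
        # Sum per-token log-likelihood ratios with add-one smoothing.
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.counts["spam"][tok] + 1) / (self.trained["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.trained["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_errors(clf, messages, threshold=0.5):
    """Train only on messages the classifier currently gets wrong.

    `messages` is an iterable of (text, label) pairs with label "ham" or "spam".
    Returns the number of messages that were trained.
    """
    trained = 0
    for text, label in messages:
        predicted = "spam" if clf.spam_probability(text) >= threshold else "ham"
        if predicted != label:
            clf.train(text, label)
            trained += 1
    return trained
```

Running `train_on_errors` repeatedly over the same messages trains fewer and fewer of them as the mistakes get corrected, which is the behaviour a train-on-errors regime relies on.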
Thanks!
Tony Meyer wrote:
> I would say that retraining was the best bet, yes. Wiping (or moving aside)
> the existing databases and then following a train-on-errors regime would
> probably work best (unless you want to use the tte.py script, which would
> probably provide even better results).
>
> There's a lot of information about training at:
>
> http://entrian.com/sbwiki/TrainingIdeas
>
--
Greg Gulik http://www.gulik.org/greg/
greg @ gulik.org