[Spambayes] lots of unsures, heavily biased towards spam

David Abrahams dave at boost-consulting.com
Sun Feb 4 17:33:37 CET 2007


skip at pobox.com writes:

>     >> > If your training set has much more spam than ham, you can train on
>     >> > ham that already scores properly.
>     >> 
>     >> That'll help?  Great; it's easy enough.
>
>     Seth> There is anecdotal evidence that this helps, as well as a few systems
>     Seth> where it doesn't seem to matter.  If Spambayes is not classifying
>     Seth> well enough, this is a good thing to try.
>
> If there's any possibility you've made a training mistake (training ham as
> spam or vice versa), 

Of course there's always the possibility, but...

> I'd just empty out your training database and start
> from scratch.  

I did that for the spam training folder a few days ago, reviewed
all the training ham to make sure it was legit, then regenerated my
database.  Also I re-reviewed my training ham last night.  So I think
I'm in pretty good shape from that perspective.

I've done the /whole/ process (both ham and spam) from scratch several
times in the past; it doesn't cause _too_ much disruption and I'm
willing to do it again if necessary.  However, I'd rather better
understand what's going on right now and how to fix it, since I'm sure
to find myself in this situation again.

> If the interface you're using allows you to delete trained mails you
> could also try deleting a bunch of old mails you classified as spam.

It does, but I have to confess I don't really understand the
implications of doing so.  My setup is as follows:

- IMAP
- ham-training and spam-training folders
- server-side sb_imapfilter trains hourly on the content of these folders

I know SpamBayes keeps a database.  When I delete already-trained
emails from my xxx-training folders, does it forget everything about
those messages and rebuild the database from the remaining messages as
though from scratch, or is some of the information about the deleted
messages retained?
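
To make the two possibilities concrete, here is a toy Python sketch.
It is not SpamBayes code: ToyTokenDB, learn, unlearn and rebuild are
hypothetical names invented for illustration, and the real database and
sb_imapfilter may well behave differently.  It only shows that, in a
pure per-token counting model, subtracting one message's counts leaves
the same state as rebuilding from the remaining messages.

```python
from collections import Counter


class ToyTokenDB:
    """Hypothetical per-token counter; NOT the SpamBayes database."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.nspam = 0          # number of spam messages trained
        self.nham = 0           # number of ham messages trained

    def learn(self, tokens, is_spam):
        # Count each distinct token once per message.
        counts = self.spam if is_spam else self.ham
        counts.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, tokens, is_spam):
        # Reverse a previous learn() call for the same message.
        counts = self.spam if is_spam else self.ham
        counts.subtract(set(tokens))
        for tok in set(tokens):
            if counts[tok] <= 0:
                del counts[tok]  # drop tokens that no message contributes
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def state(self):
        return (dict(self.spam), dict(self.ham), self.nspam, self.nham)


def rebuild(messages):
    """Train a fresh toy database from (tokens, is_spam) pairs."""
    db = ToyTokenDB()
    for tokens, is_spam in messages:
        db.learn(tokens, is_spam)
    return db


# Made-up example messages, purely for illustration.
messages = [
    (["cheap", "pills", "now"], True),
    (["boost", "release", "now"], False),
    (["cheap", "watches"], True),
]

# Option 1: keep the database and "forget" the deleted message.
incremental = rebuild(messages)
incremental.unlearn(*messages[2])

# Option 2: throw the database away and retrain on what is left.
from_scratch = rebuild(messages[:2])

same = incremental.state() == from_scratch.state()
```

In this toy model the two options are indistinguishable, so the real
question is whether the filter does either of them when a message
simply disappears from a training folder.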

Thanks,

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

