[Spambayes] Question about the ratio of Spam to Ham you should train on...

Richie Hindle richie at entrian.com
Sun Oct 3 14:18:35 CEST 2004


[Andrew]
> I know you're supposed to train Spambayes on a roughly equal
> amount of Spam and Ham.  Does that mean you should try to train on one
> new Ham for every Spam you train, even if all your Ham is already being
> correctly identified by Spambayes?
> I get VASTLY more Spam than good mail, and in the last month of using
> Spambayes I've ended up training on over 200 spams, and only 33 hams.

[Graham]
> I'm in a similar position, and would be really interested in the
> opinions of the developers. I tend to train on my (already correctly
> classified) ham, just to try and keep the numbers even.

I personally try to keep the numbers even by training on
correctly classified ham.  The fact that a message is already correctly
classified doesn't make training on it useless - it still helps keep
the ham and spam counts balanced, so it's worth doing.
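
As a rough illustration of the arithmetic, here is a minimal sketch that
works out how many extra ham messages it would take to even out the
counts.  It is not part of Spambayes; the function name
ham_needed_to_balance is hypothetical, and the figures are Andrew's
("over 200" spams is taken as exactly 200).

```python
def ham_needed_to_balance(spam_trained: int, ham_trained: int) -> int:
    """Return how many extra ham messages would even out the two counts.

    Hypothetical helper for illustration only; not a Spambayes function.
    """
    if spam_trained < 0 or ham_trained < 0:
        raise ValueError("counts must be non-negative")
    # If ham already matches or exceeds spam, no extra ham is needed.
    return max(spam_trained - ham_trained, 0)


# Andrew's numbers from the quoted message: about 200 spams, 33 hams.
extra_ham = ham_needed_to_balance(200, 33)
print(extra_ham)  # 167
```

With those numbers, about 167 more correctly-classified hams would bring
the two counts level.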

There's been a lot written on the wiki about training strategies - start
at http://www.entrian.com/sbwiki/TrainingIdeas

-- 
Richie Hindle
richie at entrian.com


