[Spambayes] Re: [spambayes-bugs] Spambayes repeatedly classifies
messages from mailing list as SPAM despite multiple (20+)
recoveries from spam folder
Brian Schwarz
brian at brightrock.com
Thu Sep 4 12:13:27 EDT 2003
Meyer, Tony wrote:
> Do you have really unbalanced numbers of ham & spam? For example,
> "cannot" is in 171 ham messages, but only 1 spam message - it really
> shouldn't get a score of 0.64.
>
> Spambayes works best trained with roughly equal numbers of ham & spam;
> we're still trying to come up with a good method of working with
> unbalanced training data. At the moment there is an option (defaults
> to 'on' in the Outlook plug-in) that adjusts the scores for unbalanced
> mail. It looks like this is what is happening here - because of the
> imbalance, a perfectly hammy word like "cannot" is getting a 0.64
> score.
OK, that makes sense. I have ~1000 ham and only ~100 spam messages. When I
was doing the training, I assumed that more data was preferable, and I had a
lot more stored examples of the good stuff. I'll try your suggestions.
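
As a rough sketch of why the ham/spam totals matter, here is how a per-word
score could be computed from the counts mentioned above. This is a minimal
Robinson-style calculation, not the actual Spambayes code: the function name
and parameters are made up for illustration, the strength (0.45) and
unknown-word probability (0.5) values are assumed defaults, and the totals
are my approximate ~1000 ham / ~100 spam. It deliberately leaves out the
imbalance adjustment option Tony describes, so it is not expected to
reproduce the 0.64 score.

```python
def word_spam_prob(ham_count, spam_count, n_ham, n_spam,
                   strength=0.45, unknown_prob=0.5):
    """Hypothetical Robinson-style spam score for a single word.

    ham_count / spam_count: number of ham / spam messages containing the word.
    n_ham / n_spam: total number of trained ham / spam messages.
    strength, unknown_prob: assumed prior weight and prior score for a word
    that has little or no training evidence.
    """
    n = ham_count + spam_count
    if n == 0 or n_ham == 0 or n_spam == 0:
        # No usable evidence: fall back to the prior.
        return unknown_prob

    # Compare the fraction of each corpus containing the word, rather than
    # raw counts, so the sizes of the two corpora feed into the score.
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the raw estimate toward the prior; more evidence means less pull.
    return (strength * unknown_prob + n * raw) / (strength + n)


# "cannot": in 171 ham and 1 spam, with roughly 1000 ham and 100 spam trained.
score_unbalanced = word_spam_prob(171, 1, 1000, 100)

# The same word counts if as many spam as ham had been trained.
score_balanced = word_spam_prob(171, 1, 1000, 1000)

print(f"unbalanced: {score_unbalanced:.3f}  balanced: {score_balanced:.3f}")
```

With these inputs the word still comes out well under 0.5 in both cases, but
the single spam hit counts for more when only ~100 spam have been trained
than when the corpora are the same size.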
Even with that hiccup, the program has done a pretty good job out of the
box.
Thanks,
Brian