[Spambayes] More "spam of the future" lately?

Tim Peters tim.one at comcast.net
Wed Dec 17 15:42:13 EST 2003


>> 0.7 maybe, but you'd eventually regret dropping [spam_cutoff] to 0.5.

[Michael N. Nitabach]
> What makes you say that? I have my certain-spam cutoff at .30, and
> my uncertain at .01. My training database has about 8000 hams and
> 3000 spams. I have only ever received ten hams that scored over
> .01, and only one over .20.

Unless you've eyeballed every message scored as spam, it's almost certain
you've suffered false positives due to those settings.  There's more info
on the project's background page:

    http://spambayes.sourceforge.net/background.html

Note especially the third graph.  The way spamprobs are combined in
SpamBayes guarantees that a highly ambiguous message will score very near
0.5.  That's explained in more detail just before the third graph, and at
much greater length in:

    http://www.linuxjournal.com/article.php?sid=6467
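
To make the "very near 0.5" claim concrete, here's a minimal sketch of
chi-squared combining as I understand it from the write-ups linked above.
It's a simplification, not the project's actual code: the names chi2q and
combined_score are made up for this example, it assumes every spamprob is
strictly between 0 and 1, and it skips the underflow handling a real
implementation needs for long clue lists.

```python
import math

def chi2q(x2, v):
    """Probability that a chi-squared variate with v degrees of freedom
    exceeds x2.  This closed form only works for even v."""
    assert v > 0 and v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0; clamp it.
    return min(total, 1.0)

def combined_score(spamprobs):
    """Combine per-word spamprobs into one score in [0, 1].
    Assumes each spamprob p satisfies 0 < p < 1."""
    n = len(spamprobs)
    if n == 0:
        return 0.5  # no clues at all: completely unsure
    ln_ham = sum(math.log(p) for p in spamprobs)
    ln_spam = sum(math.log(1.0 - p) for p in spamprobs)
    S = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # evidence for spam
    H = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # evidence for ham
    # Strong evidence on both sides cancels, leaving about 0.5.
    return (S - H + 1.0) / 2.0

# Hypothetical clue lists, chosen only for illustration.
spammy = [0.99] * 10
hammy = [0.01] * 10
ambiguous = spammy + hammy

for name, clues in [("spammy", spammy), ("hammy", hammy),
                    ("ambiguous", ambiguous)]:
    print(name, combined_score(clues))
```

The ambiguous list has strong spam clues and equally strong ham clues, so S
and H both come out near 1 and the score lands very close to 0.5.  With a
spam cutoff of 0.30, a message like that gets called spam.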

The kinds of email people get vary widely, though, and it's possible your
mix is extremely well-suited to this classifier, devoid of any significant
ambiguity.  (I'll note that if you use your SpamBayes'd email only for
professional purposes, and no personal ones (like chatting with friends and
relatives), it doesn't strain my imagination that your ham could be *so*
uniform that ambiguity doesn't arise -- but then your email mix would be
atypical too.)



