[Spambayes] More "spam of the future" lately?
Michael N. Nitabach
mnitabach at acedsl.com
Wed Dec 17 16:00:39 EST 2003
> -----Original Message-----
> From: Tim Peters [mailto:tim.one at comcast.net]
> Sent: Wednesday, December 17, 2003 3:42 PM
> To: Michael N. Nitabach; spambayes at python.org
> Subject: RE: [Spambayes] More "spam of the future" lately?
>
>
> >> 0.7 maybe, but you'd eventually regret dropping [spam_cutoff] to 0.5.
>
> [Michael N. Nitabach]
> > What makes you say that? I have my certain-spam cutoff at .30, and
> > my uncertain at .01. My training database has about 8000 hams and
> > 3000 spams. I have only ever received ten hams that scored over
> > .01, and only one over .20.
>
> Unless you've eyeballed every message scored as spam, then it's almost
> certain you've suffered false positives due to those settings.
I just looked in my certain-spam folder at all e-mails that scored below 0.70. Only a single one was a false positive: a SpamBayes mailing list digest that contained a complete actual spam e-mail that someone had posted, which scored 0.49.
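
For readers following the numbers, the three-way split implied by the two cutoffs can be sketched roughly as below. This is only an illustration, not SpamBayes code: the function name `classify` and the exact boundary handling (strictly below the ham cutoff, at or above the spam cutoff) are assumptions. The 0.01/0.30 values are the settings quoted above; 0.20/0.90 are, as far as I know, the shipped defaults.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Map a combined score in [0, 1] to 'ham', 'unsure', or 'spam'.

    Hypothetical helper: the boundary conventions are assumed for
    illustration and may differ from what SpamBayes does.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# The mailing list digest that scored 0.49:
aggressive = classify(0.49, ham_cutoff=0.01, spam_cutoff=0.30)
defaults = classify(0.49, ham_cutoff=0.20, spam_cutoff=0.90)
```

With the aggressive cutoffs the 0.49 digest lands in the spam folder; with the wider default band it would only have been flagged as unsure.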
> There's more info on the project's background page:
>
> http://spambayes.sourceforge.net/background.html
>
> Note especially the third graph. The way spamprobs are combined in
> SpamBayes guarantees that a highly ambiguous message will score very
> near 0.5 (explained in more detail before the third graph, and much
> more at
>
> http://www.linuxjournal.com/article.php?sid=6467
>
> ).
I receive a substantial amount of e-mail that scores between 0.30 and 0.70, but so far it has *all* been spam.
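
The "very near 0.5" behavior quoted above comes from chi-squared combining of the per-token spamprobs. A minimal sketch of the idea follows, assuming the usual formulation (a spam indicator S and a ham indicator H, each from a chi-squared test on the log-probabilities, combined as (S - H + 1) / 2). The names `chi2q` and `combine` are made up for this sketch, and it leaves out the token selection and underflow handling that a real classifier needs.

```python
import math


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    if v % 2 != 0:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities (each strictly in (0, 1))."""
    n = len(probs)
    if n == 0:
        return 0.5
    # S is near 1 when the tokens collectively look spammy,
    # H is near 1 when they collectively look hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


spammy = combine([0.99] * 10)
hammy = combine([0.01] * 10)
ambiguous = combine([0.99] * 10 + [0.01] * 10)
```

When strong spam clues and strong ham clues are both present, S and H are both close to 1 and cancel, so the score falls in the middle instead of being dragged to either extreme.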
> The kinds of email people get vary widely, though, and it's possible
> your mix is extremely well-suited to this classifier, devoid of any
> significant ambiguity.
Well, the interesting thing is that a lot of my spam is relatively technical sales-pitch e-mail that covers the same sorts of topics I discuss in my professional ham e-mails.
> (I'll note that if you use your SpamBayes'd email only for
> professional purposes, and no personal ones (like chatting with
> friends and relatives), it doesn't strain my imagination that your
> ham could be *so* uniform that ambiguity doesn't arise -- but then
> your email mix would be atypical too.)
No, I use it for equal parts professional and personal correspondence.
Michael N. Nitabach, Ph.D., J.D.
Assistant Professor
Department of Cellular and Molecular Physiology
Yale University School of Medicine
(203) 737-2939
mnitabach at acedsl.com