[Spambayes] Re: CRM114 in November breaks 99.9%. :-)
Ken Anderson
kanderson@bbn.com
Mon Dec 2 16:04:30 2002
The "train only on errors" rule bothers me. Can you say what you use as a training set and what you use as a test set?
At 09:44 AM 12/2/2002, Bill Yerazunis wrote:
>Final test statistics for CRM114 for November are in:
>
>Standard rules apply (no whitelists, no blacklists, realtime email stream
>only (no "canned spam"), train only on errors, polynomial length 5)
>
> For All of November (starting 9 AM Nov 1, ending 9 AM Dec 1)
>
>   Spams  Nonspams  False    False    Total   N+1       NHC's
>                    Accepts  Rejects  Emails  Accuracy
>   1993   3914      4        0        5911    99.915    2
>
> Spam features in hash tables: 398K
> Nonspam features in hash tables: 299K
>
>There was just 1 spam that got through in the last week of November-
>a very strange spam written in mixed English and Czech trying to sell
>me diesel engine parts. It came through on a moto-head email list,
>which I suppose might be slightly topical, and it certainly was amusing,
>rather reminiscent of the Monty Python "camshaft smuggling" skit,
>but it's still spam and counts as such.
>
>This gives an N+1 accuracy of > 99.9% for the entire month of November.
>(99.932% for N-accuracy).
>
>So, CRM114 barely squeaked through the month at >99.9%. Barely. There's
>clearly still work to be done (the spambayes mailing list is kicking
>around the proper way to evaluate probabilities; I'm looking into some
>of their ideas as well.)
>
>
>
>--- On The Other Hand (the bad news)---
>
>December is looking much worse - TWO have gotten through already over
>the weekend: one "barnyard teen" pornspam (it hasn't seen that before)
>and one very short mortgage solicitation, written folksy-style.
>
>I'm also getting mailer errors now out of Sendmail whenever I do
>a "learn"; I'm starting to think that our systems people have
>upgraded something and broken something else in the process. This
>throws some question onto whether the CRM114 training code is actually
>getting run at all, or whether the increasing spam rate is
>symptomatic of the evolution of spam against static filters.
>
> -Bill Yerazunis
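
For anyone checking the arithmetic in the quoted table, here is a minimal Python sketch that reproduces both percentages. It assumes that "N accuracy" means the fraction of all emails classified correctly, and that "N+1 accuracy" charges one additional hypothetical error on top of the observed ones. That reading is inferred from the numbers, not stated in the message, and the function and variable names are made up for illustration.

```python
def accuracy_percent(total, errors, extra_errors=0):
    """Percent of messages classified correctly.

    `extra_errors` charges hypothetical additional mistakes; using 1 here is
    an assumed reading of the "N+1" figure, not a definition from the post.
    """
    return 100.0 * (total - errors - extra_errors) / total


# Figures taken from the quoted November table.
total_emails = 5911
false_accepts = 4   # spam that got through
false_rejects = 0   # good mail marked as spam
errors = false_accepts + false_rejects

n_accuracy = accuracy_percent(total_emails, errors)
n_plus_1_accuracy = accuracy_percent(total_emails, errors, extra_errors=1)

print(f"N accuracy:   {n_accuracy:.3f}%")
print(f"N+1 accuracy: {n_plus_1_accuracy:.3f}%")
```

Under those assumptions the results round to the 99.932% and 99.915% quoted above, so the table and the text agree with each other.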