[Spambayes] Spambayes for python.org
Tim Peters
tim.one at comcast.net
Thu May 29 22:58:42 EDT 2003
[Greg Ward]
> Shortly before leaving my job at the MEMS Exchange, I replaced
> SpamAssassin on the mail server with Spambayes [1], and I took a
> chance on using a single corpus for the whole organization -- a dozen
> people and 3 or 4 mailing lists. It was pretty painful for the first
> few days, with >50% FP rate. (My initial corpus was a bunch of my
> spam, some stuff sent by clueless users to our webmaster address, and
> a healthy chunk of our one big, high-profile mailing list.) After
> some frantic retraining on real mail, things settled down and after a
> week or so, spambayes was pretty good -- noticeably better than
> SpamAssassin, but still hardly perfect.
As I said, you have to leave personal email out of it. The tests we ran
before excluded personal email carried by python.org, and they did great.
> As I mentioned, I've also set SB up on my python.net address, where
> the corpus is my mail and only my mail. (And a large chunk of the
> corpus [maybe all of it, can't remember] was captured by the
> python.net SMTP server, so is very very close to what spambayes is
> being asked to evaluate day-by-day.) In this scenario, especially
> after a retraining session one month in, spambayes operates with
> terrifying laser-like precision. It's so good it's spooky.
OK, you can leave *your* personal email in the mix, but for the love of God
don't put Barry's in there too <wink>.
> On python.org, I'd like to see something closer to "terrifying
> laser-like precision" than "pretty good", so I'm willing to go to the
> trouble of building and maintaining multiple training corpora. But not
> 134 of them! (Especially not at 1-10 MB each.)
We ran tests before on python.org mailing-list traffic lumped into one big
ball. The software is even better now.
> Well, I'm still in playing-around mode. Will write back when I have
> interesting numbers or more meaningful questions. Thanks!
You're welcome. Now believe the data and stop trying to out-think it.