[Spambayes] Spambayes for python.org

Tim Peters tim.one at comcast.net
Thu May 29 22:58:42 EDT 2003


[Greg Ward]
> Shortly before leaving my job at the MEMS Exchange, I replaced
> SpamAssassin on the mail server with Spambayes [1], and I took a
> chance on using a single corpus for the whole organization -- a dozen
> people and 3 or 4 mailing lists.  It was pretty painful for the first
> few days, with >50% FP rate.  (My initial corpus was a bunch of my
> spam, some stuff sent by clueless users to our webmaster address, and
> a healthy chunk of our one big, high-profile mailing list.)  After
> some frantic retraining on real mail, things settled down and after a
> week or so, spambayes was pretty good -- noticeably better than
> SpamAssassin, but still hardly perfect.

As I said, you have to leave personal email out of it.  The tests we ran
before excluded personal email carried by python.org, and the classifier
did great.

> As I mentioned, I've also set SB up on my python.net address, where
> the corpus is my mail and only my mail.  (And a large chunk of the
> corpus [maybe all of it, can't remember] was captured by the
> python.net SMTP server, so is very very close to what spambayes is
> being asked to evaluate day-by-day.)  In this scenario, especially
> after a retraining session one month in, spambayes operates with
> terrifying laser-like precision.  It's so good it's spooky.

OK, you can leave *your* personal email in the mix, but for the love of God
don't put Barry's in there too <wink>.

> On python.org, I'd like to see something closer to "terrifying
> laser-like precision" than "pretty good", so I'm willing to go to the
> trouble of building and maintaining multiple training corpi.  But not
> 134 of them!  (Especially not at 1-10 MB each.)

We ran tests before on python.org mailing-list traffic lumped into one big
ball.  The software is even better now.

> Well, I'm still in playing-around mode.  Will write back when I have
> interesting numbers or more meaningful questions.  Thanks!

You're welcome.  Now believe the data and stop trying to out-think it.