[Spambayes] There Can Be Only One

Greg Ward gward@python.net
Wed, 25 Sep 2002 22:27:40 -0400

On 25 September 2002, Tim Peters said:
> Does anyone else intend to participate in this death match?

Yes, I've been running tests all afternoon and evening.  Vague,
hand-wavey results:

  * my histograms are not terribly normal -- not as weird as Guido's,
    but not nearly as nice as Tim's
  * I think my peaks are better separated though -- there's a pretty
    wide range for spam_cutoff
  * I'm one of the few who seems to win by setting spam_cutoff < 0.5

Oh, my corpus is the python.org Sep 2002 harvest + all spam sent to
gward@python.net from Feb 2002 to Aug 2002 and caught by SpamAssassin +
everything sitting in my personal inboxes at around noon today.  The
stuff from my inboxes was cleaned of "Received" headers that are clear
artifacts of the various ISPs I have used over the 2 years that stuff
has been piling up in those inboxes.

Here's the bottom line for Graham vs. Robinson f(w):

  total unique fp went from 6 to 4 won    -33.33%
  mean fp % went from 0.3 to 0.2 won    -33.33%
  total unique fn went from 25 to 31 lost   +24.00%
  mean fn % went from 1.25 to 1.55 lost   +24.00%
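
As a sanity check on the arithmetic, each percentage above is just the
change relative to the first figure on its line, and "won" means the error
count or rate went down.  A minimal Python sketch; the helper names
pct_change and verdict are made up for illustration and are not taken from
the Spambayes code:

```python
def pct_change(before, after):
    """Relative change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


def verdict(before, after):
    """For error counts and rates, lower is better."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


# Figures copied from the summary lines above.
rows = [
    ("total unique fp", 6, 4),
    ("mean fp %", 0.3, 0.2),
    ("total unique fn", 25, 31),
    ("mean fn %", 1.25, 1.55),
]

for label, before, after in rows:
    print(f"{label} went from {before} to {after} "
          f"{verdict(before, after):<6} {pct_change(before, after):+.2f}%")
```

So 6 -> 4 false positives is (4 - 6) / 6 = -33.33%, and 25 -> 31 false
negatives is (31 - 25) / 25 = +24.00%.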

I'll post more complete results and a description of my corpus in the

Greg Ward <gward@python.net>                         http://www.gerg.ca/
If you can read this, thank a programmer.