[Spambayes] There Can Be Only One
Greg Ward
gward@python.net
Wed, 25 Sep 2002 22:27:40 -0400
On 25 September 2002, Tim Peters said:
> Does anyone else intend to participate in this death match?
Yes, I've been running tests all afternoon and evening. Vague,
hand-wavey results:
* my histograms are not terribly normal -- not as weird as Guido's,
but not nearly as nice as Tim's
* I think my peaks are better separated though -- there's a pretty
wide range for spam_cutoff
* I'm one of the few who seem to win by setting spam_cutoff < 0.5
Oh, my corpus is the python.org Sep 2002 harvest + all spam sent to
gward@python.net from Feb 2002 to Aug 2002 and caught by SpamAssassin +
everything sitting in my personal inboxes at around noon today. The
stuff from my inboxes was cleaned of "Received" headers that are clear
artifacts of the various ISPs I have used over the 2 years that stuff
has been piling up in those inboxes.
Here's the bottom line for Graham vs. Robinson f(w):
    total unique fp went from    6 to    4   won    -33.33%
    mean fp %       went from  0.3 to  0.2   won    -33.33%
    total unique fn went from   25 to   31   lost   +24.00%
    mean fn %       went from 1.25 to 1.55   lost   +24.00%
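
For anyone checking the percentage column by hand, here is a minimal sketch. It assumes each percentage is the change relative to the first (Graham) figure, and that "won" means the error count dropped. The names relative_change and verdict are hypothetical helpers made up for illustration, not taken from the actual comparison script.

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`; negative means a drop."""
    return (after - before) / before * 100.0


def verdict(before, after):
    """Fewer errors after the change counts as a win (assumed convention)."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


# The four before/after pairs reported above.
rows = [
    ("total unique fp", 6, 4),
    ("mean fp %", 0.3, 0.2),
    ("total unique fn", 25, 31),
    ("mean fn %", 1.25, 1.55),
]

for label, before, after in rows:
    change = relative_change(before, after)
    print(f"{label:<16} {before} -> {after}  {verdict(before, after):<4} {change:+.2f}%")
```

Under that assumption, 6 -> 4 is (4 - 6) / 6 = -33.33% and 25 -> 31 is (31 - 25) / 25 = +24.00%, which matches the figures above.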
I'll post more complete results and a description of my corpus in the
morning.
Greg
--
Greg Ward <gward@python.net> http://www.gerg.ca/
If you can read this, thank a programmer.