[Spambayes] Re: Move closer to Gary's ideal
Greg Ward
gward@python.net
Sat, 21 Sep 2002 12:11:49 -0400
On 21 September 2002, Skip Montanaro said:
> Which leads to another experiment. You could scale its probabilities to
> match the default scale used by SpamAssassin (>= 5.0 is considered spam) and
> compare results from the two.
I'm dubious about that -- SA's score is theoretically open-ended, while
spambayes always computes something between 0 and 1 (which is why we
like to think of it as a probability). (In reality, SA tops out around
50 -- I think the highest-scoring spam I've seen was in the low 40s;
I seem to recall someone on one of the SA lists bragging about seeing one that
cracked 50.)
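One scale is bounded and the other isn't, so any fixed linear rescaling has to pick an arbitrary ceiling for SA. A rank-based mapping sidesteps that. You score the same corpus with both filters, then map a spambayes probability to the SA score that sits at the same percentile. This is a minimal sketch that assumes you already have both lists of scores for the same messages. `quantile_map` is a hypothetical helper, not part of either project.

```python
import bisect


def quantile_map(prob, sb_probs, sa_scores):
    """Map a spambayes probability onto the SA scale by percentile.

    sb_probs and sa_scores are assumed to be scores for the same corpus
    (hypothetical inputs; neither project ships this function).
    """
    if not sb_probs or not sa_scores:
        raise ValueError("need scores from both filters")
    sb_sorted = sorted(sb_probs)
    sa_sorted = sorted(sa_scores)
    # How many corpus messages spambayes scored at or below prob.
    rank = bisect.bisect_right(sb_sorted, prob)
    # Take the same fraction of the sorted SA list:
    # index = ceil(rank * n_sa / n_sb) - 1, clamped at 0, in integer arithmetic.
    n_sb, n_sa = len(sb_sorted), len(sa_sorted)
    idx = max((rank * n_sa + n_sb - 1) // n_sb - 1, 0)
    return sa_sorted[idx]
```

The mapping uses only ranks, so it doesn't matter whether SA tops out at 40 or 50. The trade-off is that the correspondence is only as good as the corpus it was built from.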
I guess you could play games with graphing the probability distributions of SA
scores for a given corpus, and then come up with some correspondence
between the two scoring functions. X-Spam-Level is *really* handy, and
not just for people using simplistic filtering schemes at the MDA
level. E.g., I was just using it to clean up the python.org corpus
gathered last week -- using mutt to limit my view to messages matching
~h x-spam-level:\ [*]{4}$
showed me the "4-star" messages only. Repeat with {3}, {2}, {1} to see
progressively less spammy subsets of the folder.
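The same "exactly N stars" selection can be done outside mutt. SpamAssassin's default X-Spam-Level header is a run of asterisks that grows with the score. In the mutt pattern, the escaped space and the `$` anchor make `[*]{4}$` match exactly four stars, not four or more. The following is a minimal sketch using only Python's standard `email` module. `star_count` and `messages_with_stars` are hypothetical helper names.

```python
import email
import re


def star_count(msg):
    """Return the number of '*' in X-Spam-Level, or None if absent/odd."""
    level = msg.get("X-Spam-Level")
    if level is None:
        return None
    level = level.strip()
    # Only accept a header made entirely of asterisks.
    if not re.fullmatch(r"\**", level):
        return None
    return len(level)


def messages_with_stars(raw_messages, n):
    """Keep messages whose X-Spam-Level is exactly n stars (like [*]{n}$)."""
    selected = []
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        if star_count(msg) == n:
            selected.append(msg)
    return selected
```

For a real folder, iterating over `mailbox.mbox(path)` yields `Message` objects that can be passed to `star_count` directly. Stepping `n` down from 4 to 1 mirrors the progressively less spammy views described above.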
Greg
--
Greg Ward <gward@python.net> http://www.gerg.ca/
All things are possible -- except skiing through a revolving door.