[Spambayes] Re: Move closer to Gary's ideal

Sat, 21 Sep 2002 12:11:49 -0400

On 21 September 2002, Skip Montanaro said:
> Which leads to another experiment.  You could scale its probabilities to
> match the default scale used by SpamAssassin (>= 5.0 is considered spam) and
> compare results from the two.

I'm dubious about that -- SA's score is theoretically open-ended, while
spambayes always computes something between 0 and 1 (which is why we
like to think of it as a probability).  In practice, SA tops out around
50: the highest-scoring spam I've seen was in the low 40s, and ISTR
someone on one of the SA lists bragging about seeing one that cracked
50.

I guess you could play games with graphing the probability distributions
of SA scores for a given corpus, and then come up with some
correspondence between the two scoring functions.

X-Spam-Level is *really* handy, and not just for people using simplistic
filtering schemes at the MDA level.  E.g., I was just using it to clean
up the python.org corpus gathered last week -- using mutt to limit my
view to messages matching
  ~h x-spam-level:\ [*]{4}$
showed me the "4-star" messages only.  Repeat with {3}, {2}, {1} to see
progressively less spammy subsets of the folder.
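
For anyone who would rather do the same slicing in a script than in
mutt, here is a rough Python sketch of the idea.  It assumes the
messages are already parsed into email.message.Message objects and that
X-Spam-Level holds nothing but a run of asterisks; the helper names
star_count and with_exact_stars are made up for illustration and are
not part of spambayes or SpamAssassin.

```python
import re
from email import message_from_string


def star_count(msg):
    """Return the number of stars in X-Spam-Level, or 0 if absent/empty."""
    value = msg.get("X-Spam-Level", "")
    # Assumption: the header value is only a run of '*' characters.
    match = re.fullmatch(r"\s*(\*+)\s*", value)
    return len(match.group(1)) if match else 0


def with_exact_stars(messages, n):
    """Keep messages with exactly n stars, like the mutt pattern [*]{n}$."""
    return [m for m in messages if star_count(m) == n]


if __name__ == "__main__":
    # Tiny made-up corpus, just to exercise the helpers.
    corpus = [
        message_from_string("X-Spam-Level: %s\n\nbody\n" % ("*" * k))
        for k in (1, 2, 4, 4, 7)
    ]
    for n in (4, 3, 2, 1):
        print(n, len(with_exact_stars(corpus, n)))
```

Matching the whole header value, rather than searching for four stars
anywhere in it, is what keeps a 7-star message out of the "4-star"
bucket, which is the job the leading space and trailing $ do in the
mutt pattern.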

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
All things are possible -- except skiing through a revolving door.