[spambayes-dev] I took a big step Tuesday...

Thu Jul 24 10:22:38 EDT 2003

In message:  <16159.64832.752182.993382 at montanaro.dyndns.org>
             Skip Montanaro <skip at pobox.com> writes:
>
>After having used Spambayes since last September and scanning all messages
>marked as spam during that time, I made the decision a couple days ago to
>simply dump spam which scores 1.00 (or 1,00 if you've been following the
>recent locale saga).  I mention it here to suggest that maybe it's
>worthwhile to consider creating finer-grained "spam" categories.

I think finer-grained categories have some use, but I'm not
convinced that (a) the score should be used for such categorization,
or that (b) it needs to be done in spambayes.

>The step I took Tuesday was to simply dump mail which scores 1.00.  That
>eliminates roughly 85% of the spam

I've adopted a slightly different rule: if both spambayes and
spamassassin agree that it's spam, then I toss it without looking
at it as the first step of my weekly spam management.  This is
_not_ an automatic thing on delivery, so if I'm expecting something
that's likely to be spammy, I can save it, no matter how it scored.
Note that I don't run spamassassin myself, but several of my
upstreams do, and I get benefit thereby.
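
For anyone who wants to try the "both agree" rule, here is a rough
sketch in Python. It is only an illustration, not what I actually run.
It assumes spambayes has tagged the message with an
X-Spambayes-Classification header whose value starts with "spam", and
that an upstream spamassassin has added "X-Spam-Flag: YES". Both header
conventions are assumptions about your setup, and the function name is
made up for this example.

```python
from email import message_from_string


def both_agree_spam(raw_message):
    """Return True only if both filters marked the message as spam.

    Assumed (not verified for any particular setup):
      X-Spambayes-Classification: spam; 1.00
      X-Spam-Flag: YES
    """
    msg = message_from_string(raw_message)
    sb_value = msg.get("X-Spambayes-Classification", "")
    sa_value = msg.get("X-Spam-Flag", "")
    # The classification word comes before the optional "; score" part.
    sb_says_spam = sb_value.split(";")[0].strip().lower() == "spam"
    sa_says_spam = sa_value.strip().lower() == "yes"
    return sb_says_spam and sa_says_spam
```

Since this only reports a verdict and deletes nothing, it fits a
manual weekly pass: anything you are expecting can still be rescued
before the rest is tossed.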

>I don't recall the last time I saw a false positive, and the place where
>mistakes are most likely to be made are in the lower scoring spams.

I've had a couple of false positives recently, both receipts from
online purchases.  Both scored very high with spambayes and
spamassassin.

>Relating that to spambayes-dev subject matter, perhaps a "super-spam" cutoff
>could be created which would automatically delete messages which score at or
>above that value if the user's training set was "large enough".  Thus, if
>they started training from scratch it would have no effect.  By default, it
>would be set to something > 1.0 to prevent it from coming into play
>unexpectedly.  I don't know what "large enough" is though.

I don't think that such extra gradation needs to be in the spambayes
code; the obvious super-spam at 1.00 is easily matched by MDAs already,
and more generic benefit is derived from using a completely different
method such as spamassassin to further categorize.
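
As a sketch of how little is needed outside spambayes to catch the
1.00 case, the following Python pulls the score out of a header value
and compares it to a cutoff. It assumes the header value looks like
"spam; 1.00" (classification, semicolon, score), and it accepts a
comma as the decimal separator because of the locale issue mentioned
above. The function names and the default cutoff are hypothetical.

```python
def spambayes_score(header_value):
    """Return the score from a value like 'spam; 1.00', or None.

    The 'classification; score' layout is an assumption.
    """
    parts = header_value.split(";")
    if len(parts) < 2:
        return None
    try:
        # Accept "1,00" as well as "1.00".
        return float(parts[1].strip().replace(",", "."))
    except ValueError:
        return None


def is_super_spam(header_value, cutoff=1.0):
    """True when a score is present and is at or above the cutoff."""
    score = spambayes_score(header_value)
    return score is not None and score >= cutoff
```

Note that a score printed with two decimals has already been rounded,
so "1.00" here means "rounds to 1.00", not exactly 1.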

- Alex