[Spambayes] Setting the Spambayes timers

Tony Meyer tameyer at ihug.co.nz
Thu Aug 19 00:15:36 CEST 2004


> Today I checked one of the 'spammy' messages which Spambayes 
> gave a zero spam rating to. The body was a string of random 
> non-spammy sentences, plus a large graphic which contained 
> the spammer's sales pitch. The title was "Ch_eap 0_E_M s.oft 
> shi~pp~ing worl_dwide teakettle", which I guess didn't give 
> Spambayes enough clues.

Yes, these 'mini-spams' or 'micro-spams' will be the toughest for SpamBayes
to work with, because there isn't much information - for the most part, just
the headers.

We are looking into ways to better deal with these (although for many, the
headers and whatever body there is do provide enough clues).  The
'x-use_bigrams' option might help somewhat (it considers pairs of words as
well as individual words), as might some of the other options that are off
by default.
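
As a rough illustration of what "pairs of words as well as individual
words" means, here is a minimal Python sketch.  It is not the SpamBayes
tokenizer itself: the function name with_bigrams and the "bi:" prefix on
the pair tokens are assumptions made up for this example.

```python
def with_bigrams(tokens):
    """Yield each token, plus a 'bi:' token for every adjacent pair."""
    previous = None
    for token in tokens:
        yield token
        if previous is not None:
            # Pair the current word with the one just before it.
            yield "bi:%s %s" % (previous, token)
        previous = token


# Part of the subject line quoted above, split on whitespace.
subject_words = "Ch_eap 0_E_M s.oft".split()
print(list(with_bigrams(subject_words)))
```

The point is that a short message yields nearly twice as many clues this
way, which matters most when there is very little text to work with.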

(Turning on these options in Outlook is a somewhat difficult process.  You
have to open up the 'default_bayes_customize.ini' file in the SpamBayes data
directory (or create it if there isn't one) and add the appropriate options.
For example, you'd add

[Classifier]
x-use_bigrams:True

for the bigrams option.  The tricky bit is that there isn't any way for
Outlook users to see what options are available - some sort of documentation
about that might be a good idea when someone has time.  You can look at the
Options.py file in CVS from http://sf.net/projects/spambayes, but that's a
bit awkward, really.)
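
If you would rather not edit the file by hand, something like the
following Python sketch could add the option.  It uses only the standard
configparser module and is demonstrated on a scratch file; the helper
name enable_option is hypothetical, and the real location of your
SpamBayes data directory is something you would have to fill in yourself.

```python
import configparser
import os
import tempfile


def enable_option(ini_path, section, option, value="True"):
    """Set section/option in an ini file, creating the file if needed."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names exactly as written
    config.read(ini_path)     # a missing file is silently skipped
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, value)
    with open(ini_path, "w") as handle:
        config.write(handle)


# Scratch file for the demonstration, NOT a real SpamBayes data directory.
demo_path = os.path.join(tempfile.mkdtemp(), "default_bayes_customize.ini")
enable_option(demo_path, "Classifier", "x-use_bigrams")
with open(demo_path) as handle:
    print(handle.read())
```

Two caveats: configparser writes "option = value" rather than
"option:value" (standard ini parsers accept both, but check that yours
does), and rewriting the file this way drops any comments it contained.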

At the moment, I'm looking at using query expansion (as search engines do
with search terms) to try to help with classifying these short spams.  Other
people are trying some other things, too, I think.
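
To give a rough idea of what query expansion means in general, here is a
small Python sketch.  It only illustrates the generic idea of adding
related terms to a short list of tokens; it is not the approach being
tried in SpamBayes, and the function name expand_tokens and the RELATED
table are invented for the example.

```python
def expand_tokens(tokens, related_terms):
    """Return the tokens plus any related terms, without duplicates."""
    expanded = []
    seen = set()
    for token in tokens:
        # Each token is followed by whatever the table lists for it.
        for candidate in [token] + related_terms.get(token, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical table of related terms, made up for illustration.
RELATED = {
    "cheap": ["discount", "bargain"],
    "software": ["oem", "license"],
}

print(expand_tokens(["cheap", "software", "worldwide"], RELATED))
```

A very short message then presents more tokens to the classifier than it
literally contains; where a useful table of related terms would come from
is the hard part, and is left open here.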

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.


