[Spambayes] filtering based on headers only?

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Aug 13 20:55:03 EDT 2003


> I'm trying to setup SpamBayes to do training and spam 
> filtering based on headers only.

For reference, you might want to read the threads starting here:
<http://mail.python.org/pipermail/spambayes/2003-May/005378.html>
<http://mail.python.org/pipermail/spambayes/2002-November/001525.html>
<http://mail.python.org/pipermail/spambayes/2002-October/001215.html>

> While examining the source to do necessary patches, I found 
> out that the block responsible for stats updating and 
> messages cacheing is under if command == 'RETR': - thus 
> effectively preventing training by headers only.

This is deliberate - otherwise you'd end up with a cache that had both
headers-only (TOP) and full (RETR) versions of messages, which could
lead to all sorts of confusion.  Since you know what you want to do,
you should be fine with removing that constraint.
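
As a rough sketch of what a relaxed check might look like (this is not
the actual proxy code; the function name and the headers_only flag are
made up for illustration), you could pick one command to cache on so
that the cache never mixes the two kinds of message:

```python
def should_cache(command, headers_only=False):
    """Decide whether a POP3 response should be cached for training.

    Hypothetical helper, not part of SpamBayes.  Only one kind of
    response is ever cached, so headers-only (TOP) and full (RETR)
    copies of a message never end up side by side.
    """
    wanted = "TOP" if headers_only else "RETR"
    return command.upper() == wanted
```

With headers_only left off this behaves like the existing RETR-only
check; turning it on caches TOP responses instead.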

> Now, for the second question.
> I get the following in the log (the message is spam BTW):
[...]
> X-Spambayes-Exception: exceptions.UnicodeDecodeError('ascii' 
[...]
> and it seems to me that exception instead of spam 
> classification is not what I really want. ;) However, that 
> message (headers only) gets to the cache OK, without any exceptions.

The idea here is that examining this message failed - something in the
Python email package raised an exception.  Instead of letting this
exception stop the proxy, we keep going, but let the user know that
something went wrong.  If you wanted to, you could add a rule to your
MUA that put all mail with the X-Spambayes-Exception header into a
separate folder - anecdotal reports suggest that these are reasonably
likely to be spam, so it could be put in the unsure folder.
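
To make the idea concrete, here is a minimal sketch in current Python
(not the proxy's real code).  The classify callable, both function
names, and the folder names are assumptions for illustration; only the
X-Spambayes-Exception header comes from the log above.

```python
from email import message_from_string


def classify_or_flag(raw_text, classify):
    """Parse a message and record either a verdict or the exception.

    `classify` is a hypothetical callable that takes the parsed message
    and returns a string such as "ham", "spam" or "unsure".
    """
    msg = message_from_string(raw_text)
    try:
        msg["X-Spambayes-Classification"] = classify(msg)
    except Exception as exc:
        # Keep going, but tell the user that something went wrong.
        msg["X-Spambayes-Exception"] = repr(exc)
    return msg


def choose_folder(msg):
    """Hypothetical MUA-style rule: failed messages go to 'unsure'."""
    if msg["X-Spambayes-Exception"] is not None:
        return "unsure"
    return msg.get("X-Spambayes-Classification", "unsure")
```

Most MUAs can express the second function as an ordinary filter rule
on the presence of the header, so no code is needed on the client side.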

Whenever a fix can be identified for one of these exceptions, it is
applied, so the number of times this happens should fall over time, but
there are always new ways to break the mail parsing.  In particular, a
new, more tolerant version of the parser is being worked on, which
might result in large improvements.

=Tony Meyer


