[Spambayes] Habeas marked email

Mon May 10 17:49:06 EDT 2004

> Since so far the only Habeas marked email I've received has 
> been from Habeas itself, and I am unlikely to receive any 
> other legitimate Habeas marked mail, this may be a rather 
> pointless exercise.

Most of the Habeas marked mail I get is from the same sources (the TidBITS
and TidBITS-Talk mailing lists, for example).  OTOH, apart from a short
burst last year, I don't get any spam with the headers.

> If you DO drop it from the 1.0 release, 
> is there any way I (or any other SB user) could add it back 
> in.  I thought I saw something about adding custom 
> tokens/filters/wudayumucallums but I can't find it now.

If you run from the source, then it would be simple to add it back in.  From
the binary it'd be rather tricky.  If there was ever demand for it, it
wouldn't be that difficult to implement a 'plug-in' type system where you
could get SpamBayes to ask a program for additional tokens, given a message,
but that seems a way off yet, if it is ever required.
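
A minimal sketch of what such a 'plug-in' hook might look like, assuming a made-up protocol where the helper program reads the raw message on stdin and prints one extra token per line. Nothing here is an existing SpamBayes feature; the function name, the protocol, and the stand-in helper are all hypothetical.

```python
import subprocess
import sys


def extra_tokens(command, message_text, timeout=5.0):
    """Run a helper program and return the tokens it prints, one per line.

    Hypothetical protocol: the raw message goes to the helper's stdin and
    each non-empty line of its stdout is treated as one extra token.
    """
    try:
        result = subprocess.run(
            command,
            input=message_text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # A missing, failing, or slow helper simply contributes no tokens.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# Stand-in helper for illustration: a one-line Python script that reports
# whether the message mentions a Habeas header at all.
HELPER = [
    sys.executable,
    "-c",
    "import sys; d = sys.stdin.read(); "
    "print('plugin:habeas-seen' if 'X-Habeas-SWE' in d else 'plugin:no-habeas')",
]
```

The tokens returned would then be fed to the classifier alongside the usual ones. Swallowing helper failures is a design choice for the sketch: a broken plug-in should not stop mail from being classified.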

> If I'm understanding the purpose/focus of Habeas, the only 
> legitimate Habeas marked email I'd receive would be from mail 
> lists I sign up for. 

Or individuals, yes.

> I've given this a bit more thought since my first post.  I 
> don't think there will be anything in the email that will 
> point to Habeas headers being valid or invalid.  I think it 
> is the server-based spam filters that determine this, by 
> checking the Habeas whitelist.

Ah, we don't do anything with the whitelist (there are two types of Habeas
validation - the whitelist, and the headers).  What SpamBayes does is look
at the headers and if the X-Habeas-SWE headers are there, it checks them
against the correct values (the haiku etc) and generates either an "invalid"
or a "valid" token.  SpamBayes has the correct values hardcoded in the
tokenizer.
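
As a rough illustration of that check (not the actual tokenizer code), the logic could be sketched like this. The expected values below are placeholders, not the real Habeas header text, and the token names are made up for the example.

```python
from email import message_from_string

PREFIX = "x-habeas-swe-"

# Placeholder values only: the real Habeas header text is deliberately not
# reproduced here. Keys and values are stored lowercased.
EXPECTED = {
    "x-habeas-swe-1": "first placeholder line",
    "x-habeas-swe-2": "second placeholder line",
}


def habeas_tokens(msg):
    """Return [] if no Habeas headers are present, else one valid/invalid token."""
    found = {}
    for name, value in msg.items():
        if name.lower().startswith(PREFIX):
            # Collapse whitespace and ignore case before comparing.
            found[name.lower()] = " ".join(value.split()).lower()
    if not found:
        return []
    if found == EXPECTED:
        return ["habeas:valid"]
    return ["habeas:invalid"]
```

For example, `habeas_tokens(message_from_string(raw_text))` yields no token for ordinary mail, so the headers only influence the score when they are actually present.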

Using the whitelist, like (IIRC) a recent SpamAssassin update does, is much
more tricky, because it means that SpamBayes has to send out a request and
get a response.  This is slow and requires an active Internet connection (so
is more suited to server-side operation).  If there was demand, we could
implement it and see if it made much of a difference, but no-one's bothered
as yet.  (If we did, we could optionally connect to any number of
white/black lists and generate tokens based on the results).
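
A sketch of how list lookups might be turned into tokens, assuming each list is published as a DNS zone queried in the usual DNSBL style (reversed IP octets prepended to the zone name). The zone names and token format are hypothetical, and the resolver is passed in so the idea can be tried without a network connection.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def list_tokens(ip, zones, resolve=socket.gethostbyname):
    """Return one token per zone saying whether the IP appears to be listed."""
    tokens = []
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except OSError:
            # No answer. Note that this sketch cannot tell "not listed"
            # apart from "no Internet connection".
            tokens.append("dnslist:%s:unlisted" % zone)
        else:
            tokens.append("dnslist:%s:listed" % zone)
    return tokens
```

Each lookup is a blocking network call, which is the slowness mentioned above; a real implementation would probably want a timeout and a cache.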

> O.K. This makes sense.  I should make a filter in my mail 
> client to look for SpamBayes classified 'Spam' email that 
> DOES contain Habeas marks. 

Yes, that would work.

> So there would be strong possibility that anything caught in 
> that folder would be abused/invalid Habeas mail.

Yes, it should either be that or a false positive by SpamBayes (which should
be very rare).
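
If your mail client can't express that rule directly, the same test can be sketched in a few lines of Python. This assumes your setup adds an X-Spambayes-Classification header to each message (not every SpamBayes front end does); the function name is made up for the example.

```python
from email import message_from_string


def is_suspect_habeas(msg):
    """True if the message was classified as spam yet carries Habeas headers."""
    classification = (msg.get("X-Spambayes-Classification") or "").strip().lower()
    has_habeas = any(
        name.lower().startswith("x-habeas-swe-") for name in msg.keys()
    )
    # startswith() tolerates extra text after the classification word.
    return classification.startswith("spam") and has_habeas
```

Messages for which this returns True are the ones worth a manual look.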

> I guess unless and until I get real email spam with 
> counterfeit Habeas headers, I could cut and paste them into 
> already received spam and retrain SB on those.

I don't think counterfeit Habeas headers are at all common, so this probably
wouldn't be worth it.  Spam that adds the headers tends to add them
correctly, I think.  This is also another "stupid beats smart" case, where
just leaving SpamBayes to do its thing should work best.

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.