[Spambayes] Habeas marked email

Tue May 11 10:51:30 EDT 2004

Tony Meyer wrote:
> 
> > Since so far the only Habeas marked email I've received has
> > been from Habeas itself, and I am unlikely to receive any
> > other legitimate Habeas marked mail, this may be a rather
> > pointless exercise.
> 
> Most of the Habeas marked mail I get is from the same sources (the TidBITS
> and TidBITS-Talk mailing lists, for example).  OTOH, apart from a short
> burst last year, I don't get any spam with the headers.

OK, maybe I'll sign up for those for a while, just to get the legitimate
Habeas marked email (which should leave me with a SpamBayes
installation that knows how to do one trick that it is never called upon
to do :)

> 
> > If you DO drop it from the 1.0 release,
> > is there any way I (or any other SB user) could add it back
> > in?  I thought I saw something about adding custom
> > tokens/filters/wudayumucallums but I can't find it now.
> 
> If you run from the source, then it would be simple to add it back in.  From
> the binary it'd be rather tricky.  If there was ever demand for it, it
> wouldn't be that difficult to implement a 'plug-in' type system where you
> could get SpamBayes to ask a program for additional tokens, given a message,
> but that seems a way off yet, if it is ever required.
> 
> > If I'm understanding the purpose/focus of Habeas, the only
> > legitimate Habeas marked email I'd receive would be from mail
> > lists I sign up for.
> 
> Or individuals, yes.

This one still puzzles me.  I've asked Habeas, but so far have only
received the automated reply from support.  The individual license
still requires that recipients have an opportunity to 'opt out', or
somehow indicate that they agree to receive the email I'd send them.
Sending personal email to a few friends and family doesn't seem to me to
be a situation that warrants this.  WITHOUT that requirement, I would
think Habeas marking personal email would be 'a good thing', especially
since so many spam and virus mailers co-opt innocent individuals'
address books, so that the spam appears to come from a legitimate and
trusted source.
> 
> > I've given this a bit more thought since my first post.  I
> > don't think there will be anything in the email that will
> > point to Habeas headers being valid or invalid.  I think it
> > is the server-based spam filters that determine this, by
> > checking the Habeas whitelist.
> 
> Ah, we don't do anything with the whitelist (there are two types of Habeas
> validation - the whitelist, and the headers).  What SpamBayes does is look
> at the headers and if the X-Habeas-SWE headers are there, it checks them
> against the correct values (the haiku etc) and generates either an "invalid"
> or a "valid" token.  SpamBayes has the correct values hardcoded in the
> tokenizer.

Hmm.  I'd assumed counterfeiters would counterfeit the entire mark, the
headers and the exact haiku content of the headers.  It never occurred
to me that anyone might counterfeit just the headers, but with different
content.  From the Habeas explanation about being able to copyright a
poem but not the individual words, I'd assumed the headers (header
names) were NOT protected, just the poem.  So counterfeiting the
headers, but using entirely original header content would not be
defended against.  So any spam checker would have to do what you are
doing, check the ENTIRE Habeas mark.
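
A rough sketch of that whole-mark check, in Python.  This is NOT the
actual SpamBayes tokenizer code: the function name, the token strings
and the EXPECTED values are all made up for illustration (the real
Habeas header contents would go where the placeholders are).

```python
from email import message_from_string

# Placeholder values only; the real Habeas header contents belong here.
EXPECTED = {
    "X-Habeas-SWE-1": "placeholder line one",
    "X-Habeas-SWE-2": "placeholder line two",
}

def habeas_tokens(msg, expected=EXPECTED):
    """Yield one hypothetical token describing the Habeas mark, if any."""
    # Any X-Habeas-SWE-* header counts as an attempt to carry the mark.
    has_mark = any(name.lower().startswith("x-habeas-swe-")
                   for name in msg.keys())
    if not has_mark:
        return  # no Habeas headers at all, so no token
    # The mark is valid only if EVERY expected header has the exact content.
    ok = all((msg.get(name) or "").strip() == value
             for name, value in expected.items())
    yield "habeas:valid" if ok else "habeas:invalid"
```

So headers with the right names but original content come out as
"invalid", and mail without the headers generates nothing.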

Would it be possible for SpamBayes to generate yet another addition to
the checked email, not a header this time, but plain text at the end of
the email, to the effect that "SpamBayes has determined this email
appears to be Habeas Sender Warranted Email"?  I'm thinking most email
recipients do NOT normally have all headers visible so normally
wouldn't know ... Never mind.  This would be Habeas's responsibility,
not SpamBayes'.  Anyone using SpamBayes and looking for the Habeas marks
would probably be setting up their email client filters to catch Habeas
marked email that nevertheless appears to be spam.
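
Something like the following is what I have in mind for such a filter,
sketched in Python rather than in any particular mail client's rule
language.  I'm assuming SpamBayes adds an X-Spambayes-Classification
header whose value starts with "spam" for spam; the function name is
just made up, and it only looks for the presence of X-Habeas-SWE-*
headers, not whether their content is correct.

```python
from email import message_from_string

def is_suspect_habeas(msg):
    """True if SpamBayes called it spam yet it carries Habeas headers."""
    # Assumed header name and value; check your own SpamBayes setup.
    classification = (msg.get("X-Spambayes-Classification") or "")
    is_spam = classification.strip().lower().startswith("spam")
    has_habeas = any(name.lower().startswith("x-habeas-swe-")
                     for name in msg.keys())
    return is_spam and has_habeas
```

Anything this catches would go into its own folder for a closer look.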

> 
> Using the whitelist, like (IIRC) a recent SpamAssassin update does, is much
> more tricky, because it means that SpamBayes has to send out a request and
> get a response.  This is slow and requires an active Internet connection (so
> is more suited to server-side operation).  If there was demand, we could
> implement it and see if it made much of a difference, but no-one's bothered
> as yet.  (If we did, we could optionally connect to any number of
> white/black lists and generate tokens based on the results).
> 
> > O.K. This makes sense.  I should make a filter in my mail
> > client to look for SpamBayes classified 'Spam' email that
> > DOES contain Habeas marks.
> 
> Yes, that would work.
> 
> > So there would be a strong possibility that anything caught in
> > that folder would be abused/invalid Habeas mail.
> 
> Yes, it should either be that or a false positive by SpamBayes (which should
> be very rare).
> 
> > I guess unless and until I get real email spam with
> > counterfeit Habeas headers, I could cut and paste them into
> > already received spam and retrain SB on those.
> 
> I don't think counterfeit Habeas headers are at all common, so this probably
> wouldn't be worth it.  Spam that adds the headers tends to add them
> correctly, I think.  This is also another "stupid beats smart" case, where
> just leaving SpamBayes to do its thing should work best.
> 
> =Tony Meyer
> 
> ---
> Please always include the list (spambayes at python.org) in your replies
> (reply-all), and please don't send me personal mail about SpamBayes. This
> way, you get everyone's help, and avoid a lack of replies when I'm busy.