[Spambayes] loosen up address_headers option?

Skip Montanaro skip at pobox.com
Tue Jan 14 13:08:21 EST 2003


The tokenizer's address_headers option examines only the "from" header.  The
relevant code carries this comment:

        # Dang -- I can't use Sender:.  If I do,
        #     'sender:email name:python-list-admin'
        # becomes the most powerful indicator in the whole database.
        #
        # From:         # this helps both rates
        # Reply-To:     # my error rates are too low now to tell about this
        #               # one (small wins & losses across runs, overall
        #               # not significant), so leaving it out
        # To:, Cc:      # These can help, if your ham and spam are sourced
        #               # from the same location. If not, they'll be horrible.

That comment dates from early in spambayes' development history.  (I can't
tell exactly when, because of the recent directory reorganization.  Could the
loss of the CVS log comments have been avoided?)  Much water has passed under
the tokenizing bridge since then.  I'm skeptical that the token above would,
all by itself, push any spam into the hambox.

In my personal experience, adding the To: and Cc: headers to the list would
pick up some strong spam clues.  Any number of <foo>@mojam.com email aliases
eventually reach me, but most of them see essentially no legitimate traffic:
they were harvested from obscure places on the Mojam websites, and real
people with Mojam business to transact rarely write to them.  Mail addressed
to one of those aliases is therefore a good sign of spam.

As spambayes moves out of the experimental stage, perhaps it's worth
considering adding To: and Cc: (and maybe Reply-To: and Sender:) to the
default list of analyzed headers.
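To make the proposal concrete, here is a minimal Python sketch of a tokenizer step driven by a configurable list of address headers.  It is not the spambayes implementation: the function address_tokens and the constants CURRENT_DEFAULT and PROPOSED_DEFAULT are hypothetical, and the token shape ("<header>:email name:<local part>" plus "<header>:email addr:<domain>") is an assumption modeled on the 'sender:email name:python-list-admin' token quoted in the comment above.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical header lists: today's behaviour ("from" only) and the
# broader set suggested in this message.
CURRENT_DEFAULT = ("from",)
PROPOSED_DEFAULT = ("from", "to", "cc", "reply-to", "sender")


def address_tokens(msg, address_headers=CURRENT_DEFAULT):
    """Yield header-prefixed tokens for every address in the listed headers.

    Each address is split at '@' into an 'email name:' token (local part)
    and an 'email addr:' token (domain).  This token format is an
    assumption, loosely modeled on 'sender:email name:python-list-admin'.
    """
    for field in address_headers:
        # get_all() matches header names case-insensitively and returns
        # the default ([]) when the header is absent.
        values = msg.get_all(field, [])
        for _realname, addr in getaddresses(values):
            addr = addr.lower()
            if addr.count("@") != 1:
                continue  # skip empty or malformed addresses
            local, domain = addr.split("@")
            yield f"{field}:email name:{local}"
            yield f"{field}:email addr:{domain}"


if __name__ == "__main__":
    raw = (
        "From: Some Sender <promo@example.net>\n"
        "To: obscure-alias@mojam.com\n"
        "Cc: skip@pobox.com\n"
        "\n"
        "Buy now!\n"
    )
    msg = message_from_string(raw)
    # Only From: is tokenized:
    # ['from:email name:promo', 'from:email addr:example.net']
    print(list(address_tokens(msg)))
    # The To: and Cc: aliases now contribute tokens as well.
    print(list(address_tokens(msg, PROPOSED_DEFAULT)))
```

With the broader list, a harvested alias such as obscure-alias@mojam.com yields its own "to:email name:..." token, which the classifier could learn as a spam clue.  Absent headers (here Reply-To: and Sender:) simply produce no tokens in this sketch.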

Skip
