[Spambayes] Header annotations included in classification?

Mon Nov 24 09:12:44 EST 2003

Hello,

I'm using Spambayes-1.0a7 on a Windows machine with Outlook Express
(unfortunately) and am receiving mail through the POP3 proxy. Because I use
Outlook Express, I've turned on spam classification in the "Subject:" header
under Header Options on the configuration web page, so every spam message now
has a subject line that starts with "spam,". This seems to work great, but when
I look at the clues for a given message, it looks like the header annotation is
being included in the classification process:

...
her 0.95871559633
content-type:text/html 0.964536333901
free! 0.96511627907
click 0.966500345667
subject:, 0.979766182776
online 0.98951048951
subject:spam 0.997760079642

(both the "spam" and the "," tokens)

When I do a Word Query on these tokens I get the following results:

Statistics for 'subject:,'
Number of spam messages: 79.
Number of ham messages: 1.
Probability that a message containing this word is spam: 0.979766182776.

Statistics for 'subject:spam'
Number of spam messages: 100.
Number of ham messages: 0.
Probability that a message containing this word is spam: 0.997760079642.

So far I have trained on 141 spam and 100 ham messages, and none of the
original messages had a subject header containing the word "spam". I've been
training Spambayes through the web interface (Home -> Review).
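For what it's worth, the Word Query probabilities above can be reproduced from
the raw counts. Below is a minimal sketch, assuming the classifier uses
Robinson-style smoothing with the default settings unknown_word_strength = 0.45
and unknown_word_prob = 0.5; the function name token_spamprob is hypothetical
and is not part of the Spambayes API.

```python
# Hypothetical helper, not a Spambayes function. Assumes Robinson smoothing
# with unknown_word_strength s=0.45 and unknown_word_prob x=0.5.
def token_spamprob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    # Fraction of trained spam / ham messages that contained the token.
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    # Raw probability, normalized for the unequal spam and ham totals.
    prob = spam_ratio / (spam_ratio + ham_ratio)
    # Shrink toward x; tokens seen in few messages stay closer to 0.5.
    n = spam_count + ham_count
    return (s * x + n * prob) / (s + n)

# Counts from the Word Query output; 141 spam and 100 ham trained in total.
print(token_spamprob(79, 1, 141, 100))   # ~0.97977 for 'subject:,'
print(token_spamprob(100, 0, 141, 100))  # ~0.99776 for 'subject:spam'
```

With zero ham occurrences the raw ratio for 'subject:spam' is exactly 1.0, and
only the smoothing term keeps the final score below that.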

Is there a conflicting configuration setting, or something else I need to
adjust? FWIW the classification process seems to be working really well anyway,
but I'm not sure how long that will continue if Spambayes' own annotations are
interpreted as a slam-dunk spam indicator. :)

Thanks for any help,
Dave