[Spambayes] Ping: subject header ignored? [was: Not mining mySubject headers?]
Seth Goodman
sethg at goodmanassociates.com
Wed Feb 7 01:16:33 CET 2007
David Abrahams wrote on Tuesday, February 06, 2007 2:10 PM -0600:
> I understand from the above that subject words are considered, but it
> still seems to me that something must be wrong.
<...>
> Word          # Spam   # Ham   Probability
> spam.             14      12      0.506679
You can see why this clue does not affect message classification: you've
trained 14 spam and 12 ham containing this token, so its spam probability
is 0.51, which is essentially neutral.
> subject:spam 13 0 0.983271
OTOH, the token spam in a message subject is a strong spam clue, as
you've trained 13 such messages as spam and none as ham.
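To make the arithmetic behind those two probabilities concrete, here is a minimal Python sketch of a Robinson-style token probability with a Bayesian adjustment toward a neutral prior. The constants (prior strength 0.45, prior probability 0.5, and a 0.1 cutoff around 0.5 for ignoring weak clues) are what I believe the SpamBayes defaults to be, so treat them as assumptions; the function names and the message totals of 100 are purely illustrative.

```python
def token_spam_prob(spam_count, ham_count, n_spam, n_ham, s=0.45, x=0.5):
    """Estimate a token's spam probability (illustrative sketch).

    spam_count / ham_count: trained messages containing the token.
    n_spam / n_ham: total trained spam / ham (hypothetical values here).
    s, x: strength and value of the neutral prior (assumed defaults).
    """
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # token never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    # Pull the raw estimate toward x; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


def is_strong_clue(prob, min_strength=0.1):
    """True if the token is far enough from 0.5 to count (assumed cutoff)."""
    return abs(prob - 0.5) >= min_strength


# 13 spam, 0 ham: the totals cancel out because the ham count is zero.
subject_spam = token_spam_prob(13, 0, n_spam=100, n_ham=100)
print(round(subject_spam, 6))    # 0.983271
print(is_strong_clue(0.506679))  # False: "spam." is too close to neutral
print(is_strong_clue(0.983271))  # True: "subject:spam" is a strong clue
```

The 0.506679 figure for "spam." cannot be reproduced this way without the total trained spam and ham counts, which the quoted output does not show.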
> which tells me that the tokenizer may be throwing out the brackets.
> OK, I see that it's doing so on both ends (when training and when
> classifying) so it's okay.
The tokenizer does throw out the brackets, but it still shows the word
inside the brackets as a token. My guess is that it skips that token
when you've told SpamBayes to tag your subject lines with that word.
Any chance that's the case?
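As a rough illustration of what "throwing out the brackets" means, here is a simplified Python sketch of a subject tokenizer. It is not the actual SpamBayes tokenizer: the regular expression, the function name, and the sample subject are assumptions chosen only to show that the word inside the brackets survives as a "subject:" token, both when training and when classifying.

```python
import re

# Hypothetical word pattern: runs of word characters plus a few symbols.
# Brackets are not in the character class, so they are simply dropped.
SUBJECT_WORD_RE = re.compile(r"[\w$.%]+")


def subject_tokens(subject):
    """Return 'subject:'-prefixed tokens for a Subject header (sketch)."""
    return ["subject:" + word for word in SUBJECT_WORD_RE.findall(subject)]


print(subject_tokens("[spam] Cheap watches"))
# ['subject:spam', 'subject:Cheap', 'subject:watches']
```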
Assuming it's not that simple, send the set of spam clues for a message
with [spam] in the subject.
--
Seth Goodman