[Spambayes] Ping: subject header ignored? [was: Not mining my Subject headers?]

Seth Goodman sethg at goodmanassociates.com
Wed Feb 7 01:16:33 CET 2007


David Abrahams wrote on Tuesday, February 06, 2007 2:10 PM -0600:

> I understand from the above that subject words are considered, but it
> still seems to me that something must be wrong.

<...>

> Word           # Spam         # Ham          Probability

> spam.          14             12             0.506679

You can see why this clue does not affect message classification:  14
spam and 12 ham containing this token were trained, so the token's spam
probability is 0.51, which is essentially neutral.

> subject:spam   13             0              0.983271

OTOH, the token spam in a message subject is a strong spam clue, as
you've trained 13 such messages as spam and none as ham.
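
For what it's worth, the 0.983271 figure can be reproduced by hand.  The
sketch below is not the Spambayes source, just an illustration of a
Robinson-style adjusted token probability.  The constants (strength 0.45,
unknown-word probability 0.5, minimum strength 0.1) are what I believe the
defaults to be, so treat them as assumptions, and the function names and
the training-set totals of 100 are made up for the example.

```python
# Assumed defaults; check them against your own Spambayes configuration.
UNKNOWN_WORD_STRENGTH = 0.45   # "s": weight given to the prior
UNKNOWN_WORD_PROB = 0.5        # "x": prior probability for an unseen token
MINIMUM_PROB_STRENGTH = 0.1    # tokens closer than this to 0.5 are ignored


def raw_spam_prob(spam_count, ham_count, nspam, nham):
    """Ratio-based estimate before any prior is applied."""
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)


def adjusted_prob(spam_count, ham_count, nspam, nham,
                  s=UNKNOWN_WORD_STRENGTH, x=UNKNOWN_WORD_PROB):
    """Blend the raw estimate with the prior, weighted by evidence count."""
    n = spam_count + ham_count
    p = raw_spam_prob(spam_count, ham_count, nspam, nham)
    return (s * x + n * p) / (s + n)


def is_strong_clue(prob, min_strength=MINIMUM_PROB_STRENGTH):
    """True if the token is far enough from 0.5 to be used in scoring."""
    return abs(prob - 0.5) >= min_strength


# subject:spam was seen in 13 spam and 0 ham.  With no ham hits the raw
# estimate is 1.0 whatever the training totals are, so 100/100 below is
# only a placeholder.
p_subject = adjusted_prob(13, 0, nspam=100, nham=100)
print(round(p_subject, 6), is_strong_clue(p_subject))

# The plain "spam." token sits at 0.506679 in the table above, which is
# too close to 0.5 to count as a clue under the assumed threshold.
print(is_strong_clue(0.506679))
```

Under those assumptions the subject:spam token comes out at about 0.98327,
matching the table, while the plain token at 0.506679 falls inside the
ignored band around 0.5.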

> which tells me that the tokenizer may be throwing out the brackets.
> OK, I see that it's doing so on both ends (when training and when
> classifying) so it's okay.

The tokenizer does throw out the brackets, but it still shows the word
inside the brackets as a token.  My guess is that it does not use the
token when you've told Spambayes to annotate your subject line with that
word.  Any chance that's the case?

Assuming it's not that simple, send the set of spam clues for a message
with [spam] in the subject.

--
Seth Goodman


