[spambayes-dev] subject parsing

Seth Goodman nobody at spamcop.net
Mon Jan 26 15:45:38 EST 2004


Just in passing, I noticed that a spam with the following subject line:

Try a Free H-G-H sample ... $49.95 value!

generated the following subject tokens (from 'All Message Tokens' in spam
clues):

'subject: '
'subject: ... $'
'subject:!'
'subject:$49.95'
'subject:-'
'subject:.'
'subject:...'
'subject:Free'
'subject:Try'
'subject:sample'
'subject:value'
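Here is one hypothetical scheme that accounts for exactly this set of tokens. It is a sketch, not the SpamBayes source. It assumes the subject is scanned with two regular expressions. The first finds runs of word characters plus '$', '.' and '%', and keeps only words of three or more characters. The second finds runs of non-word characters, spaces included. The names `WORD_RE`, `PUNCT_RE`, `MIN_WORD_LEN` and `subject_tokens` are made up for illustration. Under these assumptions 'H-G-H' splits into one-letter pieces that are too short to keep. The hyphens, the lone space and ' ... $' survive as punctuation runs.

```python
import re

# Hypothetical patterns (an assumption, not SpamBayes' actual code):
# WORD_RE matches runs of word characters plus '$', '.' and '%';
# PUNCT_RE matches runs of non-word characters, including spaces.
WORD_RE = re.compile(r"[\w$.%]+")
PUNCT_RE = re.compile(r"\W+")
MIN_WORD_LEN = 3  # assumed: shorter words such as 'a' and 'H' are dropped


def subject_tokens(subject):
    """Return the set of 'subject:' tokens under the assumed scheme."""
    tokens = set()
    # Word-like runs: 'H-G-H' yields 'H', 'G', 'H', all too short to keep.
    for word in WORD_RE.findall(subject):
        if len(word) >= MIN_WORD_LEN:
            tokens.add("subject:" + word)
    # Non-word runs: produces ' ', '-', ' ... $', '.', '!' for this subject.
    for run in PUNCT_RE.findall(subject):
        tokens.add("subject:" + run)
    return tokens


for tok in sorted(subject_tokens("Try a Free H-G-H sample ... $49.95 value!")):
    print(repr(tok))
```

On the subject above, the sorted output is the same eleven tokens listed, in the same order.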


Three observations:

1) having a token for '-' but not for 'H-G-H' appears to discard
important information

2) a token for a single space seems of dubious value, but if it worked
better in testing, fine

3) the token ' ... $' seems an odd choice for parsing
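Observation 1 could be addressed by letting a word contain internal hyphens. The sketch below shows one such word pattern. The pattern, `MIN_WORD_LEN` and `hyphen_aware_words` are illustrative assumptions, not an existing SpamBayes option. The three-character minimum is assumed, consistent with 'a' and 'H' being absent from the token list above.

```python
import re

# Hypothetical variant (an assumption for illustration): a word may contain
# internal hyphens, so 'H-G-H' is matched as a single run.
HYPHEN_WORD_RE = re.compile(r"[\w$.%]+(?:-[\w$.%]+)*")
MIN_WORD_LEN = 3  # assumed minimum word length


def hyphen_aware_words(subject):
    """Return word tokens, keeping hyphenated words like 'H-G-H' whole."""
    return [w for w in HYPHEN_WORD_RE.findall(subject) if len(w) >= MIN_WORD_LEN]


print(hyphen_aware_words("Try a Free H-G-H sample ... $49.95 value!"))
# prints ['Try', 'Free', 'H-G-H', 'sample', '...', '$49.95', 'value']
```

A separate scan for non-word runs would still produce the '-' token, so this change adds 'H-G-H' rather than replacing anything.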

--
Seth Goodman

off-list replies to sethg [at] GoodmanAssociates [dot] com



