[spambayes-dev] Newbie
Skip Montanaro
skip at pobox.com
Thu Jan 15 12:52:33 EST 2004
Greg> The following kinds of subjects often get past SB...
Greg> why get Vi`agra when you can get super Vi-agra
Greg> p.r.i.c.e.s are v.a.l.i.d until 16th of J.a.n.u.a.r.y
Greg> Can SB do its magic based on a modified text, i.e. all non-alpha
Greg> removed?
Greg> whygetViagrawhenyoucangetsuperViagra
Greg> pricesarevaliduntil16thofJanuary
Greg> Or is this already happening?
Yes, it could. No, it's not. ;-) I implemented an experimental
"remove-punctuation" config variable which did the obvious thing: it removed
punctuation from the text before tokenizing. I've since deleted it from my
copy of the code base, but it wouldn't be hard to reimplement if desired. If
your training database is fairly mature (trained on enough samples of ham and
spam), it turns out not to help much, because SpamBayes already does a very
good job using other clues it finds in the message. Note that you don't want
to remove the whitespace as well, because then each subject becomes a single
long token instead of a series of separate words.
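To make the whitespace point concrete, here is a minimal Python sketch, assuming a simple regex cleanup that keeps letters and digits. The function names `strip_punctuation_per_word` and `strip_all_non_alnum` are hypothetical; this is not SpamBayes' tokenizer, nor the deleted "remove-punctuation" option, just an illustration of the two approaches discussed above.

```python
import re

# Anything that is not an ASCII letter or digit counts as "punctuation" here
# (an assumption for illustration; digits are kept so "16th" survives).
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")

def strip_punctuation_per_word(subject):
    """Hypothetical: clean each whitespace-separated word on its own,
    so the subject still yields separate tokens."""
    tokens = []
    for word in subject.split():
        cleaned = NON_ALNUM.sub("", word)
        if cleaned:  # drop words that were pure punctuation
            tokens.append(cleaned)
    return tokens

def strip_all_non_alnum(subject):
    """Hypothetical: remove whitespace too, collapsing the whole
    subject into one long token."""
    return NON_ALNUM.sub("", subject)

subject = "p.r.i.c.e.s are v.a.l.i.d until 16th of J.a.n.u.a.r.y"
print(strip_punctuation_per_word(subject))
# ['prices', 'are', 'valid', 'until', '16th', 'of', 'January']
print(strip_all_non_alnum(subject))
# pricesarevaliduntil16thofJanuary
```

The per-word version recovers ordinary words like "prices" and "valid" that a trained database has likely seen before, while the fully collapsed string is a one-off token that probably never appeared in training and so carries little evidence either way.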
Skip