[spambayes-dev] Newbie

Skip Montanaro skip at pobox.com
Thu Jan 15 12:52:33 EST 2004


    Greg> The following kinds of subjects often get past SB...

    Greg>       why get Vi`agra when you can get super Vi-agra

    Greg>       p.r.i.c.e.s are v.a.l.i.d until 16th of J.a.n.u.a.r.y


    Greg> Can SB do its magic based on a modified text, i.e. all non-alpha
    Greg> removed?

    Greg>       whygetViagrawhenyoucangetsuperViagra

    Greg>       pricesarevaliduntil16thofJanuary

    Greg> Or is this already happening?

Yes, it could.  No, it's not. ;-) I implemented an experimental
"remove-punctuation" config variable which did the obvious thing and
stripped the punctuation.  I've since deleted it from my copy of the code
base, but it wouldn't be hard to reimplement if desired.  If your training
database is fairly mature (trained on enough samples of ham and spam), it
turns out not to help much, because SpamBayes actually does a very good job
based on other clues it finds in the message.  Note that you don't really
want to remove the whitespace, because then you get a single long token for
each subject instead of a series of separate words.
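To make the whitespace point concrete, here is a minimal Python sketch of
the two variants.  The function names remove_punctuation and
collapse_subject are hypothetical, for illustration only; this is not the
deleted "remove-punctuation" option and not the SpamBayes tokenizer API.
The first strips punctuation inside each word but keeps the word
boundaries; the second also drops the whitespace, as in Greg's examples.

```python
import re

# Matches runs of anything that is not a letter or digit (\W), plus the
# underscore, which \w would otherwise count as a word character.
_PUNCT = re.compile(r"[\W_]+")

def remove_punctuation(subject):
    """Hypothetical helper: strip punctuation within each word, keep spaces."""
    words = (_PUNCT.sub("", word) for word in subject.split())
    # Drop "words" that were nothing but punctuation.
    return " ".join(w for w in words if w)

def collapse_subject(subject):
    """Hypothetical helper: strip punctuation AND whitespace (Greg's variant)."""
    return _PUNCT.sub("", subject)

if __name__ == "__main__":
    s = "why get Vi`agra when you can get super Vi-agra"
    print(remove_punctuation(s).split())  # several separate word tokens
    print(collapse_subject(s).split())    # one long token
```

With remove_punctuation the subject still yields separate tokens such as
"Viagra", which a trained database can score individually; with
collapse_subject the whole subject becomes one token that is unlikely ever
to have been seen before, so it carries almost no evidence.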

Skip


