[Spambayes] Randomized Spam Beating SpamBayes

skip at pobox.com skip at pobox.com
Sat Oct 21 16:01:27 CEST 2006


    >> Once you find it, just add the options I mentioned to the [Tokenizer]
    >> section and restart.

    Shawn> Is there any means of directly testing that the settings applied
    Shawn> are actually taking effect?

Well, as of yesterday I can tell you they won't take effect.  There is a bug
in the ocrad.exe file.  Tony Meyer fixed that.  I've updated the
ocrad-cygwin package here:

    http://sourceforge.net/project/showfiles.php?group_id=61702

If you download that and replace the ocrad.exe in

    C:\Program Files\SpamBayes\bin

that will be one problem solved.  However, there will be more issues to deal
with.  If you could do me a favor, I can tweak a few more things and update
the package again so that SpamBayes actually finds ocrad.exe and uses it.
Locate the file ImageStripper.py.  I think you'll find it at

    C:\Program Files\SpamBayes\spambayes\ImageStripper.py

Let me know where you find it.  I'll tweak a couple settings there and shoot
you a new copy.
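
If Python is available on the machine, a few lines like the following can
confirm whether the two files are where a default install would put them.
This is only a sketch: the paths are the ones guessed at above, and
report_paths is a made-up helper name, not part of SpamBayes.  It checks
that the files exist and nothing more; it does not prove that SpamBayes is
actually calling ocrad.exe.

```python
import os

# Paths as guessed in this message; a default install location is assumed.
CANDIDATES = [
    r"C:\Program Files\SpamBayes\bin\ocrad.exe",
    r"C:\Program Files\SpamBayes\spambayes\ImageStripper.py",
]


def report_paths(paths):
    """Return a dict mapping each path to True if it is an existing file."""
    return {p: os.path.isfile(p) for p in paths}


if __name__ == "__main__":
    for path, found in report_paths(CANDIDATES).items():
        print("%-60s %s" % (path, "found" if found else "MISSING"))
```

If either line comes back MISSING, searching the SpamBayes install
directory for the file name and adding the real location to CANDIDATES
is the next thing to try.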

    Shawn> Yes. I'm concerned about the volume of spam I might receive if I
    Shawn> were to try starting with a clean database. I get over 4,000
    Shawn> messages a day, with well over half of that being spam that I
    Shawn> receive with the express purpose of analyzing spam to train my
    Shawn> server to more effectively filter it. Starting with a blank
    Shawn> database, even if it were significantly fine-tuned within the
    Shawn> first day would leave literally thousands of spam messages
    Shawn> untrained in a single week. 

I think you should be able to do something like this:

    1. empty your database
    2. check your mail
    3. file a dozen or so spams as spam and a dozen or so hams as ham
    4. tell SpamBayes to recheck your inbox
    5. repeat steps 3 and 4 a couple of times

You should wind up with it properly scoring most of your inbox very quickly.

    Shawn> On a very timely related note, the following article was
    Shawn> publicized by Frisk Software today:
    Shawn>   http://www.secureworks.com/analysis/spamthru/

    Shawn> It discusses the use of virus-infected botnets for spamming.

Sure, that's the major source of spam these days.

Skip

