[Spambayes] Randomized Spam Beating SpamBayes
skip at pobox.com
Sat Oct 21 16:01:27 CEST 2006
>> Once you find it, just add the options I mentioned to the [Tokenizer]
>> section and restart.
Shawn> Is there any means of directly testing that the settings applied
Shawn> are actually taking effect?
Well, as of yesterday I can tell you they won't take effect. There is a bug
in the ocrad.exe file. Tony Meyer fixed that. I've updated the
ocrad-cygwin package here:
http://sourceforge.net/project/showfiles.php?group_id=61702
If you download that and replace the ocrad.exe in
C:\Program Files\SpamBayes\bin
that will be one problem solved. However, there will be more issues to deal
with. If you could do me a favor, I may be able to tweak things further so
that SpamBayes actually finds ocrad.exe and uses it. Locate the file
ImageStripper.py. I think you'll find it at
C:\Program Files\SpamBayes\spambayes\ImageStripper.py
Let me know where you find it. I'll tweak a couple settings there and shoot
you a new copy.
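If a Python interpreter happens to be available on the machine, a small
script along these lines could do the searching. This is only a sketch: the
helper name find_files is made up for illustration, and the root folder is
assumed to be the install location mentioned above, which may differ on your
system.

```python
import os


def find_files(root, names):
    """Return sorted full paths of files under root whose names match.

    Matching is case-insensitive, since Windows file names are.
    find_files is a hypothetical helper, not part of SpamBayes.
    """
    wanted = {name.lower() for name in names}
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower() in wanted:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed install location; adjust if SpamBayes lives elsewhere.
    root = r"C:\Program Files\SpamBayes"
    for path in find_files(root, ["ImageStripper.py", "ocrad.exe"]):
        print(path)
```

If it prints nothing, either the root folder is wrong or the files are not
installed under it.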
Shawn> Yes. I'm concerned about the volume of spam I might receive if I
Shawn> were to try starting with a clean database. I get over 4,000
Shawn> messages a day, with well over half of that being spam that I
Shawn> receive with the express purpose of analyzing spam to train my
Shawn> server to more effectively filter it. Starting with a blank
Shawn> database, even if it were significantly fine-tuned within the
Shawn> first day would leave literally thousands of spam messages
Shawn> untrained in a single week.
I think you should be able to do something like this:
1. empty your database
2. check your mail
3. file a dozen or so spams as spam and a dozen or so hams as ham
4. tell SpamBayes to recheck your inbox
5. repeat 3 & 4 a couple times
You should wind up with it properly scoring most of your inbox very quickly.
Shawn> On a very timely related note, the following article was
Shawn> publicized by Frisk Software today:
Shawn> http://www.secureworks.com/analysis/spamthru/
Shawn> It discusses the use of virus-infected botnets for spamming.
Sure, that's the major source of spam these days.
Skip