[Spambayes] Client/Server filtering model? Anyone have code?
Christopher Jastram
cej at intech.com
Thu Nov 6 20:54:24 EST 2003
Currently, I have a script that runs through every user's spool
directory and:
  - Everything in "Spam" or "Junk" is read as spam.
  - Things like the main inbox, sent, drafts and trash are discarded.
  - Everything else is read as ham.
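
For reference, the folder rules above boil down to something like the
following. This is only a rough sketch, not the actual script: the
function name is made up, and the folder names in the two sets are
assumptions about how the mail clients name things.

```python
# Hypothetical folder names; adjust to whatever the mail clients really use.
SPAM_FOLDERS = {"spam", "junk"}
SKIPPED_FOLDERS = {"inbox", "sent", "drafts", "trash"}


def training_label(folder_name):
    """Return "spam", "ham", or None (meaning: do not train on this folder)."""
    name = folder_name.strip().lower()
    if name in SPAM_FOLDERS:
        return "spam"
    if name in SKIPPED_FOLDERS:
        return None
    # Anything the user bothered to file elsewhere is assumed to be ham.
    return "ham"
```

So training_label("Junk") gives "spam", training_label("INBOX") gives
None, and any other folder name gives "ham".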
This script takes a significant amount of time to run (on the order
of 30-40 minutes), runs nightly, and only processes a couple of test
users. Once everyone switches from POP to IMAP, there'll be hell to pay
unless I can figure out a faster way to do this. The system load also
jumped from 0.1 - 0.2 to a steady 1.6 - 1.8 with the addition of spam
filtering (we get a lot of spam).
An idea I had -- would it be possible to have *one* (multithreaded?)
constantly running Python server that reads mail, evaluates it (adds a
classified or trained header) and passes it back? The advantage is that
the database would not be re-read on every invocation. Another
advantage might be in the outsourcing of the processing to another
machine. Just an idea. I'll work on something like this myself when I
get a roundtoit (always short on time), but I'm wondering if anyone has
prototype code...? (Is this even a good idea?)
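
To make the idea concrete, here is a minimal sketch of what I have in
mind, using only the standard library. Everything specific in it is an
assumption: load_classifier() is a stand-in for opening the real
database once at startup (it just looks for a word in the subject), the
X-Filter-Classification header name and the 0.2 / 0.9 cutoffs are
illustrative, and the "send the message, half-close, read the reply"
protocol is simply the easiest thing to write.

```python
import email
import socket
import socketserver


def load_classifier():
    """Stand-in for loading the token database once, at startup.

    Hypothetical: a real server would open the trained database here and
    return something that scores messages. This stub only checks one word.
    """
    def score(msg):
        subject = str(msg.get("Subject", ""))
        return 0.99 if "viagra" in subject.lower() else 0.01
    return score


def label_for(prob, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spam probability to a label (cutoffs are illustrative)."""
    if prob >= spam_cutoff:
        return "spam"
    if prob <= ham_cutoff:
        return "ham"
    return "unsure"


class FilterHandler(socketserver.StreamRequestHandler):
    def handle(self):
        raw = self.rfile.read()  # returns once the client half-closes
        msg = email.message_from_bytes(raw)
        msg["X-Filter-Classification"] = label_for(self.server.score(msg))
        self.wfile.write(msg.as_bytes())


class FilterServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, FilterHandler)
        # Loaded once here and shared by every handler thread.
        self.score = load_classifier()


def classify_via_server(address, raw_message):
    """Client side: send one raw message, return the annotated copy."""
    with socket.create_connection(address) as sock:
        sock.sendall(raw_message)
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Running it would be FilterServer(("127.0.0.1", 8025)).serve_forever()
on the filtering box (the port is arbitrary), with the delivery script
calling classify_via_server() for each message. One thread per
connection keeps the sketch short; whether that actually helps the load
problem is exactly what I don't know.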
Thanks,
chris
P.S. on the 'lot of spam' note: most of the mail is addressed to 10
years' worth of employees who no longer work here. We used to bounce it,
but the resulting mail queue and processing time brought our mail server
to its knees. Repeatedly. Both Exchange and Postfix/Cyrus. Now there's a
dead-letter box and a 3-day hold period instead of the default 5, which
brings it down to manageable (sorta) levels. Anyone have any additional
ideas or guides to using Postfix to drop blatantly spammy email? To
give you a sense of what I'm looking for, here's an example: I cut
incoming spam by a good 10 percent, just by dropping all messages coming
from outside, but claiming to be from within our domain. Surely there
are more such rules that I haven't thought of?
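
Spelled out as logic, the rule in that example looks roughly like the
sketch below. It is an illustration of the check, not the actual
Postfix configuration: the domain, the network ranges and the function
name are all placeholders.

```python
import ipaddress

# Placeholders; substitute the real domain and internal address ranges.
OUR_DOMAIN = "example.com"
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_forged_internal_sender(envelope_sender, client_ip):
    """True if the sender claims our domain but the client is outside."""
    domain = envelope_sender.rpartition("@")[2].strip().rstrip(">").lower()
    if domain != OUR_DOMAIN:
        return False  # not claiming to be one of us; other rules apply
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in INTERNAL_NETWORKS)
```

A message from boss@example.com arriving from 203.0.113.7 gets flagged,
while the same sender connecting from 10.1.2.3 does not.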