[Spambayes] Server-side setup for corporate usage
Christopher Jastram
cej at intech.com
Tue Dec 30 12:05:07 EST 2003
Hello,
I've set up a server-side SpamBayes filter system. This is probably
breakable, and could use some improvement. It's also not done yet.
When things are complete, I'll stick the outline up somewhere on the web.
Here's the platform & stats:
Pentium 700 MHz, 128 MB RAM, 1 IDE HD
SuSE Linux, Postfix, Cyrus
Roughly 10 to 20 thousand emails / day, mostly spam :(
Load average: 1.5 to 4.0 (never below 1.2)
Postfix queue limited to three days
main.cf:
mailbox_transport = cyrus
master.cf:
smtp      inet  n  -  n  -  12  smtpd -o content_filter=spambayes:
smtp      unix  -  -  n  -  12  smtp
cyrus     unix  -  n  n  -  12  pipe
  user=cyrus argv=/usr/lib/cyrus/bin/deliver -e -r ${sender} -m ${extension} ${user}
spambayes unix  -  n  n  -  12  pipe
  user=nobody argv=/usr/bin/hammiefilter.sh $sender $recipient
The argv= string of the "cyrus" entry is a single line; if your mail
client wraps it, rejoin it. Continuation lines in master.cf must begin
with whitespace.
Note the process limits of 12. Default is 100, which brings the system
to a crawl (load average: 80+ without spambayes). YMMV, esp. with SMP.
To newbies: note the different "smtp inet" and "smtp unix" lines. That
one threw me for a couple days. The instructions in the FAQ (?) on the
SpamBayes website show the smtp inet line. Don't edit the smtp unix
line, because it won't work.
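
For anyone unsure which column is which, here is a small Python sketch
that splits a master.cf line into its eight standard columns (service,
type, private, unpriv, chroot, wakeup, maxproc, command). The names
MasterEntry and parse_master_line are made up for illustration; this is
not part of Postfix or SpamBayes, and it assumes continuation lines have
already been joined.

```python
from typing import NamedTuple


class MasterEntry(NamedTuple):
    """One logical master.cf line, split into its eight columns."""
    service: str
    kind: str      # the "type" column: inet, unix, fifo, ...
    private: str
    unpriv: str
    chroot: str
    wakeup: str
    maxproc: str   # the process limit ("12" in the entries above)
    command: str   # command name plus all of its arguments


def parse_master_line(line: str) -> MasterEntry:
    """Split a single, already-joined master.cf line on whitespace."""
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return MasterEntry(*fields)


inet = parse_master_line("smtp inet n - n - 12 smtpd -o content_filter=spambayes:")
unix = parse_master_line("smtp unix - - n - 12 smtp")
print(inet.kind, inet.maxproc, inet.command)
print(unix.kind, unix.maxproc, unix.command)
```

The "inet" entry is the one whose command is smtpd, which is why the
content_filter option goes there and not on the "unix" entry.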
hammiefilter.sh is attached. It is an adaptation of the hammiefilter
found in the server-side setup instructions on the SpamBayes website.
To populate the user-specific databases, I use a Perl script (also
attached).
The way it works:
1) Postfix receives an email
This next part I'm not quite sure about, but anyway...
2) Postfix uses the 'cyrus' transport,
3) which calls "deliver"
4) which uses the "smtp inet" transport
5) which calls smtpd -o content_filter (which filters mail text through
an external filter)
(I'm pretty sure about the rest)
6) which uses /usr/bin/hammiefilter.sh to call sb_filter.sh
7) which uses /var/spambayes/hammie-$username.db to add an
X-SpamBayes-Classification header to the email.
8) Something magic happens, and the mail arrives in my inbox.
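
Steps 6 and 7 can be illustrated with a rough Python sketch. It is not
the attached hammiefilter.sh and does not call SpamBayes at all: the
score is passed in by hand, the cutoffs 0.20 and 0.90 are assumed
values, and the names classify, db_path and tag_message are
hypothetical. The real filter's header value may carry more than the
bare label.

```python
import email

# Assumed cutoffs for illustration; check your own SpamBayes options.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90

HEADER = "X-SpamBayes-Classification"


def classify(score: float) -> str:
    """Map a spam probability in [0, 1] to ham / unsure / spam."""
    if score < HAM_CUTOFF:
        return "ham"
    if score < SPAM_CUTOFF:
        return "unsure"
    return "spam"


def db_path(username: str) -> str:
    """Per-user database path, following the naming used in step 7."""
    return f"/var/spambayes/hammie-{username}.db"


def tag_message(raw: str, score: float) -> str:
    """Return the message text with the classification header set."""
    msg = email.message_from_string(raw)
    del msg[HEADER]  # drop any header a sender may have forged; no error if absent
    msg[HEADER] = classify(score)
    return msg.as_string()


raw = "From: someone@example.com\nSubject: hello\n\nbody text\n"
print(db_path("chris"))
print(tag_message(raw, 0.95))
```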
For training:
1a) User receives spam
1b) User receives ham
2a) User forwards said spam to spam at domain.com (domain is the client's
mail domain -- i.e., if I set this thing up for python.org, said user
would forward to spam at python.org)
2b) User forwards said ham to ham at domain.com
3) Perl script runs every 10 minutes, checks the ham and spam accounts,
and trains on each message against the matching
/var/spambayes/hammie-$username.db.
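
The polling step could look roughly like this in Python. This is a
sketch only, not a translation of the attached Perl script: it assumes
the ham and spam accounts can be read as mbox files, that $username is
the local part of the forwarding user's From: address, and it only
works out which database each message belongs to; the actual training
call is left out. The names db_path_for_sender and plan_training are
hypothetical.

```python
import mailbox
import re
from email.utils import parseaddr

SAFE_USERNAME = re.compile(r"[a-z0-9._-]+")


def db_path_for_sender(from_header: str) -> str:
    """Derive the per-user database path from a From: header (assumption)."""
    _, addr = parseaddr(from_header)
    username = addr.split("@", 1)[0].lower()
    # Refuse anything that could escape /var/spambayes/.
    if not SAFE_USERNAME.fullmatch(username):
        raise ValueError(f"unusable sender: {from_header!r}")
    return f"/var/spambayes/hammie-{username}.db"


def plan_training(mbox_path: str, is_spam: bool) -> list:
    """List (database path, is_spam) pairs for every message in the mbox."""
    plan = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            plan.append((db_path_for_sender(msg.get("From", "")), is_spam))
    finally:
        box.close()
    return plan
```

A cron job would call plan_training once for the spam mailbox and once
for the ham mailbox, train each listed database, and then empty the
mailboxes.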
Make sense? Me neither. I'm sure it will, though.
I've had the standard excellent results (with the notable exception of
the aforementioned identity-theft scams, though in my case they're
well-done eBay scams rather than PayPal ones).
As a side note: I set up Mozilla Thunderbird to label messages with
different colors based on the value of the X-SpamBayes-Classification
header. Thus, it's trivial to train on unsures (they show up orange),
false positives, etc.
Question: Will extra data resulting from forwarding (such as the
"---Original Message---" line placed by Thunderbird) poison the
database? If I train with equal spam and ham, it *shouldn't* -- am I
correct?
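
If the forwarding wrapper does turn out to matter, one option would be
to strip everything up to the marker line before training. A minimal
Python sketch, assuming the marker is a line of dashes around the words
"Original Message" (the exact text your mail client inserts may differ)
and using a hypothetical helper name, strip_forward_wrapper:

```python
import re

# Assumed marker shape: dashes, "Original Message", dashes.
FORWARD_MARKER = re.compile(
    r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE
)


def strip_forward_wrapper(body: str) -> str:
    """Return the text after the first forwarding marker, if there is one."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        if FORWARD_MARKER.match(line.strip()):
            return "\n".join(lines[i + 1:])
    return body  # no marker found: leave the body untouched


forwarded = (
    "FYI\n\n"
    "-------- Original Message --------\n"
    "Subject: Cheap pills\n\n"
    "Buy now\n"
)
print(strip_forward_wrapper(forwarded))
```

This only removes the wrapper text; headers rewritten by the forwarding
client would still differ from those of the original message.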
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hammiefilter.sh
Type: text/x-sh
Size: 540 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031230/1458ce36/hammiefilter.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poll_ham-spam_mboxes.pl
Type: text/x-perl
Size: 2368 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20031230/1458ce36/poll_ham-spam_mboxes.bin