[Spambayes] Ideas for an MSc project please...
Christopher Jastram
cej at intech.com
Mon Feb 9 01:00:17 EST 2004
dont bother wrote:
>>4) Improving Bayesian spam filtering at the SMTP
>>gateway level. Why is
>>it less effective, what can be done to improve it,
>>
>>
>
>Hey can you elaborate on that? I am a newbie so if you
>could explain me step by step on this, it would be
>great
>Thanks
>dont
>
>
Sure.
Providing a point-and-click installer that makes "Delete as Spam" and
"Recover from Spam" buttons magically appear on the Outlook toolbar is cool.
Asking users to forward spam to "spam at company.com" and an equal amount
of ham to "ham at company.com" is a PITA for all involved. (Never mind
trying to explain what "ham" is...)
Also, server-side filtering is a total f**k to set up (pardon the
profanity), especially in a user-specific manner (since Bayesian
filtering really doesn't work using the same database for multiple
users). It also takes up a snotload of resources, which is Not A Good
Thing(tm) on a busy mail server. For example, before the MyDoom virus,
we were processing 10 to 11 thousand emails every day. When MyDoom hit,
we started processing 350 thousand emails. That filled up the SYN_RECV
queue and brought the machine (and our network) to its knees. The first
thing I did was strip the Bayesian filtering out, and I promptly watched
the mail throughput quadruple. Server-side Bayesian filtering (or any
content filtering, for that matter) is *expensive*. We are currently
purchasing two 64-bit AMD 3GHz machines with mirrored hard drives to
handle this kind of load, because we CAN NOT let valuable mail bounce.
(We were running a 667 MHz Celeron with 128 MB of RAM.)
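
To illustrate the point above about one shared database not working for multiple users, here is a minimal sketch in Python. It is a toy, not Spambayes code: the UserFilter class, its methods, the mailbox names, and the sample messages are all hypothetical, and the scoring is plain naive Bayes with add-one smoothing and equal priors rather than whatever a real filter uses. It only shows that the same message can score as ham for one mailbox and spam for another when each has its own training data, and that the server pays for one database (plus tokenizing and scoring) per user.

```python
import math
from collections import defaultdict


class UserFilter:
    """Toy per-user token database (hypothetical; not the Spambayes API)."""

    def __init__(self):
        self.spam_counts = defaultdict(int)  # token -> number of spam messages containing it
        self.ham_counts = defaultdict(int)   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        counts = self.spam_counts if is_spam else self.ham_counts
        # Count each distinct token once per message.
        for tok in set(text.lower().split()):
            counts[tok] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spam_probability(self, text):
        # Sum per-token log-likelihood ratios (add-one smoothing, equal priors assumed).
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.spam_counts.get(tok, 0) + 1) / (self.nspam + 2)
            p_ham = (self.ham_counts.get(tok, 0) + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One database per recipient instead of a single shared one.
filters = defaultdict(UserFilter)

# Hypothetical training data: "sales" wants mortgage mail, "dev" does not.
filters["sales"].train("mortgage rates quote", is_spam=False)
filters["sales"].train("cheap pills online", is_spam=True)
filters["dev"].train("mortgage rates quote", is_spam=True)
filters["dev"].train("patch review build", is_spam=False)

msg = "mortgage rates"
sales_score = filters["sales"].spam_probability(msg)
dev_score = filters["dev"].spam_probability(msg)
print(f"sales: {sales_score:.2f}  dev: {dev_score:.2f}")
```

With a single shared database, the two sets of training data would cancel each other out for these tokens, which is one way to read the "really doesn't work" remark.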
Hope this hard-edged voice of experience helps a little. :)
Christopher Jastram