[Spambayes] SpamBayes for 500.000 users

Christopher Jastram cej at intech.com
Tue Dec 16 16:47:58 EST 2003


Hi,

All I can say is ... wow ...

But I can give you some first-hand knowledge from a much smaller user 
base.  I'm setting the same thing up for an office of 5 people, and 
here's the bare-bones fact: I need a separate database for each user.  
I've tried using one database for everyone, and it does work, but it 
only catches about 30-40 percent of spam.  I'm not sure why that is 
(unbalanced training?).
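
To make "a separate database for each user" concrete, here is a rough 
Python sketch.  It is a toy token counter, not the SpamBayes classifier, 
and the names TinyFilter, train and score are made up for illustration.  
The point is only that each user's training stays separate, so the same 
message can score differently for different people.

```python
import math
from collections import Counter, defaultdict


class TinyFilter:
    """Toy token-count filter (illustrative only, not SpamBayes)."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def score(self, text):
        """Return a spam probability in [0, 1] from smoothed log-odds."""
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One independent filter per user, created on first use.
filters = defaultdict(TinyFilter)


def train(user, text, is_spam):
    filters[user].train(text, is_spam)


def score(user, text):
    return filters[user].score(text)
```

With this layout, a user who treats mortgage offers as spam and a user 
who wants them each train only their own counts, which a single shared 
database cannot do.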

I'm still fiddling with making it work right (lots of other things take 
priority), but that's what I've discovered.  I'm sure others can help 
you out much more.

Also, unless you have 500,000 really really really super wise 
high-falutin' happy joyous technophiles for users, you'll have a sorry 
time educating everyone.

Chris

Dreas van Donselaar wrote:

>Hi everyone :)
>
>I am quite new here but have been following the current discussions with a
>lot of interest. I actually plan to build a comprehensive anti-spam
>solution (yes, yet another one) which will mainly work server-side. A
>combination of the Cloudmark system (generating a unique ID per email
>and matching the ID in the central database to test whether it has been
>identified as spam before or not), Bayesian server-side and Bayesian user
>side seems to be the ideal solution.
>
>I am not a real technical person, and I will hire developers to build this,
>but I was wondering whether Bayesian filtering will actually be useful if
>there were 500.000 users sharing a central database server. Should the
>database only store data for like 24 hours or would it make sense to keep
>it growing? Would there actually be extra value in having so many
>(reporting) users?
>
>I was wondering if you guys/girls could give me some things to think about
>and maybe I can get some input about what has already been thought about by
>others before :)
>
>P.S. Yes I know I'll need huge server-capacity.
>
>Regards,
>
>Dreas van Donselaar




