[Spambayes] SpamBayes future platforms

Skip Montanaro skip at pobox.com
Thu Aug 7 13:33:24 EDT 2003


    Bob> Another thing: Bayesian filtering is inherently slooooow. 

I run SpamBayes on a modest server (550MHz PIII running Linux which also
serves as a web and database server), filtering email for several email
addresses with no trouble.  Unless the web server is heavily loaded, the
load average is generally around 0.5.

On my Mac it takes around 0.01 seconds to score a small message which is
already in memory.  Here's a real quick demo (h is an object returned by
hammie.open(), and the glob and time modules have already been imported):

    >>> glob.glob("*.msg")
    ['27314.msg', 'badspam.msg', 'diploma.msg', 'foo.msg', 'gibberish.msg', 'sqr.msg']
    >>> for f in glob.glob("*.msg"):
    ...   msg = file(f).read()
    ...   t = time.time()
    ...   s = h.score(msg)
    ...   t = time.time()-t
    ...   print "t: %.3f, size: %d, bytes/sec: %.1f" % (t, len(msg), len(msg)/t)
    ... 
    t: 0.008, size: 1870, bytes/sec: 246572.5
    t: 0.161, size: 214051, bytes/sec: 1327917.4
    t: 0.011, size: 2366, bytes/sec: 222682.3
    t: 0.011, size: 3741, bytes/sec: 351896.6
    t: 0.018, size: 4933, bytes/sec: 278182.3
    t: 0.094, size: 8714, bytes/sec: 92651.9
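
For anyone who wants to repeat this kind of measurement on a current Python,
here is a minimal sketch of the same timing loop.  The names time_scoring and
toy_score are hypothetical, and toy_score is only a stand-in for h.score(),
not the SpamBayes classifier, so the rates it yields say nothing about
SpamBayes itself.

```python
import time


def time_scoring(score, messages):
    """Time score(msg) for each message.

    Returns a list of (seconds, size_in_chars, chars_per_second) tuples.
    """
    results = []
    for msg in messages:
        start = time.perf_counter()
        score(msg)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse clocks.
        rate = len(msg) / elapsed if elapsed > 0 else float("inf")
        results.append((elapsed, len(msg), rate))
    return results


def toy_score(msg):
    # Hypothetical stand-in for h.score(): fraction of words that look spammy.
    words = msg.lower().split()
    hits = sum(1 for w in words if w in {"free", "diploma"})
    return hits / max(len(words), 1)


if __name__ == "__main__":
    for t, size, rate in time_scoring(toy_score, ["free diploma now", "hello"]):
        print("t: %.6f, size: %d, bytes/sec: %.1f" % (t, size, rate))
```

Passing the scorer in as an argument means the same loop could wrap a real
classifier's scoring call without any other change.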

My real per-message cost is a lot higher than that, because I use
hammiefilter in a procmail setup and wind up firing up the whole system for
each message.  If SpamBayes were running inside Exchange (with the filter
engine all primed and ready to go), I think you'd be okay.
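
To make the difference concrete, here is a rough sketch of "start up for every
message" versus "primed and ready to go".  It is not how hammiefilter or
Exchange actually work: ToyFilter, score_cold and score_warm are hypothetical
names, and the constructor merely counts how often the (pretend) expensive
setup runs.

```python
class ToyFilter:
    """Hypothetical stand-in for a filter with an expensive startup."""

    loads = 0  # how many times the setup step has run

    def __init__(self):
        # In a real setup this is where the interpreter start and the
        # database load would be paid for.
        ToyFilter.loads += 1
        self.spam_words = {"free", "diploma"}

    def score(self, msg):
        words = msg.lower().split()
        hits = sum(1 for w in words if w in self.spam_words)
        return hits / max(len(words), 1)


def score_cold(messages):
    # One fresh filter per message, as with one process per delivery.
    return [ToyFilter().score(m) for m in messages]


def score_warm(messages):
    # One long-lived filter reused for every message.
    f = ToyFilter()
    return [f.score(m) for m in messages]
```

Both functions return the same scores; only the number of setups differs (one
per message in the cold case, one in total in the warm case).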

Still, why waste all those cycles sitting on people's desks?

Skip


