[Spambayes] SpamBayes future platforms
Skip Montanaro
skip at pobox.com
Thu Aug 7 13:33:24 EDT 2003
Bob> Another thing: Bayesian filtering is inherently slooooow.
I run SpamBayes on a modest server (550MHz PIII running Linux which also
serves as a web and database server), filtering email for several email
addresses with no trouble. Unless the web server is heavily loaded, the
load average is generally around 0.5.
On my Mac it takes around 0.01 seconds to score a small message which is
already in memory. Here's a real quick demo (h is an object returned by
hammie.open()):
>>> glob.glob("*.msg")
['27314.msg', 'badspam.msg', 'diploma.msg', 'foo.msg', 'gibberish.msg', 'sqr.msg']
>>> for f in glob.glob("*.msg"):
...     msg = file(f).read()
...     t = time.time()
...     s = h.score(msg)
...     t = time.time()-t
...     print "t: %.3f, size: %d, bytes/sec: %.1f" % (t, len(msg), len(msg)/t)
...
t: 0.008, size: 1870, bytes/sec: 246572.5
t: 0.161, size: 214051, bytes/sec: 1327917.4
t: 0.011, size: 2366, bytes/sec: 222682.3
t: 0.011, size: 3741, bytes/sec: 351896.6
t: 0.018, size: 4933, bytes/sec: 278182.3
t: 0.094, size: 8714, bytes/sec: 92651.9
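For anyone who wants to repeat this kind of measurement on a current Python, here is a rough Python 3 sketch of the same timing loop. It takes the scoring function as a parameter; toy_score is a made-up stand-in so the block runs on its own, and it is not the SpamBayes scorer. The names time_scorer and toy_score are my own, not anything from the SpamBayes code.

```python
import time

def time_scorer(score, messages):
    """Time score(msg) for each (name, msg) pair and return one row per message."""
    rows = []
    for name, msg in messages:
        start = time.perf_counter()
        result = score(msg)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on a very coarse clock.
        rate = len(msg) / elapsed if elapsed > 0 else float("inf")
        rows.append((name, result, elapsed, len(msg), rate))
    return rows

def toy_score(msg):
    # Hypothetical stand-in, NOT SpamBayes: fraction of tokens equal to "free".
    tokens = msg.lower().split()
    return sum(t == "free" for t in tokens) / len(tokens) if tokens else 0.0

sample = [("a.msg", "free free money"), ("b.msg", "lunch at noon?")]
for name, s, t, size, rate in time_scorer(toy_score, sample):
    print("%s: score %.2f, t: %.6f, size: %d, bytes/sec: %.1f" % (name, s, t, size, rate))
```

To time a real filter, pass its scoring callable in place of toy_score and read the messages from disk as in the session above.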
I spend a lot more CPU time per message than those numbers suggest, because
I use hammiefilter in a procmail setup and wind up firing up the whole system
for each message. If SpamBayes
were running inside Exchange (with the filter engine all primed and ready to
go), I think you'd be okay.
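The difference between starting up for every message and keeping the engine primed can be sketched like this. Everything here is hypothetical: ToyLoader stands in for whatever expensive startup work a real filter does (such as opening its token database), and score is a toy, not the SpamBayes algorithm. The sketch only counts how many times the startup step runs in each arrangement.

```python
class ToyLoader:
    """Hypothetical stand-in for an expensive startup step."""
    def __init__(self):
        self.loads = 0

    def load(self):
        self.loads += 1
        return {"free": 0.9, "money": 0.8, "lunch": 0.1}  # toy token table

def score(table, msg):
    # Toy scoring: mean of the known tokens' values, 0.5 if none are known.
    known = [table[t] for t in msg.lower().split() if t in table]
    return sum(known) / len(known) if known else 0.5

def filter_per_message(loader, messages):
    # One full startup per message, as when a new process is spawned each time.
    return [score(loader.load(), m) for m in messages]

def filter_primed(loader, messages):
    # One startup, then every message reuses the loaded table.
    table = loader.load()
    return [score(table, m) for m in messages]

messages = ["free money", "lunch today", "hello"]
cold, warm = ToyLoader(), ToyLoader()
assert filter_per_message(cold, messages) == filter_primed(warm, messages)
print("loads per-message:", cold.loads, "loads primed:", warm.loads)
```

Both arrangements give the same scores; only the number of startups differs, and that startup is the per-message cost a long-running process would avoid.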
Still, why waste all those cycles sitting on people's desks?
Skip