[spambayes-dev] sb_bnfilter performance

Tony Meyer tameyer at ihug.co.nz
Wed May 5 18:21:45 EDT 2004


> Using the C implementation of sb_bnfilter to filter an 
> *empty* email reduces the run time to 3ms, so it looks
> like any further gains will come from changes in sb_bnserver......

Out of curiosity, have you profiled sb_bnserver at all?  I wonder whether the
actual tokenization/classification of the message accounts for most of this
21ms, which would be hard to speed up (without starting to recode the core
SpamBayes code itself in C).  If you have profiled it, it'd be interesting to
see where the time is being spent (i.e. please post it!).
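
For what it's worth, here is a minimal sketch of how such a profile could be
collected with the standard library's cProfile and pstats modules.  The
tokenize() and classify() functions are hypothetical stand-ins, not the real
SpamBayes code; the idea would be to pass whatever function sb_bnserver
actually calls per message to profile_call() instead.

```python
import cProfile
import io
import pstats


def tokenize(text):
    # Hypothetical stand-in for the real tokenizer: lowercase and split.
    return text.lower().split()


def classify(text):
    # Hypothetical stand-in for the real classifier: the fraction of
    # tokens that appear in a tiny hard-coded "spammy" set.
    tokens = tokenize(text)
    spammy = {"free", "winner", "viagra"}
    hits = sum(1 for t in tokens if t in spammy)
    return hits / len(tokens) if tokens else 0.5


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    message = "Free offer for every winner " * 1000
    score, report = profile_call(classify, message)
    print(report)
```

The report lists each function with its call count and cumulative time, which
should show whether tokenizing, classifying or something else in the server
dominates.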

You might be able to find gains by turning off some tokenizing options
(perhaps there are time-expensive ones that don't add much in the way of
accuracy?).  Using Python 2.4 (from CVS) might also speed things up a bit,
since I gather that there are numerous speed improvements to built-ins
like dict and list.
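
One rough way to check whether a given option is time-expensive is to time the
tokenizer with it on and off using the standard timeit module.  The sketch
below is only an illustration: tokenize() and its extract_urls flag are made
up for this example and are not actual SpamBayes options.

```python
import timeit


def tokenize(text, extract_urls=True):
    # Hypothetical tokenizer; extract_urls is an invented option that
    # adds an extra pass over the tokens, standing in for any costly step.
    tokens = text.lower().split()
    if extract_urls:
        extra = []
        for t in tokens:
            if t.startswith("http://"):
                extra.extend("url:" + p for p in t[7:].split("/") if p)
        tokens.extend(extra)
    return tokens


def time_option(text, repeat=5, number=200, **options):
    """Return the best observed time in seconds for one tokenize() call."""
    timer = timeit.Timer(lambda: tokenize(text, **options))
    # Take the minimum over several runs to reduce noise from the system.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    message = "Visit http://example.com/offers/today for a free prize " * 500
    on = time_option(message, extract_urls=True)
    off = time_option(message, extract_urls=False)
    print("with option:    %.6f s per message" % on)
    print("without option: %.6f s per message" % off)
```

Any time saved would still have to be weighed against the effect on accuracy,
which this does not measure.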

> These changes are in the bnfilter_in_c_branch CVS branch for 
> now; they don't belong in the 1.0 release.

Note that soon (today, probably) there'll be a 1.0 branch and so you'll be
able to put these on the trunk.

=Tony Meyer



