[spambayes-dev] Stats are very slow

skip at pobox.com
Wed Dec 20 10:51:27 CET 2006


    Mark> Specifically, this line in addin.py is the culprit:

    Mark>         self.stats = bayes_stats.Stats(bayes_options,
    Mark>                                        self.classifier_data.message_db)

    Mark> I've not even looked inside that module yet, but that seems quite
    Mark> extreme, to the point I'm not sure the feature is worth that
    Mark> cost...  I guess the code is reading each record of my message DB
    Mark> (which is 85MB) - but does anyone have any insights?

Yes, it appears to be doing just that.  At the end of __init__ it calls
self.CalculatePersistentStats(), which loops over every key in the
message_db.  The author anticipated this in the docstring:

    Calculate the statistics totals (i.e. not this session).

    This is done by running through the messageinfo database and
    adding up the various information.  This could get quite time
    consuming if the messageinfo database gets very large, so
    some consideration should perhaps be made about what to do
    then.

It might be worth deferring that call until it's really needed (say, in
GetStats()).
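A minimal sketch of that deferral, assuming a hypothetical LazyStats class
that stands in for bayes_stats.Stats.  It borrows only the method names
CalculatePersistentStats() and GetStats() from the discussion above.  The
message DB is modeled as a plain dict mapping message id to a
classification string; that is an assumption for illustration, since the
real messageinfo records carry more than this.

```python
class LazyStats:
    """Hypothetical stand-in for bayes_stats.Stats that defers the full scan."""

    def __init__(self, message_db):
        # Assumption: message_db maps msgid -> "spam" / "ham" / "unsure".
        self.message_db = message_db
        # Totals are NOT computed here; None means "not yet calculated".
        self._persistent = None
        self.scan_count = 0  # number of full passes over message_db so far

    def CalculatePersistentStats(self):
        # The expensive step: walk every record in the database once.
        self.scan_count += 1
        totals = {"spam": 0, "ham": 0, "unsure": 0}
        for key in self.message_db:
            classification = self.message_db[key]
            totals[classification] = totals.get(classification, 0) + 1
        self._persistent = totals

    def GetStats(self):
        # Pay for the scan only on first use, then reuse the cached totals.
        if self._persistent is None:
            self.CalculatePersistentStats()
        return dict(self._persistent)


if __name__ == "__main__":
    db = {"msg1": "spam", "msg2": "ham", "msg3": "spam"}
    stats = LazyStats(db)     # construction does no scanning
    print(stats.scan_count)   # 0
    print(stats.GetStats())   # {'spam': 2, 'ham': 1, 'unsure': 0}
    stats.GetStats()          # second call reuses the cached totals
    print(stats.scan_count)   # 1
```

Startup cost then drops to nothing, and a user who never opens the stats
page never pays for the scan.  Because the cached totals describe previous
sessions only, computing them once per session seems reasonable; anything
that changes the message DB mid-session would otherwise need to reset
_persistent to None.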

Skip

