[spambayes-dev] Stats are very slow
skip at pobox.com
Wed Dec 20 10:51:27 CET 2006
Mark> Specifically, this line in addin.py is the culprit:
Mark> self.stats = bayes_stats.Stats(bayes_options,
Mark> self.classifier_data.message_db)
Mark> I've not even looked inside that module yet, but that seems quite
Mark> extreme, to the point I'm not sure the feature is worth that
Mark> cost... I guess the code is reading each record of my message DB
Mark> (which is 85MB) - but does anyone have any insights?
Yes, it appears to be doing just that. At the end of __init__ it calls
self.CalculatePersistentStats(), which loops over all the keys in the
message_db. The author anticipated this in the docstring:
Calculate the statistics totals (i.e. not this session).
This is done by running through the messageinfo database and
adding up the various information. This could get quite time
consuming if the messageinfo database gets very large, so
some consideration should perhaps be made about what to do
then.
It might be worth deferring that call until it's really needed (say, in
GetStats()).
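A rough sketch of that deferral, for illustration only. This is not the
actual bayes_stats code: the class name LazyStats, its method names, the
scans counter, and the plain dict standing in for message_db are all
hypothetical.

```python
class LazyStats:
    """Hypothetical stand-in for a stats class; not the real spambayes code."""

    def __init__(self, message_db):
        self.message_db = message_db
        self._totals = None  # persistent totals have not been computed yet
        self.scans = 0       # counts full scans, for illustration only

    def _calculate_persistent_stats(self):
        # The expensive part: one pass over every record in the database.
        totals = {}
        for key in self.message_db.keys():
            classification = self.message_db[key]
            totals[classification] = totals.get(classification, 0) + 1
        self.scans += 1
        return totals

    def get_stats(self):
        # Compute on first use, then reuse the cached result.
        if self._totals is None:
            self._totals = self._calculate_persistent_stats()
        return self._totals


# A tiny dict stands in for the (much larger) message database.
db = {"msg1": "spam", "msg2": "ham", "msg3": "spam"}
stats = LazyStats(db)  # construction does no scan, so startup stays cheap
```

With this shape the scan cost is paid on the first get_stats() call rather
than at startup. One caveat under these assumptions: the cached totals go
stale if the database changes afterwards, so a real version would need some
way to invalidate or update them.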
Skip