[Spambayes] Better optimization loop

Thu Nov 21 00:13:44 2002

So then, "T. Alexander Popiel" <popiel@wolfskeep.com> is all like:

> Argh.  I was working on it, too... hence the patch I just sent out.
> Oh, well... no big deal.  It looks like our implementations are
> significantly different, though.  Might be worth looking at both
> and seeing which is better.

I think what you did is a little closer to what Rob suggested to me in
response.  It sounds like a pretty good idea to me.  What I've been
doing in my idle time for the past few hours is playing around with
having the WordInfo class compute its own probability.  I did this by
defining two new methods:

    def probability(self, nham, nspam):
        # No cached value yet: compute it on first use.
        if not self.spamprob:
            self.update_probability(nham, nspam)
        return self.spamprob

    def update_probability(self, nham, nspam):
        [basically the same code as Bayes.update_probabilities]

My idea was that you'd have to compute the probability for each word
the first time you use it, but after that the probability is cached.
Long-running things like the pop proxy will get the benefit of the
cached probabilities, and short-lived things like hammiefilter get much
faster training, and only slightly slower scoring.  At least, that's
what I expect.  I haven't tested this yet.
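
To make the caching idea concrete, here is a minimal, self-contained
sketch.  It is not the Spambayes code: the record() method, the use of
None as the "not computed yet" marker, and the ratio-based estimate in
update_probability() are all assumptions made for illustration.  The
point is only that training touches a counter and drops the cached
value, while the probability is recomputed on the next lookup.

```python
class WordInfo:
    """Hypothetical stand-in for the real WordInfo; only the caching matters."""

    def __init__(self):
        self.spamcount = 0
        self.hamcount = 0
        self.spamprob = None  # None means "not computed yet" (assumption)

    def record(self, is_spam):
        # Training only bumps a counter and drops the cached value,
        # so no probability is recomputed at training time.
        if is_spam:
            self.spamcount += 1
        else:
            self.hamcount += 1
        self.spamprob = None

    def probability(self, nham, nspam):
        # Compute on first use, then serve the cached value.
        if self.spamprob is None:
            self.update_probability(nham, nspam)
        return self.spamprob

    def update_probability(self, nham, nspam):
        # Placeholder estimate (assumption): ratio of per-class
        # frequencies, clamped to [0.01, 0.99].  Not the Spambayes formula.
        hamratio = self.hamcount / nham if nham else 0.0
        spamratio = self.spamcount / nspam if nspam else 0.0
        total = hamratio + spamratio
        prob = spamratio / total if total else 0.5
        self.spamprob = min(0.99, max(0.01, prob))


word = WordInfo()
word.record(is_spam=True)
word.record(is_spam=True)
word.record(is_spam=False)
first = word.probability(nham=10, nspam=10)   # computed here
second = word.probability(nham=10, nspam=10)  # served from the cache
```

One thing this sketch does not handle: a cached value also goes stale
when nham or nspam change, since those feed into the estimate.  How to
deal with that is left open here.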