[Spambayes] Upgrade problem

Tim Peters tim.one@comcast.net
Thu Nov 7 20:19:30 2002


[T. Alexander Popiel]
> Why don't we just store the counts, and only compute the probabilities
> when we need to reference them?  Yes, it is more efficient for bulk
> testing to only compute the probabilities once, but it's definitely
> a lose for incremental training.

Unqualified judgments are always wrong <wink>.  I often get email in batches
of 200, and scoring speed is important to me -- much more so than training
speed.  It will be even more so at python.org, where training probably won't
occur more often than once a week, but scoring is ongoing around the clock.
Note that for purposes of scoring, the *counts* needn't be saved at all now,
and a scoring-only database can exploit that (and this project's
neiltrain.py already does).
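
One way to picture the trade-off is the minimal sketch below. It is not the project's actual code: the class name, method names, and default constants are hypothetical, and the Robinson-style smoothing is used only for illustration. Counts are the stored truth, probabilities are computed lazily and cached, and a probability-only table can be exported for a scoring-only database.

```python
class CountingClassifier:
    """Hypothetical sketch: store counts, derive probabilities on demand."""

    def __init__(self, unknown_prob=0.5, strength=0.45):
        # Assumed smoothing constants, chosen for illustration only.
        self.unknown_prob = unknown_prob
        self.strength = strength
        self.nspam = 0
        self.nham = 0
        self.counts = {}   # word -> [spam_count, ham_count]
        self._probs = {}   # word -> cached probability

    def learn(self, words, is_spam):
        """Incremental training: bump counts, compute nothing."""
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        index = 0 if is_spam else 1
        for word in set(words):
            self.counts.setdefault(word, [0, 0])[index] += 1
        # The message totals changed, so every cached value is stale.
        self._probs.clear()

    def prob(self, word):
        """Return the spam probability for word, computing it at most once
        between training calls."""
        if word in self._probs:
            return self._probs[word]
        spam, ham = self.counts.get(word, (0, 0))
        n = spam + ham
        if n == 0:
            p = self.unknown_prob
        else:
            spam_ratio = spam / self.nspam if self.nspam else 0.0
            ham_ratio = ham / self.nham if self.nham else 0.0
            raw = spam_ratio / (spam_ratio + ham_ratio)
            # Pull the raw estimate toward unknown_prob when evidence is thin.
            p = (self.strength * self.unknown_prob + n * raw) / (self.strength + n)
        self._probs[word] = p
        return p

    def scoring_table(self):
        """Export word -> probability only; the counts are left behind."""
        return {word: self.prob(word) for word in self.counts}


clf = CountingClassifier()
clf.learn(["viagra", "free"], is_spam=True)
clf.learn(["meeting", "free"], is_spam=False)
table = clf.scoring_table()
```

In this sketch training stays cheap because learn() only touches counts, while a batch of messages scored between training runs pays for each word's probability once.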

> Unless there are good arguments against it, I'll make a patch for this
> in the next day or two.

When one size doesn't fit all, think instead about subclasses, different
methods, additional arguments, and/or instance attributes.  It's also nice
that the current code separates probability estimation algorithms from
probability combination algorithms.
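
As a rough illustration of that separation (a sketch with hypothetical class and method names, not the project's real classes), per-word estimation and score combination can live in separate methods so that a subclass replaces one without touching the other. The two combining rules shown are simple stand-ins, not necessarily the ones the project uses.

```python
class Scorer:
    """Hypothetical base class: estimate() and combine() vary independently."""

    def __init__(self, word_probs, unknown_prob=0.5):
        self.word_probs = word_probs      # word -> spam probability
        self.unknown_prob = unknown_prob  # assumed default for unseen words

    def estimate(self, word):
        """Probability estimation: here, a plain table lookup."""
        return self.word_probs.get(word, self.unknown_prob)

    def combine(self, probs):
        """Probability combination: naive-Bayes-style product rule."""
        p = 1.0
        q = 1.0
        for x in probs:
            p *= x
            q *= 1.0 - x
        return p / (p + q) if (p + q) else 0.5

    def score(self, words):
        return self.combine([self.estimate(w) for w in set(words)])


class MeanScorer(Scorer):
    """Same estimation, different combination: arithmetic mean."""

    def combine(self, probs):
        return sum(probs) / len(probs) if probs else 0.5


probs = {"free": 0.9, "offer": 0.8}
product_score = Scorer(probs).score(["free", "offer"])
mean_score = MeanScorer(probs).score(["free", "offer"])
```

A different estimation scheme would override estimate() in the same way, leaving combine() alone.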



