[Spambayes] Using mxBeeBase as hammie DB
Tim Peters
tim.one@comcast.net
Thu Oct 17 22:03:30 2002
[Tim]
>> Pruning the database, and especially over time, is something that
>> needs work here.
[M.-A. Lemburg]
> Is there some way to do this automagically ?
No; that's part of what "needs work here" means. In addition, some fields
in the WordInfo records probably aren't needed, or at best are too big (like
saving an 8-byte double for a timestamp). It's also unknown how pruning
will affect accuracy over time, esp. since training is done on a
whole-msg basis: unless the token stream for each msg is saved, expiring
individual words from
the database will yield a state that doesn't match any real-life combination
of training msgs.
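The mismatch can be shown with a toy model. The sketch below assumes a simplified, hypothetical record holding only a spam count, a ham count and a last-touched time. `ToyDB`, `train`, `untrain` and `expire_words` are illustrative names and are not the SpamBayes API. Untraining a msg from its saved token stream leaves a state reachable by real training. Expiring stale words does not.

```python
from dataclasses import dataclass


@dataclass
class WordInfo:
    # Hypothetical, simplified record: counts plus last-touched time.
    spamcount: int = 0
    hamcount: int = 0
    atime: float = 0.0


class ToyDB:
    """Hypothetical toy token database, for illustration only."""

    def __init__(self):
        self.words = {}
        self.nspam = 0
        self.nham = 0

    def train(self, tokens, is_spam, now):
        # Each distinct token counts once per msg; touch its timestamp.
        for tok in set(tokens):
            info = self.words.setdefault(tok, WordInfo())
            if is_spam:
                info.spamcount += 1
            else:
                info.hamcount += 1
            info.atime = now
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, tokens, is_spam):
        # Exact reversal of train(); needs the msg's saved tokens.
        for tok in set(tokens):
            info = self.words[tok]
            if is_spam:
                info.spamcount -= 1
            else:
                info.hamcount -= 1
            if info.spamcount == 0 and info.hamcount == 0:
                del self.words[tok]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def expire_words(self, cutoff):
        # Word-level pruning: drop records not touched since `cutoff`.
        stale = [w for w, info in self.words.items() if info.atime < cutoff]
        for w in stale:
            del self.words[w]
        return stale


spam_msg = ["cheap", "pills", "now"]
ham_msg = ["meeting", "now"]

# Word-level expiry: "cheap" and "pills" vanish, but "now" keeps the
# spam count it got from spam_msg, and nspam still says 1.
pruned = ToyDB()
pruned.train(spam_msg, True, now=1)
pruned.train(ham_msg, False, now=2)
stale = pruned.expire_words(cutoff=2)

# Msg-level removal from saved tokens: same as training on ham_msg alone.
untrained = ToyDB()
untrained.train(spam_msg, True, now=1)
untrained.train(ham_msg, False, now=2)
untrained.untrain(spam_msg, True)
```

No set of training msgs can produce the `pruned` state. If `spam_msg` had been trained, "cheap" and "pills" would be present. If it had not been trained, "now" would have a spam count of 0 and `nspam` would be 0.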
Feel free to solve all that in your spare time <wink>.