[Spambayes] Backup database

Jesse Pelton jsp at PKC.com
Tue Oct 11 19:45:55 CEST 2005


Yuck.  Don't wanna go there.

I'm pretty sure I've seen discussion on the list of a training regimen
that involved discarding aged messages (you could even have been
involved), but I guess that would be done by separately maintaining a
corpus of ham and spam that is periodically used to train from scratch.
Does that make sense?
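
A minimal sketch of that idea, assuming a separately kept corpus of (received date, is_spam, text) entries. The names retrain_from_scratch, corpus, cutoff and tokenize are hypothetical and are not part of SpamBayes; the counts are a toy stand-in for the real classifier's statistics.

```python
from collections import Counter
from datetime import datetime


def retrain_from_scratch(corpus, cutoff, tokenize):
    """Rebuild toy token statistics from messages received on or after cutoff.

    corpus yields (received, is_spam, text) tuples. This is an assumed
    layout for illustration, not a SpamBayes storage format.
    """
    stats = {"nham": 0, "nspam": 0, "ham": Counter(), "spam": Counter()}
    for received, is_spam, text in corpus:
        if received < cutoff:
            continue  # aged message: simply leave it out of the new training run
        kind = "spam" if is_spam else "ham"
        stats["n" + kind] += 1
        # Count each token once per message, using whatever tokenizer
        # settings are in effect at retraining time.
        stats[kind].update(set(tokenize(text)))
    return stats


corpus = [
    (datetime(2005, 1, 5), True, "cheap pills cheap"),
    (datetime(2005, 9, 20), True, "cheap watches"),
    (datetime(2005, 10, 1), False, "meeting notes"),
]
fresh = retrain_from_scratch(corpus, datetime(2005, 7, 1), str.split)
```

Because the tokens are extracted again on every run, this approach does not need to remember which tokens each message originally produced.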

> -----Original Message-----
> From: Tim Peters [mailto:tim.peters at gmail.com] 
> Sent: Tuesday, October 11, 2005 1:15 PM
> To: Jesse Pelton
> Cc: david at treworgans.co.uk; spambayes at python.org
> Subject: Re: [Spambayes] Backup database
> 
> [Jesse Pelton]
> >  ...
> > Developers: would it be feasible and sensible to add UI to allow users to
> > remove messages older than a user-specified cutoff? If so, I'll log a
> > feature request.
> 
> The database doesn't hold training messages, it only contains
> statistics computed from the union of tokens seen across all training
> messages.  To support removing old messages from the training data
> would require additional database work, a mapping from some sort of
> message identifier to a list of all tokens that were seen in that
> message, so that those _tokens_ could be removed from the statistics
> later.  Note that many options change the exact tokens extracted from
> a message, so it would not be enough just to save the original message
> (there's no guarantee the same collection of tokens could be extracted
> from it later).
> 
> That would be a fair amount of work, another pile of messy UI issues,
> and would need a larger database.
> 
> FWIW, I routinely throw away my database and start over from scratch
> too.  Watching it improve is fun :-)!
> 
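
The extra bookkeeping described in the quoted message can be sketched roughly as follows. This is a toy model under stated assumptions, not the SpamBayes classifier or its database layout: TokenStats, train, untrain and the in-memory trained mapping are all hypothetical names chosen for illustration.

```python
from collections import Counter


class TokenStats:
    """Toy per-token message counts plus a message-id -> tokens mapping."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.trained = {}             # msg_id -> (is_spam, frozenset of tokens)

    def train(self, msg_id, tokens, is_spam):
        if msg_id in self.trained:
            raise ValueError("message already trained: %r" % (msg_id,))
        tokens = frozenset(tokens)  # each token counts once per message
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(tokens)
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        # Store the tokens as extracted now; re-tokenizing the message later
        # under different options might not yield the same set.
        self.trained[msg_id] = (is_spam, tokens)

    def untrain(self, msg_id):
        is_spam, tokens = self.trained.pop(msg_id)
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.subtract(tokens)
        for tok in tokens:
            if counts[tok] <= 0:
                del counts[tok]  # drop tokens no remaining message contributes
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1
```

The trained mapping is the part that makes the database larger: it holds every token of every trained message, where the statistics alone hold only one count per distinct token.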

