[Spambayes] ageing out database entries

Ryan Malayter rmalayter at bai.org
Tue Nov 11 16:54:18 EST 2003


> From: Seth Goodman
> Subject: RE: [Spambayes] ageing out database entries

> [Seth Goodman]
> > > Is there an optimal total corpus size for training?
> 
> [Ryan Malayter]
> > Not really, but evidence seems to suggest that a thousand or messages in
> > seems to work well. However, 10,000 or more messages seems to decrease
> > the capture rates somewhat.

I don't remember exactly where I read that rule-of-thumb, but I recall
it came from this list a while back.

I did some searching through the list archives, and came up with a few
messages that mentioned the 10,000 number as a threshold of sorts.
Perhaps there are just diminishing (or no) increases in accuracy above
this point, and accuracy doesn't actually get worse.

Here's the list:

http://mail.python.org/pipermail/spambayes/2002-November/001705.html
http://mail.python.org/pipermail/spambayes/2002-November/001752.html
http://mail.python.org/pipermail/spambayes/2002-October/000997.html



