[Spambayes] ageing out database entries
Ryan Malayter
rmalayter at bai.org
Tue Nov 11 16:54:18 EST 2003
> From: Seth Goodman
> Subject: RE: [Spambayes] ageing out database entries
> [Seth Goodman]
> > > Is there an optimal total corpus size for training?
>
> [Ryan Malayter]
> > Not really, but evidence seems to suggest that a thousand or
> > messages in seems to work well. However, 10,000 or more messages
> > seems to decrease the capture rates somewhat.
I don't remember exactly where I read that rule-of-thumb, but I recall
it came from this list a while back.
I did some searching through the list archives, and came up with a few
messages that mentioned the 10,000 number as a threshold of sorts.
Perhaps there are just diminishing (or no) increases in accuracy above
this point, and accuracy doesn't actually get worse.
Here's the list:
http://mail.python.org/pipermail/spambayes/2002-November/001705.html
http://mail.python.org/pipermail/spambayes/2002-November/001752.html
http://mail.python.org/pipermail/spambayes/2002-October/000997.html