[spambayes-dev] Evaluating a training corpus

Tim Peters tim.one at comcast.net
Sun Jun 8 19:05:55 EDT 2003


[Meyer, Tony]
> Thanks for this.  Timtest is probably closer to my real-life usage
> since I keep a pretty small Outlook database (going by the 'if it
> works, why make it bigger' theory).

Same here:  I use 3 distinct Outlook 2000s regularly; they all have
databases with about 1000 msgs in them, and I rarely bother to train any of
them anymore.

> In terms of posting any results to the list, does it matter which
> poison is chosen?  I've historically used timtest, but only because
> that was the one used in the example 'how to test' that Mark posted a
> while back.

We're still working on the meaning of "preferred" here <wink>?  timcv is
faster, its results are easier to interpret, and it's generally more
realistic given the relatively small amount of data most people have to
throw at it.  Best to view timtest as a tool for extreme testing by extreme
researchers with extreme needs.



