[spambayes-dev] auto-training w/ small db seems like a bad idea

Skip Montanaro skip at pobox.com
Tue Jan 13 09:40:42 EST 2004


This is completely anecdotal, but my recent experience with automatic
training on a small training database (perhaps fewer than 30-50 each of
hams and spams) suggests that it is not a good idea.  I've been trying a
regime of training on fp/fn/unsure combined with non-edge auto-training
(where the "edge" is at 0.01 and 0.99).  In my case at least, I don't have
a slick interface for untraining/retraining misclassified messages, so
correcting a bad auto-train is even more trouble than it would be
otherwise.  With a small training database, a couple of mistakes can
really screw things up.
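
For concreteness, here is a rough sketch of one way to read the non-edge
auto-training rule described above.  It is not SpamBayes code: the function
name and the ham/spam cutoff values are made up for illustration, and only
the 0.01/0.99 edges come from the regime described in this message.

```python
from typing import Optional

# Edges from the message above; scores at or beyond them are left alone.
EDGE_LOW = 0.01
EDGE_HIGH = 0.99

# Hypothetical cutoffs separating ham / unsure / spam (illustrative values).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def auto_train_label(score: float) -> Optional[str]:
    """Return the label to auto-train a message under, or None to skip it.

    Assumed reading of "non-edge" auto-training: a message is auto-trained
    under the classifier's own verdict unless it scored at the edge (already
    very confident) or landed in the unsure range (left for manual training).
    """
    if score <= EDGE_LOW or score >= EDGE_HIGH:
        return None          # edge score: skip, nothing much to learn
    if score < HAM_CUTOFF:
        return "ham"         # trusted as ham without human review
    if score >= SPAM_CUTOFF:
        return "spam"        # trusted as spam without human review
    return None              # unsure: handled by manual fp/fn/unsure training


if __name__ == "__main__":
    for s in (0.005, 0.05, 0.5, 0.95, 0.995):
        print(s, auto_train_label(s))
```

The point of the sketch is that every non-None result is trained on the
classifier's own guess, so with only a few dozen messages in the database
a single wrong guess gets fed straight back in as training data.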

Skip



