[Spambayes] train-to-exhaustion questions
David Abrahams
dave at boost-consulting.com
Thu Apr 26 06:13:29 CEST 2007
1. A recent training run went like this:
round: 1, msgs: 690, ham misses: 61, spam misses: 210, 176.3s
round: 2, msgs: 690, ham misses: 8, spam misses: 53, 165.6s
round: 3, msgs: 690, ham misses: 1, spam misses: 7, 159.6s
round: 4, msgs: 690, ham misses: 1, spam misses: 2, 159.6s
round: 5, msgs: 690, ham misses: 0, spam misses: 1, 157.8s
round: 6, msgs: 690, ham misses: 1, spam misses: 1, 160.9s
round: 7, msgs: 690, ham misses: 0, spam misses: 1, 211.0s
round: 8, msgs: 690, ham misses: 0, spam misses: 1, 172.6s
round: 9, msgs: 690, ham misses: 0, spam misses: 1, 197.1s
round: 10, msgs: 690, ham misses: 1, spam misses: 1, 174.6s
It seems that the results got *worse* in rounds 6 and 10: ham misses
went from 0 back up to 1 in each. Am I misinterpreting this? Are these
expected results?
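As a sanity check on that reading, here is a minimal sketch that parses the log lines above and lists the rounds where the total miss count rose relative to the previous round. The names parse_rounds and regressions are illustrative helpers of my own, not part of SpamBayes or tte.py, and the regular expression assumes the log format is exactly as pasted.

```python
import re

# The log from the training run, copied verbatim.
LOG = """\
round: 1, msgs: 690, ham misses: 61, spam misses: 210, 176.3s
round: 2, msgs: 690, ham misses: 8, spam misses: 53, 165.6s
round: 3, msgs: 690, ham misses: 1, spam misses: 7, 159.6s
round: 4, msgs: 690, ham misses: 1, spam misses: 2, 159.6s
round: 5, msgs: 690, ham misses: 0, spam misses: 1, 157.8s
round: 6, msgs: 690, ham misses: 1, spam misses: 1, 160.9s
round: 7, msgs: 690, ham misses: 0, spam misses: 1, 211.0s
round: 8, msgs: 690, ham misses: 0, spam misses: 1, 172.6s
round: 9, msgs: 690, ham misses: 0, spam misses: 1, 197.1s
round: 10, msgs: 690, ham misses: 1, spam misses: 1, 174.6s
"""

# Assumed line format; adjust if the real output differs.
PATTERN = re.compile(
    r"round:\s*(\d+), msgs:\s*(\d+), ham misses:\s*(\d+), spam misses:\s*(\d+)"
)

def parse_rounds(log):
    """Return a list of (round_number, total_misses) tuples."""
    rounds = []
    for line in log.splitlines():
        m = PATTERN.search(line)
        if m:
            n, _msgs, ham, spam = map(int, m.groups())
            rounds.append((n, ham + spam))
    return rounds

def regressions(rounds):
    """Return round numbers whose total misses exceed the previous round's."""
    return [cur[0] for prev, cur in zip(rounds, rounds[1:]) if cur[1] > prev[1]]

worse = regressions(parse_rounds(LOG))
print(worse)  # [6, 10]
```

This only compares total misses between consecutive rounds; it says nothing about why the counts moved.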
2. I have about 350 each of ham and spam that I can use to train on.
I'm sure that some of these messages are mostly redundant and add
little or nothing of value to the training data. I don't want to
waste time on them every time I do a training run. Is there some
way to use tte.py to reduce my training set to the messages that
actually make a difference?
Thanks in advance!
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
Don't Miss BoostCon 2007! ==> http://www.boostcon.com