[Spambayes] testing results
Neil Schemenauer
nas@python.ca
Sun, 8 Sep 2002 18:20:51 -0700
Tim Peters wrote:
> If you've still got the summary files, please cvs up and try running cmp.py
> again -- in the process of generalizing cmp.py, you managed to make it skip
> half the lines <wink>.
Woops. I didn't have the summary files so I regenerated them using a
slightly different set of data. Here are the results of enabling the
"received" header processing:
false positive percentages
0.707 0.530 won -25.04%
0.873 0.524 won -39.98%
0.301 0.301 tied
1.047 1.047 tied
0.602 0.452 won -24.92%
0.353 0.177 won -49.86%
won 4 times
tied 2 times
lost 0 times
total unique fp went from 17 to 14 won -17.65%
false negative percentages
2.167 1.238 won -42.87%
0.969 0.969 tied
1.887 1.372 won -27.29%
1.616 1.292 won -20.05%
1.029 0.858 won -16.62%
1.548 1.548 tied
won 4 times
tied 2 times
lost 0 times
total unique fn went from 50 to 38 won -24.00%
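The percentages in both listings are relative changes, not differences in percentage points. For example, 0.707 -> 0.530 is (0.530 - 0.707) / 0.707 = -25.04%. Here is a minimal sketch of that bookkeeping, assuming each row is an (old, new) error rate and that a lower rate counts as a win. It is not cmp.py itself; `pct_change` and `compare` are hypothetical names.

```python
def pct_change(old, new):
    """Relative change from old to new in percent (negative = fewer errors)."""
    return (new - old) * 100.0 / old


def compare(pairs):
    """Classify each (old, new) error rate as won, tied or lost.

    Hypothetical helper mimicking the listing format above; the real
    cmp.py may format or compare values differently.
    """
    won = tied = lost = 0
    lines = []
    for old, new in pairs:
        if new < old:
            won += 1
            lines.append(f"{old:.3f} {new:.3f} won {pct_change(old, new):+.2f}%")
        elif new > old:
            lost += 1
            lines.append(f"{old:.3f} {new:.3f} lost {pct_change(old, new):+.2f}%")
        else:
            tied += 1
            lines.append(f"{old:.3f} {new:.3f} tied")
    return lines, (won, tied, lost)


# False positive rates from the first listing (before, after).
fp = [(0.707, 0.530), (0.873, 0.524), (0.301, 0.301),
      (1.047, 1.047), (0.602, 0.452), (0.353, 0.177)]
lines, tally = compare(fp)
for line in lines:
    print(line)
print("won %d times\ntied %d times\nlost %d times" % tally)
```

The same `pct_change` applies to the unique-message totals: 17 -> 14 gives -17.65% and 50 -> 38 gives -24.00%.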
My test set differs from Tim's in that all the email was received by
the same account. Also, my set contains email sent to me, not to
mailing lists (I use different addresses for mailing lists). If
people cook up more ideas, I will be happy to test them.
Neil