[Spambayes] On counting words more than once
Neil Schemenauer
nas@python.ca
Sat, 28 Sep 2002 11:49:16 -0700
Tim Peters wrote:
> Testers? You can try this by enabling the new option
>
> [Classifier]
> count_duplicates_only_once_in_training: True
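
For readers unfamiliar with the option, here is a minimal sketch of what "counting duplicates only once in training" could look like. It is not the Spambayes classifier code: the function name `train_message`, its arguments, and the use of a `Counter` are all hypothetical and chosen only for illustration. The idea shown is that a token repeated within one message adds 1 to its training count rather than once per occurrence.

```python
from collections import Counter

def train_message(counts, tokens, count_duplicates_only_once=True):
    """Add one message's tokens to a running per-token count table.

    Hypothetical helper for illustration only; not the Spambayes API.
    """
    if count_duplicates_only_once:
        # Collapse repeats so each distinct token is counted once per message.
        tokens = set(tokens)
    counts.update(tokens)
    return counts

message = ["free", "free", "free", "money", "now"]

once = train_message(Counter(), message, count_duplicates_only_once=True)
every = train_message(Counter(), message, count_duplicates_only_once=False)

print(once["free"], every["free"])
```

With the flag on, "free" contributes a count of 1 for this message; with it off, it contributes 3.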
It's a win for me (left column is before, right column is with the
option enabled):

false positive percentages
    0.000  0.000  tied
    1.000  1.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.500  0.500  tied
    0.500  1.000  lost  +100.00%
    0.500  0.500  tied
    0.000  0.000  tied
    0.500  0.500  tied
    0.000  0.500  lost  +(was 0)

won   0 times
tied  8 times
lost  2 times

total unique fp went from 6 to 8 lost +33.33%
mean fp % went from 0.3 to 0.4 lost +33.33%

false negative percentages
    0.000  0.000  tied
    1.000  0.500  won   -50.00%
    1.000  0.500  won   -50.00%
    0.500  0.500  tied
    2.000  1.500  won   -25.00%
    1.500  1.000  won   -33.33%
    0.000  0.000  tied
    0.500  0.000  won  -100.00%
    0.500  0.000  won  -100.00%
    0.000  0.000  tied

won   6 times
tied  4 times
lost  0 times

total unique fn went from 14 to 8 won -42.86%
mean fn % went from 0.7 to 0.4 won -42.86%

ham mean                     ham sdev
  30.01   27.92   -6.96%      8.43    8.42   -0.12%
  28.50   26.74   -6.18%      8.83    8.69   -1.59%
  27.93   26.04   -6.77%      8.20    7.94   -3.17%
  29.55   27.33   -7.51%      8.24    8.23   -0.12%
  29.05   27.19   -6.40%      8.28    8.15   -1.57%
  31.40   29.48   -6.11%      9.41    9.25   -1.70%
  29.31   27.49   -6.21%      8.13    8.10   -0.37%
  29.33   27.16   -7.40%      7.86    7.89   +0.38%
  28.72   27.22   -5.22%      9.05    8.97   -0.88%
  29.04   26.87   -7.47%      7.28    7.22   -0.82%

ham mean and sdev for all runs
  29.28   27.34   -6.63%      8.44    8.35   -1.07%

spam mean                    spam sdev
  82.98   81.91   -1.29%      9.83   10.16   +3.36%
  82.02   81.04   -1.19%      9.92   10.09   +1.71%
  81.19   80.28   -1.12%      9.69    9.86   +1.75%
  82.51   81.66   -1.03%      9.92   10.23   +3.13%
  82.60   81.60   -1.21%     10.12   10.33   +2.08%
  82.24   81.36   -1.07%      9.25    9.71   +4.97%
  81.74   80.85   -1.09%      9.30    9.49   +2.04%
  81.70   80.64   -1.30%      9.51    9.81   +3.15%
  82.39   81.45   -1.14%      9.87   10.18   +3.14%
  82.44   81.45   -1.20%      9.49    9.73   +2.53%

spam mean and sdev for all runs
  82.18   81.22   -1.17%      9.71    9.97   +2.68%

ham/spam mean difference: 52.90 53.88 +0.98
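
The summary figures above can be re-derived by hand. The sketch below assumes the percentages are plain relative changes, (after - before) / before, and that for error rates a lower value is a "win". The helper name `change` is hypothetical and is not taken from the tool that produced this output.

```python
def change(before, after):
    """Describe an error-rate change in the style of the listing above.

    Hypothetical helper: lower is better, so a decrease is a win.
    """
    if before == after:
        return "tied"
    verdict = "won" if after < before else "lost"
    if before == 0:
        # A relative change from zero is undefined, so just flag it.
        return f"{verdict} +(was 0)"
    pct = (after - before) / before * 100.0
    return f"{verdict} {pct:+.2f}%"

fp_total = change(6, 8)     # total unique false positives
fn_total = change(14, 8)    # total unique false negatives

# Gap between the overall spam and ham mean scores, before and after.
gap_before = round(82.18 - 29.28, 2)
gap_after = round(81.22 - 27.34, 2)

print(fp_total, "|", fn_total, "|", gap_before, gap_after)
```

Under those assumptions the totals reproduce the reported +33.33% and -42.86%, and the ham/spam gap widens from 52.90 to 53.88.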