[Spambayes] There Can Be Only One
Tim Peters
tim.one@comcast.net
Wed, 25 Sep 2002 20:04:31 -0400
Here's an interesting experiment: max_discriminators=1. That is, only look
at the single strongest clue in a message. Unsurprisingly, this gives a
very Graham-like bipolar distribution. But it does surprisingly well for me
(first column is max_discriminators=150, second is max_discriminators=1):
false positive percentages
    0.500  1.500  lost  +200.00%
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.500  lost  +(was 0)
    0.000  0.500  lost  +(was 0)
    0.000  0.000  tied
    0.000  0.500  lost  +(was 0)
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.500  lost  +(was 0)

won   0 times
tied  5 times
lost  5 times

total unique fp went from 1 to 7 lost  +600.00%
mean fp % went from 0.05 to 0.35 lost  +600.00%

false negative percentages
    0.000  0.000  tied
    0.000  0.500  lost  +(was 0)
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  1.000  lost  +(was 0)
    0.000  0.000  tied
    0.000  0.500  lost  +(was 0)
    0.000  0.000  tied

won   0 times
tied  7 times
lost  3 times

total unique fn went from 0 to 4 lost  +(was 0)
mean fn % went from 0.0 to 0.2 lost  +(was 0)
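
For anyone reading the "won/tied/lost" columns cold, here is a minimal sketch of the arithmetic behind each row. It is not the project's actual comparison script; the function name `compare` is made up for illustration, and it assumes lower is better, as it is for error rates.

```python
def compare(before, after):
    """Return (verdict, change) for one 'before after' row; lower is better.

    Hypothetical helper, written only to illustrate the arithmetic.
    """
    if after == before:
        return "tied", ""
    verdict = "won" if after < before else "lost"
    if before == 0:
        # A percentage change from zero is undefined, so just flag it.
        return verdict, "+(was 0)"
    return verdict, f"{(after - before) / before * 100:+.2f}%"


print(compare(0.500, 1.500))  # first false positive row
print(compare(1, 7))          # total unique fp
print(compare(0.0, 0.5))      # a row that started at zero
```

Most "lost" rows start from 0, so they show "+(was 0)" rather than a percentage.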

ham mean                      ham sdev
  33.01    1.57  -95.24%       6.26   12.13  +93.77%
  32.19    0.05  -99.84%       5.38    0.16  -97.03%
  32.99    0.04  -99.88%       5.60    0.11  -98.04%
  33.46    0.54  -98.39%       5.77    7.03  +21.84%
  33.16    0.57  -98.28%       5.56    7.02  +26.26%
  32.81    0.06  -99.82%       5.72    0.15  -97.38%
  33.38    0.55  -98.35%       5.76    7.02  +21.87%
  32.55    0.07  -99.78%       5.70    0.35  -93.86%
  33.11    0.07  -99.79%       5.52    0.25  -95.47%
  34.21    0.55  -98.39%       5.84    7.01  +20.03%

ham mean and sdev for all runs
  33.09    0.41  -98.76%       5.73    5.89   +2.79%

spam mean                     spam sdev
  82.95   99.90  +20.43%       6.82    0.15  -97.80%
  82.17   99.36  +20.92%       6.34    7.04  +11.04%
  82.06   99.88  +21.72%       6.14    0.28  -95.44%
  82.39   99.91  +21.26%       5.93    0.10  -98.31%
  82.53   99.89  +21.03%       7.00    0.14  -98.00%
  82.76   99.91  +20.72%       6.56    0.17  -97.41%
  82.06   98.91  +20.53%       5.73    9.82  +71.38%
  82.26   99.87  +21.41%       5.97    0.28  -95.31%
  82.65   99.38  +20.24%       6.71    6.60   -1.64%
  83.43   99.88  +19.72%       6.37    0.32  -94.98%

spam mean and sdev for all runs
  82.53   99.69  +20.79%       6.37    4.37  -31.40%

ham/spam mean difference: 49.44 99.28 +49.84
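
The near-0 ham mean and near-100 spam mean fit a score that collapses to the probability of the one strongest clue. Here is a rough sketch of that, taking "strongest" to mean farthest from the neutral 0.5. The combining rule below is a Robinson-style geometric-mean scheme, which is an assumption; this post doesn't say which rule the runs used. The function names and word probabilities are made up for illustration.

```python
import math


def strongest_clues(word_probs, max_discriminators):
    """Keep the clues whose spam probability lies farthest from 0.5."""
    ranked = sorted(word_probs.items(),
                    key=lambda kv: abs(kv[1] - 0.5),
                    reverse=True)
    return ranked[:max_discriminators]


def combine(probs):
    """Robinson-style combining (assumed here, not taken from the post)."""
    n = len(probs)
    p = 1.0 - math.prod(1.0 - x for x in probs) ** (1.0 / n)
    q = 1.0 - math.prod(probs) ** (1.0 / n)
    s = (p - q) / (p + q)
    return (1.0 + s) / 2.0


# Hypothetical per-word spam probabilities for a single message.
clues = {"viagra": 0.99, "python": 0.02, "click": 0.80, "the": 0.50}

top = strongest_clues(clues, 1)
score = combine([prob for _, prob in top])
print(top, score)
```

With a single clue, this rule returns that clue's own probability. Since the strongest clue is usually very close to 0 or 1, the scores pile up at the two poles.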
The wild swings across runs in the ham and spam sdevs suggest it's not a
very stable approach <heh>.
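
One way to read those swings: if nearly every score sits at one pole, the sdev is driven almost entirely by how many messages land at the wrong pole. The sketch below assumes 200 messages per run, which is inferred from the 0.5% steps in the error rates and not stated in the post. It also assumes scores of exactly 0 or 100.

```python
import math


def bipolar_sdev(n_wrong, n_total=200):
    """Population sdev on a 0-100 scale when n_wrong scores sit at the
    wrong pole and the rest at the right one (an idealized model)."""
    p = n_wrong / n_total
    return 100 * math.sqrt(p * (1 - p))


for n_wrong in (0, 1, 2, 3):
    print(n_wrong, round(bipolar_sdev(n_wrong), 2))
```

Under those assumptions, one stray message gives an sdev of about 7.05, two give about 9.95, and three give about 12.16. That is close to the 7.0x ham sdevs in the runs with one false positive, the 12.13 in the run with three, and the 9.82 spam sdev in the run with two false negatives. So the swings look like a count of misfiled messages more than anything else.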