[spambayes-dev] removing punctuation redux
Tony Meyer
tameyer at ihug.co.nz
Wed Oct 29 15:38:42 EST 2003
[...]
> Would someone please try this with their training database?
[...]
I only had time for a quick test and no analysis, so I'll leave that to someone else.
Here are my results (following Skip's recipe).
stds.txt -> puncts.txt
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
-> <stat> tested 231 hams & 439 spams against 2079 hams & 3951 spams
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.433 0.433 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 10 times
lost 0 times
total unique fp went from 1 to 1 tied
mean fp % went from 0.04329004329 to 0.04329004329 tied
false negative percentages
2.506 2.506 tied
1.822 1.822 tied
1.822 1.822 tied
1.367 1.367 tied
2.733 2.506 won -8.31%
1.595 1.367 won -14.29%
1.822 1.822 tied
2.506 2.506 tied
1.822 1.822 tied
1.822 1.822 tied
won 2 times
tied 8 times
lost 0 times
total unique fn went from 87 to 85 won -2.30%
mean fn % went from 1.98177676537 to 1.93621867881 won -2.30%
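For anyone who wants to check the arithmetic, here is a minimal sketch of how the percentages above fit together. The per-run counts are not in the output; I inferred them from the printed percentages (e.g. 12/439 = 2.733%), and the helper names are made up for illustration. It also assumes the per-run change is computed from the rounded percentages, which is what gives -8.31% rather than -8.33%.

```python
# Sizes taken from the "tested 231 hams & 439 spams" lines; 10 runs.
HAMS_PER_RUN = 231
SPAMS_PER_RUN = 439
RUNS = 10

# Per-run false negative counts, before and after. These are inferred
# from the printed percentages, not reported in the output itself.
FN_BEFORE = [11, 8, 8, 6, 12, 7, 8, 11, 8, 8]
FN_AFTER = [11, 8, 8, 6, 11, 6, 8, 11, 8, 8]


def rate_pct(errors, total):
    """Error rate as a percentage of the messages tested."""
    return 100.0 * errors / total


def rel_change_pct(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    for b, a in zip(FN_BEFORE, FN_AFTER):
        pb = round(rate_pct(b, SPAMS_PER_RUN), 3)
        pa = round(rate_pct(a, SPAMS_PER_RUN), 3)
        # Assumption: the change is taken from the rounded percentages.
        note = "tied" if pa == pb else "%+.2f%%" % rel_change_pct(pb, pa)
        print("%.3f %.3f %s" % (pb, pa, note))

    total_spam = SPAMS_PER_RUN * RUNS
    print("total fn:", sum(FN_BEFORE), "->", sum(FN_AFTER))
    print("mean fn %:", rate_pct(sum(FN_BEFORE), total_spam),
          "->", rate_pct(sum(FN_AFTER), total_spam))
    # The single false positive: one ham out of 231 in one run.
    print("fp % in that run:", round(rate_pct(1, HAMS_PER_RUN), 3))
```

Since every run tests the same number of messages, the mean of the per-run rates equals total errors over total messages, which is why the mean and the unique-fn total both change by the same -2.30%.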
ham mean                     ham sdev
   0.10    0.10   +0.00%        1.03    1.04   +0.97%
   0.04    0.03  -25.00%        0.24    0.22   -8.33%
   0.29    0.29   +0.00%        3.04    3.04   +0.00%
   0.20    0.20   +0.00%        2.64    2.68   +1.52%
   0.35    0.35   +0.00%        3.88    3.88   +0.00%
   0.71    0.70   -1.41%        6.48    6.47   -0.15%
   0.79    0.79   +0.00%        6.91    6.92   +0.14%
   0.27    0.27   +0.00%        3.30    3.30   +0.00%
   0.32    0.30   -6.25%        3.28    3.03   -7.62%
   0.42    0.41   -2.38%        5.28    5.35   +1.33%
ham mean and sdev for all runs
   0.35    0.34   -2.86%        4.15    4.14   -0.24%
spam mean                    spam sdev
  95.71   95.70   -0.01%       17.16   17.17   +0.06%
  96.97   96.99   +0.02%       14.91   14.78   -0.87%
  97.15   97.15   +0.00%       13.95   13.98   +0.22%
  96.64   96.69   +0.05%       14.09   14.05   -0.28%
  96.06   96.09   +0.03%       17.51   17.44   -0.40%
  96.97   97.04   +0.07%       13.81   13.71   -0.72%
  96.58   96.60   +0.02%       15.17   15.13   -0.26%
  96.52   96.52   +0.00%       15.96   15.93   -0.19%
  96.28   96.27   -0.01%       15.34   15.36   +0.13%
  96.95   96.94   -0.01%       14.76   14.74   -0.14%
spam mean and sdev for all runs
  96.58   96.60   +0.02%       15.32   15.28   -0.26%
ham/spam mean difference: 96.23 96.26 +0.03
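The last line can be checked the same way. This is only a sketch, assuming the difference is the overall spam mean minus the overall ham mean, taken from the rounded "for all runs" figures; the function name is made up for illustration.

```python
# Overall means copied from the "for all runs" lines above.
HAM_MEAN_BEFORE, HAM_MEAN_AFTER = 0.35, 0.34
SPAM_MEAN_BEFORE, SPAM_MEAN_AFTER = 96.58, 96.60


def separation(ham_mean, spam_mean):
    """Gap between mean spam score and mean ham score (assumed definition)."""
    return spam_mean - ham_mean


before = separation(HAM_MEAN_BEFORE, SPAM_MEAN_BEFORE)
after = separation(HAM_MEAN_AFTER, SPAM_MEAN_AFTER)

if __name__ == "__main__":
    # The change is an absolute difference in score points, not a percentage.
    print("%.2f %.2f %+.2f" % (before, after, after - before))
```

A larger gap means ham and spam scores sit further apart, so the small increase here goes in the right direction.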
=Tony Meyer