[Spambayes] Perhaps a level header would be useful?
Meyer, Tony
T.A.Meyer at massey.ac.nz
Tue Mar 11 19:07:27 EST 2003
[Bill Yerazunis]
> I've also had multiple requests for a continuous output match
> parameter in CRM114, so I settled on this:
>
> pR = - (log (Pspam) - log (Pnonspam))
>
> This goes from roughly +350 to -350, and (nicely) the uncertains
> and errors all seem to group around +/- 100.
Curious, and (sort of) able to run tests now (thanks, Tim & Mark), I changed the "prob = (S-H + 1.0) / 2.0" equation in classifier.py to use this method. I also had to fiddle with 0's, since log(0) is undefined (how does CRM114 handle this?), and I rescaled the result from the -350 to +350 range to 0-1. Surprisingly, I got good (well, perfect, actually) results. Is this just because my test sets are teeny-weeny? A fluke? *Another* mistake on my part?
The change I made was to replace line 245 ("prob = (S-H + 1.0) / 2.0") of classifier.py with:
"""
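from math import log
# log(0) is undefined, so substitute a tiny positive value for 0.
if H == 0:
    H = 0.00000001
if S == 0:
    S = 0.00000001
# Log-ratio score, divided by 350 and shifted by 0.5.
prob = ((-(log(S) - log(H)))/350) + 0.5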
"""
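As a sanity check on the rescaling, here is a minimal sketch of the range this formula can actually produce. It assumes S and H each lie in [0, 1] (which the original "(S-H + 1.0) / 2.0" suggests, but I have not verified it here), and rescaled_score is a hypothetical helper name, not anything in classifier.py. Under those assumptions the 0.00000001 floor caps the log difference at about 18.4 rather than 350, so the score stays within roughly 0.447 to 0.553, and strong spam evidence lands at the low end rather than the high end. That would be consistent with the ham mean near 55.8 and spam mean near 45.1 in the output below. If the ham and spam cutoffs sit outside that band, every message would end up unsure, which might be where the zero fp and fn counts come from, but that is a guess until the unsure counts are checked.

```python
from math import log

FLOOR = 0.00000001  # same value the snippet above substitutes for 0


def rescaled_score(S, H, floor=FLOOR):
    """Hypothetical helper wrapping the replacement formula from the snippet above."""
    S = S if S != 0 else floor
    H = H if H != 0 else floor
    return (-(log(S) - log(H))) / 350 + 0.5


# Extremes, assuming S and H are each confined to [0, 1].
most_spammy = rescaled_score(S=1.0, H=0.0)  # strongest possible spam evidence
most_hammy = rescaled_score(S=0.0, H=1.0)   # strongest possible ham evidence

print(round(most_spammy, 3), round(most_hammy, 3))  # 0.447 0.553
```

Dividing by 350 would also map a true -350 to +350 range onto -0.5 to 1.5 rather than 0-1, but with this floor the inputs never get anywhere near that far apart.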
pr_falses.txt -> pr_trues.txt
-> <stat> tested 333 hams & 56 spams against 372 hams & 48 spams
-> <stat> tested 329 hams & 48 spams against 372 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 372 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 329 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 321 hams & 51 spams against 333 hams & 56 spams
-> <stat> tested 372 hams & 48 spams against 329 hams & 48 spams
-> <stat> tested 333 hams & 56 spams against 329 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 329 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 321 hams & 51 spams
-> <stat> tested 329 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 372 hams & 48 spams
-> <stat> tested 329 hams & 48 spams against 372 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 372 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 329 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 321 hams & 51 spams against 333 hams & 56 spams
-> <stat> tested 372 hams & 48 spams against 329 hams & 48 spams
-> <stat> tested 333 hams & 56 spams against 329 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 329 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 321 hams & 51 spams
-> <stat> tested 329 hams & 48 spams against 321 hams & 51 spams
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.312 0.000 won -100.00%
0.000 0.000 tied
0.304 0.000 won -100.00%
0.935 0.000 won -100.00%
0.000 0.000 tied
0.000 0.000 tied
0.623 0.000 won -100.00%
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
won 4 times
tied 8 times
lost 0 times
total unique fp went from 4 to 0 won -100.00%
mean fp % went from 0.181092520524 to 0.0 won -100.00%
false negative percentages
0.000 0.000 tied
2.083 0.000 won -100.00%
0.000 0.000 tied
2.083 0.000 won -100.00%
2.083 0.000 won -100.00%
0.000 0.000 tied
2.083 0.000 won -100.00%
0.000 0.000 tied
0.000 0.000 tied
6.250 0.000 won -100.00%
0.000 0.000 tied
4.167 0.000 won -100.00%
won 6 times
tied 6 times
lost 0 times
total unique fn went from 5 to 0 won -100.00%
mean fn % went from 1.5625 to 0.0 won -100.00%
ham mean                    ham sdev
   3.64   55.82 +1433.52%      11.61    3.14  -72.95%
   3.68   55.64 +1411.96%      12.69    3.18  -74.94%
   2.84   55.75 +1863.03%      10.59    3.09  -70.82%
   2.08   56.10 +2597.12%       7.78    3.12  -59.90%
ham mean and sdev for all runs
   3.05   55.83 +1730.49%      10.83    3.14  -71.01%
spam mean                   spam sdev
  92.59   45.50   -50.86%      17.72    3.41  -80.76%
  94.02   44.72   -52.44%      16.04    3.48  -78.30%
  93.46   45.01   -51.84%      16.94    3.44  -79.69%
  87.89   45.01   -48.79%      22.86    3.88  -83.03%
spam mean and sdev for all runs
  91.98   45.07   -51.00%      18.75    3.57  -80.96%
ham/spam mean difference: 88.93 -10.76 -99.69
Comments?
=Tony Meyer