[Spambayes] Perhaps a level header would be useful?

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Mar 11 19:07:27 EST 2003


[Bill Yerazunis]
> I've also had multiple requests for a continuous output match
> parameter in CRM114, so I settled on this:
> 
>       pR = - (log (Pspam) - log (Pnonspam))
> 
> This goes from roughly +350 to -350, and (nicely) the uncertains
> and errors all seem to group around +/- 100.
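A minimal Python sketch of the quoted pR formula. It assumes Pspam and Pnonspam are ordinary positive floats and uses the natural logarithm; the quote doesn't say which log base CRM114 itself uses, and the name `crm114_pr` is hypothetical. The sketch shows the sign convention. pR is positive when the non-spam probability is the larger one, and negative when the spam probability is.

```python
from math import log

def crm114_pr(p_spam, p_nonspam):
    """Hypothetical helper: pR = -(log(Pspam) - log(Pnonspam)).

    Natural log is an assumption. Both inputs must be > 0,
    because log(0) is undefined.
    """
    return -(log(p_spam) - log(p_nonspam))

# Made-up probabilities, not real CRM114 output
print(crm114_pr(1e-120, 1e-20))   # positive: non-spam is favoured
print(crm114_pr(1e-20, 1e-120))   # negative: spam is favoured
```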

Curious, and (sort of) able to run tests now (thanks Tim & Mark), I changed the "prob = (S-H + 1.0) / 2.0" equation in classifier.py to use this method. I also had to fiddle with zeros, since log(0) is undefined (how does CRM114 handle this?), and I rescaled the result from the -350 to +350 range to 0-1. Surprisingly, I got good (well, perfect, actually) results. Is this just down to my tiny-weeny sets? A fluke? *Another* mistake on my part?

The change I made was to replace line 245 ("prob = (S-H + 1.0) / 2.0") of classifier.py with:
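"""
            from math import log
            # log(0) is undefined, so replace zero scores with a tiny floor
            if H == 0:
                H = 0.00000001
            if S == 0:
                S = 0.00000001
            # CRM114-style pR = -(log(S) - log(H)), divided by 350 and
            # shifted by 0.5 so that S == H gives exactly 0.5
            prob = ((-(log(S) - log(H)))/350) + 0.5
"""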
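You can check the replacement line by evaluating it on a few hypothetical (S, H) pairs. In this sketch, `rescaled_prob` is a hypothetical wrapper around the patched line, not a function in classifier.py. The arithmetic shows two things.

* The sign runs the other way from the original (S-H + 1.0) / 2.0. When S > H, the result is now below 0.5. By comparison, the original maps S=1.0, H=0.0 to 1.0.
* For S and H between 0.00000001 and 1, |log(S) - log(H)| is at most about 18.4. Dividing by 350 therefore keeps the result between roughly 0.447 and 0.553.

```python
from math import log

def rescaled_prob(S, H, floor=0.00000001, scale=350):
    """Hypothetical wrapper mirroring the patched classifier.py line."""
    if H == 0:
        H = floor  # avoid log(0)
    if S == 0:
        S = floor
    return ((-(log(S) - log(H))) / scale) + 0.5

# Hypothetical (S, H) pairs, from "all spam evidence" to "all ham evidence"
for S, H in [(1.0, 0.0), (0.99, 0.01), (0.5, 0.5), (0.01, 0.99), (0.0, 1.0)]:
    print(S, H, round(rescaled_prob(S, H), 4))
```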

pr_falses.txt -> pr_trues.txt
-> <stat> tested 333 hams & 56 spams against 372 hams & 48 spams
-> <stat> tested 329 hams & 48 spams against 372 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 372 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 329 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 321 hams & 51 spams against 333 hams & 56 spams
-> <stat> tested 372 hams & 48 spams against 329 hams & 48 spams
-> <stat> tested 333 hams & 56 spams against 329 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 329 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 321 hams & 51 spams
-> <stat> tested 329 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 372 hams & 48 spams
-> <stat> tested 329 hams & 48 spams against 372 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 372 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 329 hams & 48 spams against 333 hams & 56 spams
-> <stat> tested 321 hams & 51 spams against 333 hams & 56 spams
-> <stat> tested 372 hams & 48 spams against 329 hams & 48 spams
-> <stat> tested 333 hams & 56 spams against 329 hams & 48 spams
-> <stat> tested 321 hams & 51 spams against 329 hams & 48 spams
-> <stat> tested 372 hams & 48 spams against 321 hams & 51 spams
-> <stat> tested 333 hams & 56 spams against 321 hams & 51 spams
-> <stat> tested 329 hams & 48 spams against 321 hams & 51 spams

false positive percentages
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.312  0.000  won   -100.00%
    0.000  0.000  tied          
    0.304  0.000  won   -100.00%
    0.935  0.000  won   -100.00%
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.623  0.000  won   -100.00%
    0.000  0.000  tied          
    0.000  0.000  tied          
    0.000  0.000  tied          

won   4 times
tied  8 times
lost  0 times

total unique fp went from 4 to 0 won   -100.00%
mean fp % went from 0.181092520524 to 0.0 won   -100.00%

false negative percentages
    0.000  0.000  tied          
    2.083  0.000  won   -100.00%
    0.000  0.000  tied          
    2.083  0.000  won   -100.00%
    2.083  0.000  won   -100.00%
    0.000  0.000  tied          
    2.083  0.000  won   -100.00%
    0.000  0.000  tied          
    0.000  0.000  tied          
    6.250  0.000  won   -100.00%
    0.000  0.000  tied          
    4.167  0.000  won   -100.00%

won   6 times
tied  6 times
lost  0 times

total unique fn went from 5 to 0 won   -100.00%
mean fn % went from 1.5625 to 0.0 won   -100.00%
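The "mean fp %" and "mean fn %" lines appear to be the plain average of the twelve per-run percentages. A quick check using the rounded old-run columns above reproduces the reported figures to within rounding. Treating them as a simple mean is an inference from these numbers, not a description of the comparison script's code.

```python
# Old-run per-run percentages as displayed above (rounded to 3 places)
old_fp = [0.000, 0.000, 0.312, 0.000, 0.304, 0.935,
          0.000, 0.000, 0.623, 0.000, 0.000, 0.000]
old_fn = [0.000, 2.083, 0.000, 2.083, 2.083, 0.000,
          2.083, 0.000, 0.000, 6.250, 0.000, 4.167]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(old_fp))  # close to the reported 0.181092520524 (gap is rounding)
print(mean(old_fn))  # close to the reported 1.5625
```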

ham mean                     ham sdev
   3.64   55.82 +1433.52%       11.61    3.14  -72.95%
   3.68   55.64 +1411.96%       12.69    3.18  -74.94%
   2.84   55.75 +1863.03%       10.59    3.09  -70.82%
   2.08   56.10 +2597.12%        7.78    3.12  -59.90%

ham mean and sdev for all runs
   3.05   55.83 +1730.49%       10.83    3.14  -71.01%

spam mean                    spam sdev
  92.59   45.50  -50.86%       17.72    3.41  -80.76%
  94.02   44.72  -52.44%       16.04    3.48  -78.30%
  93.46   45.01  -51.84%       16.94    3.44  -79.69%
  87.89   45.01  -48.79%       22.86    3.88  -83.03%

spam mean and sdev for all runs
  91.98   45.07  -51.00%       18.75    3.57  -80.96%

ham/spam mean difference: 88.93 -10.76 -99.69
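You can reproduce the "ham/spam mean difference" line from the "all runs" means. It consists of three values: spam mean minus ham mean for the old run (91.98 - 3.05), the same for the new run (45.07 - 55.83), and the change between the two. The percentage columns in the mean/sdev tables match (new - old) / old * 100. This reading is inferred from the figures rather than taken from the comparison script. `pct_change` is a hypothetical helper.

```python
# "all runs" means from the tables above, as (old, new)
ham_mean = (3.05, 55.83)
spam_mean = (91.98, 45.07)

def pct_change(old, new):
    """Percentage change, matching the comparison columns above."""
    return (new - old) / old * 100

old_diff = spam_mean[0] - ham_mean[0]   # 88.93
new_diff = spam_mean[1] - ham_mean[1]   # -10.76
print(round(old_diff, 2), round(new_diff, 2), round(new_diff - old_diff, 2))
print(round(pct_change(*ham_mean), 2))  # ham mean change, reported as +1730.49%
```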

Comments?

=Tony Meyer
