[Spambayes] experimental_ham_spam_imbalance_adjustment result

Mark Hammond mhammond at skippinet.com.au
Mon Mar 10 10:42:43 EST 2003


Here are my current results on the imbalance option.  Interestingly, my
initial "-n 2" results looked better than the "-n 5" results below.

FWIW, Outlook users should remember there is an "export.py" script in the
addin directory.  Just run it from the command line, and it will export
your ham and spam into the "spambayes\testdata\Data" directory, which is
the default place the test tools look.

And for everyone else, once you have a "Data" directory, running the tests
means:

* Create testtools\bayescustomize.ini with the options you want to test
* Run "timtest.py -n 2 > result1.txt"
* Run "rates result1.txt" - this creates "result1s.txt"
* Repeat the above with changed options, redirecting to "result2.txt"
  and getting "result2s.txt" as the final output
* Run "cmp.py result1s.txt result2s.txt"

Well - if it *doesn't* mean that, then you can ignore my results too <wink>.
My results below are for "-n 5".
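
For anyone who would rather script the steps above, here is a rough sketch
that only builds the command lines; it does not run anything.  The helper
names are made up for illustration, and it assumes the scripts sit in the
current directory, that "rates" is rates.py, and that you launch them with
the same Python interpreter.

```python
import sys

# Hypothetical helpers: they only assemble the commands listed in the steps
# above. Assumes timtest.py, rates.py and cmp.py are in the current directory.

def run_commands(index, n_sets=2):
    """Return (argv, stdout_file) pairs for one test run.

    stdout_file is where the command's output should be redirected,
    or None when the command writes its own output file.
    """
    result = "result%d.txt" % index
    return [
        ([sys.executable, "timtest.py", "-n", str(n_sets)], result),
        ([sys.executable, "rates.py", result], None),
    ]

def compare_command(first=1, second=2):
    """Return the argv that compares the two summary files."""
    return [sys.executable, "cmp.py",
            "result%ds.txt" % first, "result%ds.txt" % second]

if __name__ == "__main__":
    for index in (1, 2):
        for argv, stdout_file in run_commands(index):
            line = " ".join(argv[1:])
            if stdout_file:
                line += " > " + stdout_file
            print(line)
    print(" ".join(compare_command()[1:]))
```

Remember to edit bayescustomize.ini between the first and second run,
otherwise both result files describe the same options.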

Mark.

\temp\imbalance_falses.txt -> \temp\imbalance_trues.txt
-> <stat> tested 412 hams & 1004 spams against 429 hams & 1019 spams
-> <stat> tested 440 hams & 1076 spams against 429 hams & 1019 spams
-> <stat> tested 397 hams & 1054 spams against 429 hams & 1019 spams
-> <stat> tested 477 hams & 1056 spams against 429 hams & 1019 spams
-> <stat> tested 429 hams & 1019 spams against 412 hams & 1004 spams
-> <stat> tested 440 hams & 1076 spams against 412 hams & 1004 spams
-> <stat> tested 397 hams & 1054 spams against 412 hams & 1004 spams
-> <stat> tested 477 hams & 1056 spams against 412 hams & 1004 spams
-> <stat> tested 429 hams & 1019 spams against 440 hams & 1076 spams
-> <stat> tested 412 hams & 1004 spams against 440 hams & 1076 spams
-> <stat> tested 397 hams & 1054 spams against 440 hams & 1076 spams
-> <stat> tested 477 hams & 1056 spams against 440 hams & 1076 spams
-> <stat> tested 429 hams & 1019 spams against 397 hams & 1054 spams
-> <stat> tested 412 hams & 1004 spams against 397 hams & 1054 spams
-> <stat> tested 440 hams & 1076 spams against 397 hams & 1054 spams
-> <stat> tested 477 hams & 1056 spams against 397 hams & 1054 spams
-> <stat> tested 429 hams & 1019 spams against 477 hams & 1056 spams
-> <stat> tested 412 hams & 1004 spams against 477 hams & 1056 spams
-> <stat> tested 440 hams & 1076 spams against 477 hams & 1056 spams
-> <stat> tested 397 hams & 1054 spams against 477 hams & 1056 spams
-> <stat> tested 412 hams & 1004 spams against 429 hams & 1019 spams
-> <stat> tested 440 hams & 1076 spams against 429 hams & 1019 spams
-> <stat> tested 397 hams & 1054 spams against 429 hams & 1019 spams
-> <stat> tested 477 hams & 1056 spams against 429 hams & 1019 spams
-> <stat> tested 429 hams & 1019 spams against 412 hams & 1004 spams
-> <stat> tested 440 hams & 1076 spams against 412 hams & 1004 spams
-> <stat> tested 397 hams & 1054 spams against 412 hams & 1004 spams
-> <stat> tested 477 hams & 1056 spams against 412 hams & 1004 spams
-> <stat> tested 429 hams & 1019 spams against 440 hams & 1076 spams
-> <stat> tested 412 hams & 1004 spams against 440 hams & 1076 spams
-> <stat> tested 397 hams & 1054 spams against 440 hams & 1076 spams
-> <stat> tested 477 hams & 1056 spams against 440 hams & 1076 spams
-> <stat> tested 429 hams & 1019 spams against 397 hams & 1054 spams
-> <stat> tested 412 hams & 1004 spams against 397 hams & 1054 spams
-> <stat> tested 440 hams & 1076 spams against 397 hams & 1054 spams
-> <stat> tested 477 hams & 1056 spams against 397 hams & 1054 spams
-> <stat> tested 429 hams & 1019 spams against 477 hams & 1056 spams
-> <stat> tested 412 hams & 1004 spams against 477 hams & 1056 spams
-> <stat> tested 440 hams & 1076 spams against 477 hams & 1056 spams
-> <stat> tested 397 hams & 1054 spams against 477 hams & 1056 spams

false positive percentages
    1.699  1.214  won    -28.55%
    0.909  0.682  won    -24.97%
    1.008  0.756  won    -25.00%
    0.210  0.210  tied
    0.932  0.699  won    -25.00%
    0.682  0.227  won    -66.72%
    1.008  0.504  won    -50.00%
    0.000  0.000  tied
    0.466  0.233  won    -50.00%
    0.243  0.243  tied
    1.259  0.504  won    -59.97%
    0.210  0.000  won   -100.00%
    0.699  0.466  won    -33.33%
    1.456  0.728  won    -50.00%
    1.818  1.591  won    -12.49%
    0.839  0.210  won    -74.97%
    0.466  0.233  won    -50.00%
    0.728  0.485  won    -33.38%
    0.455  0.227  won    -50.11%
    1.259  0.756  won    -39.95%

won  17 times
tied  3 times
lost  0 times

total unique fp went from 40 to 26 won    -35.00%
mean fp % went from 0.817290959648 to 0.49835280855 won    -39.02%

false negative percentages
    0.398  0.498  lost   +25.13%
    0.093  0.186  lost  +100.00%
    0.380  0.474  lost   +24.74%
    0.189  0.189  tied
    0.294  0.294  tied
    0.000  0.372  lost  +(was 0)
    0.190  0.285  lost   +50.00%
    0.379  0.568  lost   +49.87%
    0.491  0.883  lost   +79.84%
    0.896  1.195  lost   +33.37%
    0.664  1.139  lost   +71.54%
    0.189  0.379  lost  +100.53%
    0.294  0.393  lost   +33.67%
    0.498  0.697  lost   +39.96%
    0.093  0.093  tied
    0.189  0.379  lost  +100.53%
    0.687  1.374  lost  +100.00%
    1.195  1.295  lost    +8.37%
    0.651  0.929  lost   +42.70%
    0.474  0.664  lost   +40.08%

won   0 times
tied  3 times
lost 17 times

total unique fn went from 44 to 66 lost   +50.00%
mean fn % went from 0.412283315133 to 0.614303438288 lost   +49.00%

ham mean                     ham sdev
   3.82    2.97  -22.25%       14.77   12.72  -13.88%
   3.31    2.42  -26.89%       13.21   10.90  -17.49%
   3.57    2.69  -24.65%       13.66   11.26  -17.57%
   4.26    3.20  -24.88%       15.98   13.14  -17.77%
   3.37    2.65  -21.36%       14.07   12.03  -14.50%

ham mean and sdev for all runs
   3.67    2.79  -23.98%       14.38   12.05  -16.20%

spam mean                    spam sdev
  98.10   96.94   -1.18%        8.44   10.32  +22.27%
  97.83   96.47   -1.39%        9.10   11.49  +26.26%
  97.63   96.24   -1.42%       10.29   12.59  +22.35%
  98.13   96.83   -1.32%        8.23   10.58  +28.55%
  96.93   95.49   -1.49%       11.94   14.13  +18.34%

spam mean and sdev for all runs
  97.72   96.40   -1.35%        9.71   11.91  +22.66%

ham/spam mean difference: 94.05 93.61 -0.44
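
In case the won/lost columns are not obvious, here is a small sketch of the
arithmetic behind them, using the false positive figures from above.  This
is my reading of the output, not the actual cmp.py code, and the function
names are made up: the percentage is the relative change from the first
column to the second, and a lower error rate counts as a win.

```python
# Sketch of the per-row arithmetic in the comparison output above.
# Function names are hypothetical; this is not cmp.py's own code.

def percent_change(before, after):
    """Relative change in percent, or None when the baseline is 0."""
    if before == 0:
        return None  # no percentage can be computed from a zero baseline
    return (after - before) / before * 100.0

def verdict(before, after):
    """Lower error rates are better, so a drop counts as a win."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"

# False positive percentages copied from the table above
# (first column = option off, second column = option on).
fp_before = [1.699, 0.909, 1.008, 0.210, 0.932, 0.682, 1.008, 0.000,
             0.466, 0.243, 1.259, 0.210, 0.699, 1.456, 1.818, 0.839,
             0.466, 0.728, 0.455, 1.259]
fp_after = [1.214, 0.682, 0.756, 0.210, 0.699, 0.227, 0.504, 0.000,
            0.233, 0.243, 0.504, 0.000, 0.466, 0.728, 1.591, 0.210,
            0.233, 0.485, 0.227, 0.756]

tally = {"won": 0, "tied": 0, "lost": 0}
for before, after in zip(fp_before, fp_after):
    tally[verdict(before, after)] += 1

print(tally)
print("unique fp change: %.2f%%" % percent_change(40, 26))
print("mean fp before: %.3f" % (sum(fp_before) / len(fp_before)))
print("mean fp after:  %.3f" % (sum(fp_after) / len(fp_after)))
```

The tally comes out at 17 won, 3 tied, 0 lost, and 40 to 26 unique false
positives works out to -35.00%, matching the summary lines.  The means
computed from the rounded table entries agree with the reported means to
about three decimal places.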



