[spambayes-dev] Results for DNS lookup in tokenizer

Tony Meyer tameyer at ihug.co.nz
Tue Apr 13 19:16:37 EDT 2004


> Here are my results using timcv.py -n5 with two corpora.  
> First cmp.py results, then a table.py with just running with 
> defaults as well.

And here are two more (they were running too slowly to send yesterday, but
completed overnight).

The first one is my non-work mail for the last few months; the second one is
the five sets that make up the SpamAssassin Public Corpus (SAPC; the bzip
files starting with 2003...).

Once again, the standard x-pick_apart_urls option does nothing (good or bad)
for me.  The SAPC result is only a slight loss, and the other is a more
substantial loss (although each wins one of the five runs).

-> <stat> tested 4692 hams & 386 spams against 18762 hams & 1537 spams
-> <stat> tested 4695 hams & 381 spams against 18759 hams & 1542 spams
-> <stat> tested 4693 hams & 383 spams against 18761 hams & 1540 spams
-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
-> <stat> tested 4684 hams & 389 spams against 18770 hams & 1534 spams
-> <stat> tested 4692 hams & 386 spams against 18762 hams & 1537 spams
-> <stat> tested 4695 hams & 381 spams against 18759 hams & 1542 spams
-> <stat> tested 4693 hams & 383 spams against 18761 hams & 1540 spams
-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
-> <stat> tested 4684 hams & 389 spams against 18770 hams & 1534 spams

false positive percentages
    0.000  0.000  tied
    0.021  0.021  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied

won   0 times
tied  5 times
lost  0 times

total unique fp went from 1 to 1 tied
mean fp % went from 0.00425985090522 to 0.00425985090522 tied

false negative percentages
    1.036  1.036  tied
    1.050  1.575  lost   +50.00%
    0.783  0.522  won    -33.33%
    1.823  2.083  lost   +14.26%
    1.285  1.799  lost   +40.00%

won   1 times
tied  1 times
lost  3 times

total unique fn went from 23 to 27 lost   +17.39%
mean fn % went from 1.19553834481 to 1.40321699713 lost   +17.37%

ham mean                     ham sdev
   0.09    0.10  +11.11%        1.73    1.72   -0.58%
   0.11    0.11   +0.00%        2.24    2.09   -6.70%
   0.12    0.12   +0.00%        2.05    2.05   +0.00%
   0.09    0.08  -11.11%        2.01    1.78  -11.44%
   0.04    0.05  +25.00%        0.88    1.19  +35.23%

ham mean and sdev for all runs
   0.09    0.09   +0.00%        1.85    1.80   -2.70%

spam mean                    spam sdev
  95.65   95.35   -0.31%       15.15   16.13   +6.47%
  95.77   95.20   -0.60%       15.18   16.83  +10.87%
  97.06   96.05   -1.04%       11.42   13.61  +19.18%
  95.32   94.61   -0.74%       16.75   18.41   +9.91%
  95.57   95.40   -0.18%       15.57   16.05   +3.08%

spam mean and sdev for all runs
  95.87   95.32   -0.57%       14.94   16.29   +9.04%

ham/spam mean difference: 95.78 95.23 -0.55

-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 380 spams against 3320 hams & 1517 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams
-> <stat> tested 830 hams & 379 spams against 3320 hams & 1518 spams

false positive percentages
    0.241  0.241  tied
    0.482  0.482  tied
    0.000  0.000  tied
    0.120  0.120  tied
    0.000  0.000  tied

won   0 times
tied  5 times
lost  0 times

total unique fp went from 7 to 7 tied
mean fp % went from 0.168674698795 to 0.168674698795 tied

false negative percentages
    0.789  1.053  lost   +33.46%
    0.526  0.526  tied
    0.528  0.264  won    -50.00%
    0.264  0.264  tied
    1.055  1.319  lost   +25.02%

won   1 times
tied  2 times
lost  2 times

total unique fn went from 12 to 13 lost    +8.33%
mean fn % went from 0.632551034579 to 0.685182613526 lost    +8.32%

ham mean                     ham sdev
   0.67    0.61   -8.96%        6.87    6.56   -4.51%
   0.95    0.85  -10.53%        8.69    8.08   -7.02%
   0.87    0.81   -6.90%        7.10    6.79   -4.37%
   0.60    0.57   -5.00%        6.64    6.49   -2.26%
   0.48    0.42  -12.50%        4.87    4.62   -5.13%

ham mean and sdev for all runs
   0.71    0.65   -8.45%        6.94    6.60   -4.90%

spam mean                    spam sdev
  97.13   96.89   -0.25%       12.08   13.00   +7.62%
  98.59   98.50   -0.09%        8.09    8.49   +4.94%
  98.57   98.44   -0.13%        8.03    8.15   +1.49%
  98.59   98.54   -0.05%        7.51    7.68   +2.26%
  97.91   97.72   -0.19%       11.50   12.22   +6.26%

spam mean and sdev for all runs
  98.16   98.02   -0.14%        9.66   10.18   +5.38%

ham/spam mean difference: 97.45 97.37 -0.08
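
For anyone checking the won/lost/tied labels and the percentages in the
cmp.py output above, here is a minimal sketch of the arithmetic as I read
it.  It is not the cmp.py source: the function name compare() is
hypothetical, and it assumes a lower-is-better rate where the change is
expressed relative to the first ("before") column.

```python
def compare(before, after):
    """Label a change in an error rate and give the relative change in %.

    Assumption: lower is better, so a smaller 'after' counts as a win.
    """
    if before == after:
        return "tied", 0.0
    label = "won" if after < before else "lost"
    if before == 0:
        # Relative change is undefined when starting from zero.
        return label, float("inf")
    return label, (after - before) / before * 100.0


# False negative rates from the second run of the first corpus.
label, pct = compare(1.050, 1.575)
print("%s %+.2f%%" % (label, pct))

# Total unique fn for the same corpus.
label, pct = compare(23, 27)
print("%s %+.2f%%" % (label, pct))
```

The same calculation on the SAPC totals (12 to 13 false negatives) gives the
smaller relative loss reported for that corpus.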

filename:        ihugs  ihug_picks ihug_pickms
ham:spam:   23454:1923  23454:1923  23454:1923
fp total:            1           1           1
fp %:             0.00        0.00        0.00
fn total:           23          23          27
fn %:             1.20        1.20        1.40
unsure t:          169         171         176
unsure %:         0.67        0.67        0.69
real cost:      $66.80      $67.20      $72.20
best cost:      $57.00      $56.60      $62.40
h mean:           0.09        0.09        0.09
h sdev:           1.89        1.85        1.80
s mean:          95.86       95.87       95.32
s sdev:          14.99       14.94       16.29
mean diff:       95.77       95.78       95.23
k:                5.67        5.70        5.26

filename:        sapcs  sapc_picks sapc_pickms
ham:spam:    4150:1897   4150:1897   4150:1897
fp total:            7           7           7
fp %:             0.17        0.17        0.17
fn total:           12          12          13
fn %:             0.63        0.63        0.69
unsure t:           99          99         100
unsure %:         1.64        1.64        1.65
real cost:     $101.80     $101.80     $103.00
best cost:      $70.60      $70.20      $70.80
h mean:           0.71        0.71        0.65
h sdev:           6.92        6.94        6.60
s mean:          98.14       98.16       98.02
s sdev:           9.72        9.66       10.18
mean diff:       97.43       97.45       97.37
k:                5.86        5.87        5.80
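
The derived rows in the table.py output above can be recomputed from the
raw counts.  What follows is a rough sketch, not the table.py source: the
function name derived_rows() is hypothetical, and the cost weights ($10 per
fp, $1 per fn, $0.20 per unsure) are assumptions chosen because they
reproduce the "real cost" row here.  Likewise, k is assumed to be the mean
difference divided by the sum of the two standard deviations, which matches
the figures shown.

```python
def derived_rows(ham, spam, fp, fn, unsure,
                 h_mean, h_sdev, s_mean, s_sdev,
                 fp_cost=10.0, fn_cost=1.0, unsure_cost=0.2):
    """Recompute the percentage, cost, and separation rows of the table.

    The three *_cost weights are assumed values, not taken from the post.
    """
    mean_diff = s_mean - h_mean
    return {
        "fp %": 100.0 * fp / ham,
        "fn %": 100.0 * fn / spam,
        "unsure %": 100.0 * unsure / (ham + spam),
        "real cost": fp * fp_cost + fn * fn_cost + unsure * unsure_cost,
        "mean diff": mean_diff,
        "k": mean_diff / (h_sdev + s_sdev),
    }


# The ihug_pickms column from the first table.
rows = derived_rows(ham=23454, spam=1923, fp=1, fn=27, unsure=176,
                    h_mean=0.09, h_sdev=1.80, s_mean=95.32, s_sdev=16.29)
for name, value in rows.items():
    print("%-10s %8.2f" % (name + ":", value))
```

The "best cost" row is not covered, since it depends on searching for the
best cutoffs rather than on the counts in the table.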



