[spambayes-dev] [ 830290 ] url detection

Tony Meyer ta-meyer at ihug.co.nz
Wed Jan 7 18:46:57 EST 2004


Skip observed no change with his latest patch (see the tracker
<https://sourceforge.net/tracker/?func=detail&atid=498105&aid=830290&group_id=61702>).

Here are my results: without bigrams it's a slight loss (one extra false
negative), with bigrams it's a slight win (one fewer).

-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams
-> <stat> tested 357 hams & 395 spams against 3311 hams & 3704 spams
-> <stat> tested 397 hams & 384 spams against 3271 hams & 3715 spams
-> <stat> tested 385 hams & 433 spams against 3283 hams & 3666 spams
-> <stat> tested 407 hams & 397 spams against 3261 hams & 3702 spams
-> <stat> tested 350 hams & 412 spams against 3318 hams & 3687 spams
-> <stat> tested 338 hams & 405 spams against 3330 hams & 3694 spams
-> <stat> tested 359 hams & 416 spams against 3309 hams & 3683 spams
-> <stat> tested 358 hams & 405 spams against 3310 hams & 3694 spams
-> <stat> tested 348 hams & 411 spams against 3320 hams & 3688 spams
-> <stat> tested 369 hams & 441 spams against 3299 hams & 3658 spams

false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.246  0.246  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.557  0.557  tied
    0.559  0.559  tied
    0.287  0.287  tied
    0.000  0.000  tied

won   0 times
tied 10 times
lost  0 times

total unique fp went from 6 to 6 tied
mean fp % went from 0.164881884948 to 0.164881884948 tied

false negative percentages
    0.253  0.253  tied
    0.781  0.781  tied
    0.462  0.462  tied
    0.756  0.756  tied
    0.243  0.243  tied
    0.247  0.494  lost  +100.00%
    0.240  0.240  tied
    0.494  0.494  tied
    0.973  0.973  tied
    0.454  0.454  tied

won   0 times
tied  9 times
lost  1 times

total unique fn went from 20 to 21 lost    +5.00%
mean fn % went from 0.490257037938 to 0.514948395963 lost    +5.04%
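
The single fn "loss" above is easier to judge as a message count than as a
percentage. A minimal sketch, assuming the sixth fn line belongs to the sixth
test set (338 hams & 405 spams), which the arithmetic supports. The helper
names are made up for illustration and are not part of cmp.py:

```python
def fn_pct(false_negatives, spams_tested):
    """False negative rate as a percentage of the spams tested."""
    return 100.0 * false_negatives / spams_tested


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# Sixth test set: 405 spams tested (assumed to match the sixth fn line).
before = fn_pct(1, 405)
after = fn_pct(2, 405)
print(f"{before:.3f} {after:.3f} {pct_change(before, after):+.2f}%")

# Totals over all ten runs: 20 unique fn became 21.
print(f"{pct_change(20, 21):+.2f}%")
```

So the +100.00% on that line is one extra missed spam out of 405, and the
+5.00% on the totals is that same single message.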

ham mean                     ham sdev
   1.18    1.18   +0.00%        7.76    7.70   -0.77%
   0.99    0.98   -1.01%        6.64    6.59   -0.75%
   0.84    0.85   +1.19%        6.14    6.14   +0.00%
   1.99    1.95   -2.01%        9.46    9.27   -2.01%
   0.49    0.49   +0.00%        3.59    3.57   -0.56%
   0.85    0.84   -1.18%        5.45    5.42   -0.55%
   1.16    1.16   +0.00%        9.30    9.29   -0.11%
   1.20    1.20   +0.00%        8.13    8.11   -0.25%
   1.55    1.50   -3.23%        8.05    7.89   -1.99%
   0.47    0.46   -2.13%        3.22    3.10   -3.73%

ham mean and sdev for all runs
   1.08    1.07   -0.93%        7.13    7.06   -0.98%

spam mean                    spam sdev
  98.75   98.75   +0.00%        8.72    8.73   +0.11%
  97.67   97.68   +0.01%       11.26   11.19   -0.62%
  98.08   98.10   +0.02%       10.12   10.13   +0.10%
  98.16   98.15   -0.01%       10.19   10.20   +0.10%
  98.35   98.37   +0.02%        8.77    8.75   -0.23%
  98.45   98.44   -0.01%        8.97    9.03   +0.67%
  98.35   98.36   +0.01%        9.73    9.69   -0.41%
  98.25   98.32   +0.07%        9.16    9.01   -1.64%
  97.93   97.95   +0.02%       11.99   11.95   -0.33%
  98.92   98.94   +0.02%        7.62    7.63   +0.13%

spam mean and sdev for all runs
  98.30   98.31   +0.01%        9.72    9.69   -0.31%

ham/spam mean difference: 97.22 97.24 +0.02

And with bigrams:

[same stat lines as above snipped]

false positive percentages
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.279  0.279  tied
    0.000  0.000  tied
    0.000  0.000  tied
    0.000  0.000  tied

won   0 times
tied 10 times
lost  0 times

total unique fp went from 1 to 1 tied
mean fp % went from 0.0278551532033 to 0.0278551532033 tied

false negative percentages
    0.253  0.253  tied
    1.042  1.042  tied
    0.693  0.693  tied
    0.252  0.252  tied
    0.728  0.485  won    -33.38%
    0.000  0.000  tied
    0.481  0.481  tied
    0.494  0.494  tied
    0.730  0.730  tied
    0.227  0.227  tied

won   1 times
tied  9 times
lost  0 times

total unique fn went from 20 to 19 won     -5.00%
mean fn % went from 0.489899714703 to 0.465627870043 won     -4.95%

ham mean                     ham sdev
   0.95    0.94   -1.05%        6.64    6.60   -0.60%
   0.83    0.82   -1.20%        5.53    5.50   -0.54%
   0.49    0.49   +0.00%        4.08    4.08   +0.00%
   1.53    1.51   -1.31%        8.16    8.04   -1.47%
   0.30    0.30   +0.00%        3.25    3.25   +0.00%
   0.70    0.70   +0.00%        5.27    5.28   +0.19%
   0.85    0.83   -2.35%        7.11    7.11   +0.00%
   0.93    0.92   -1.08%        7.23    7.19   -0.55%
   0.90    0.88   -2.22%        6.47    6.44   -0.46%
   0.41    0.41   +0.00%        4.07    4.03   -0.98%

ham mean and sdev for all runs
   0.80    0.79   -1.25%        6.01    5.97   -0.67%

spam mean                    spam sdev
  98.71   98.72   +0.01%        7.83    7.81   -0.26%
  97.38   97.39   +0.01%       12.55   12.49   -0.48%
  97.78   97.78   +0.00%       11.09   11.10   +0.09%
  97.89   97.88   -0.01%       10.49   10.50   +0.10%
  97.90   97.93   +0.03%       10.03   10.02   -0.10%
  98.32   98.34   +0.02%        8.63    8.57   -0.70%
  98.19   98.20   +0.01%       10.21   10.20   -0.10%
  97.68   97.77   +0.09%       10.99   10.72   -2.46%
  97.86   97.87   +0.01%       11.56   11.53   -0.26%
  98.73   98.75   +0.02%        7.57    7.57   +0.00%

spam mean and sdev for all runs
  98.05   98.07   +0.02%       10.20   10.15   -0.49%

ham/spam mean difference: 97.25 97.28 +0.03

And table.py output, which also shows the unsures:

filename:        bases  fancy_urls     basebis fancy_url_bis
ham:spam:    3668:4099   3668:4099   3668:4099   3668:4099
fp total:            6           6           1           1
fp %:             0.16        0.16        0.03        0.03
fn total:           20          21          20          19
fn %:             0.49        0.51        0.49        0.46
unsure t:          178         176         207         207
unsure %:         2.29        2.27        2.67        2.67
real cost:     $115.60     $116.20      $71.40      $70.40
best cost:      $93.00      $92.20      $65.60      $63.80
h mean:           1.08        1.07        0.80        0.79
h sdev:           7.13        7.06        6.01        5.97
s mean:          98.30       98.31       98.05       98.07
s sdev:           9.72        9.69       10.20       10.15
mean diff:       97.22       97.24       97.25       97.28
k:                5.77        5.81        6.00        6.03
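
The derived rows can be rechecked from the raw ones. A rough sketch, assuming
the "real cost" row weighs each fp at $10, each fn at $1 and each unsure at
$0.20, and that k is the mean difference divided by the sum of the two sdevs.
Both are inferred from the numbers in the table, not read out of table.py, and
the function names are illustrative only:

```python
# Weights inferred from the table rows (an assumption, not read from table.py).
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20

# Columns copied from the table above:
# name: (fp, fn, unsure, h mean, h sdev, s mean, s sdev)
runs = {
    "bases":         (6, 20, 178, 1.08, 7.13, 98.30, 9.72),
    "fancy_urls":    (6, 21, 176, 1.07, 7.06, 98.31, 9.69),
    "basebis":       (1, 20, 207, 0.80, 6.01, 98.05, 10.20),
    "fancy_url_bis": (1, 19, 207, 0.79, 5.97, 98.07, 10.15),
}


def real_cost(fp, fn, unsure):
    """Weighted cost of the mistakes and unsures, in dollars."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


def separation(h_mean, h_sdev, s_mean, s_sdev):
    """Ham/spam mean difference in units of the summed sdevs (the k row)."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)


for name, (fp, fn, unsure, h_mean, h_sdev, s_mean, s_sdev) in runs.items():
    cost = real_cost(fp, fn, unsure)
    k = separation(h_mean, h_sdev, s_mean, s_sdev)
    print(f"{name:>14}  cost ${cost:.2f}  k {k:.2f}")
```

Under those assumptions the sketch matches the "real cost" and "k" rows for
all four columns. It also shows where the bigram saving comes from: five fewer
fps outweigh the 29 or so extra unsures.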

=Tony Meyer



