[Spambayes] CLT run results

T. Alexander Popiel popiel@wolfskeep.com
Thu, 10 Oct 2002 09:07:57 -0700


Not much to say about this one.  The magic of the 2:3 ham:spam ratio
is maintained across default, clt1, clt2, and clt3.  This nudges me to
believe that it's something about my corpus or perhaps a universal
constant.  (Brad's posted results seemed to have high k at 2:3, too.)
As others have shown, the clt total error rate is lower than that
of the default classifier, but the fp rate is higher.  I have not
yet looked at the certainty stuff that clt gives.

I used my modified timcv.py (posted earlier, and on my website),
but if you want to reproduce my results, use Tim's version instead
(it makes more sense to have the training style as an ini option).
Just make sure to use the full retraining when doing clt tests.

I also retrieved my 10-set configuration (from before I rebalanced
for 15 sets in my last experiment).  Note that I did _not_ rebalance
to get back to this point; I untarred the archive I'd made.  This
means that the comparison against the 10-set data from my ratio2
experiment might actually be valid.


The default (robinson) classifier results (from the ratio2 experiment):

-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[... edited for brevity ...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams

ham-spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp tot:          2       3       3       3       4       3       3
fp %:         0.40    0.40    0.30    0.24    0.27    0.17    0.15
fn tot:         32      41      43      43      47      48      51
fn %:         1.60    2.34    2.87    3.44    4.70    6.40   10.20
h mean:      24.25   21.75   20.12   18.87   18.33   17.72   16.71
h sdev:       7.52    7.13    7.04    7.09    7.16    7.31    7.43
s mean:      77.56   76.66   75.93   74.85   74.13   72.80   70.57
s sdev:       8.24    8.62    8.77    9.09    9.68    9.90   10.54
mean diff:   53.31   54.91   55.81   55.98   55.80   55.08   53.86
k:            3.38    3.49    3.53    3.46    3.31    3.20    3.00


clt1 results:

-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[... edited for brevity ...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams

ham-spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp tot:          9       4       6       6      10      10      11
fp %:         1.80    0.53    0.60    0.48    0.67    0.57    0.55
fn tot:          6       6       4       6       9      10      13
fn %:         0.30    0.34    0.27    0.48    0.90    1.33    2.60
h mean:       3.17    1.58    1.29    1.22    1.09    0.91    0.77
h sdev:      14.74    9.77    8.77    8.66    8.54    7.83    7.09
s mean:      99.55   99.32   99.18   98.85   98.22   97.88   96.42
s sdev:       5.66    6.68    7.06    7.96   10.00   11.57   14.67
mean diff:   96.38   97.74   97.89   97.63   97.13   96.97   95.65
k:            4.72    5.94    6.18    5.87    5.24    5.00    4.40


clt2 results:

-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[... edited for brevity ...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams

ham-spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp tot:         10       5       6       6       9      10       8
fp %:         2.00    0.67    0.60    0.48    0.60    0.57    0.40
fn tot:          6       6       4       6      11      14      16
fn %:         0.30    0.34    0.27    0.48    1.10    1.87    3.20
h mean:       3.37    1.39    0.89    0.68    0.57    0.57    0.47
h sdev:      15.03    9.31    8.28    7.56    7.17    7.15    6.20
s mean:      99.65   99.43   99.37   99.01   98.46   97.94   96.41
s sdev:       5.22    6.49    6.37    7.75    9.45   11.49   15.04
mean diff:   96.28   98.04   98.48   98.33   97.89   97.37   95.94
k:            4.75    6.21    6.72    6.42    5.89    5.22    4.52


clt3 results:

-> <stat> tested 50 hams & 200 spams against 450 hams & 1800 spams
[... edited for brevity ...]
-> <stat> tested 200 hams & 50 spams against 1800 hams & 450 spams

ham-spam:   50-200  75-175 100-150 125-125 150-100  175-75  200-50
fp tot:          9       4       5       6       8       9       8
fp %:         1.80    0.53    0.50    0.48    0.53    0.51    0.40
fn tot:          7       7       5      11      18      21      21
fn %:         0.35    0.40    0.33    0.88    1.80    2.80    4.20
h mean:       3.27    1.06    0.74    0.48    0.53    0.46    0.38
h sdev:      14.54    8.44    7.51    6.31    6.81    5.85    5.12
s mean:      99.58   99.35   99.18   98.61   97.81   97.07   95.11
s sdev:       5.78    6.80    7.30    8.89   11.15   13.34   17.31
mean diff:   96.31   98.29   98.44   98.13   97.28   96.61   94.73
k:            4.74    6.45    6.65    6.46    5.42    5.03    4.22
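
A quick way to check the "lower total error, higher fp" observation
is to add up the fp tot and fn tot rows from the four tables above.
The sketch below only copies those rows; the names (COUNTS,
total_errors, total_error_pct) are made up for illustration, and the
2500 messages per ratio is an assumption read off the 10-set setup
and the 250 messages tested per run.

```python
# Error counts copied from the "fp tot" and "fn tot" rows of the tables.
RATIOS = ["50-200", "75-175", "100-150", "125-125",
          "150-100", "175-75", "200-50"]
COUNTS = {
    "default": {"fp": [2, 3, 3, 3, 4, 3, 3],
                "fn": [32, 41, 43, 43, 47, 48, 51]},
    "clt1": {"fp": [9, 4, 6, 6, 10, 10, 11],
             "fn": [6, 6, 4, 6, 9, 10, 13]},
    "clt2": {"fp": [10, 5, 6, 6, 9, 10, 8],
             "fn": [6, 6, 4, 6, 11, 14, 16]},
    "clt3": {"fp": [9, 4, 5, 6, 8, 9, 8],
             "fn": [7, 7, 5, 11, 18, 21, 21]},
}

# Assumption: 10 sets, each testing 250 messages (hams + spams) per ratio.
MESSAGES_PER_RATIO = 10 * 250


def total_errors(name):
    """Return fp + fn for each ham-spam ratio, in RATIOS order."""
    counts = COUNTS[name]
    return [fp + fn for fp, fn in zip(counts["fp"], counts["fn"])]


def total_error_pct(name):
    """Return total errors as a percentage of all messages tested."""
    return [round(100.0 * e / MESSAGES_PER_RATIO, 2)
            for e in total_errors(name)]


if __name__ == "__main__":
    for name in COUNTS:
        print(name, total_errors(name), total_error_pct(name))
```

At 100-150 this gives 46 total errors for the default classifier
against 10 for each of the clt variants, while the fp count goes
from 3 to 5 or 6.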

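All of the clt variants are sensitive to the ham:spam ratio in both
fp and fn, and the directions are crossed (which makes sense): the
fp rate is worst when ham is scarce (50-200), and the fn rate climbs
as spam becomes scarce.  It's impossible to tell from the fp and fn
numbers where the sweet spot really is, but the k values seem to
point at 2:3 (the 100-150 column), where k peaks for all three.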
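
For anyone who wants to redo the k comparison: the k rows in the
tables are consistent with k = (s mean - h mean) / (h sdev + s sdev).
That formula is inferred from the numbers in this message, not quoted
from the test scripts, so treat it as an assumption.  A minimal
sketch using the clt1 rows (the function name separation_k is made
up for illustration):

```python
# Rows copied from the clt1 table, in ham-spam column order.
RATIOS = ["50-200", "75-175", "100-150", "125-125",
          "150-100", "175-75", "200-50"]
H_MEAN = [3.17, 1.58, 1.29, 1.22, 1.09, 0.91, 0.77]
H_SDEV = [14.74, 9.77, 8.77, 8.66, 8.54, 7.83, 7.09]
S_MEAN = [99.55, 99.32, 99.18, 98.85, 98.22, 97.88, 96.42]
S_SDEV = [5.66, 6.68, 7.06, 7.96, 10.00, 11.57, 14.67]


def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Assumed definition: gap between the means, measured in units of
    the combined ham and spam standard deviations."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)


ks = [separation_k(hm, hs, sm, ss)
      for hm, hs, sm, ss in zip(H_MEAN, H_SDEV, S_MEAN, S_SDEV)]

# The ratio with the widest separation between ham and spam scores.
best_ratio = RATIOS[ks.index(max(ks))]

if __name__ == "__main__":
    for ratio, k in zip(RATIOS, ks):
        print("%8s  k = %.2f" % (ratio, k))
    print("best:", best_ratio)
```

Run on the clt1 rows this reproduces the published k row to two
decimals and picks 100-150, i.e. the 2:3 ratio.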

All of this is (of course) on my website at:

  http://www.wolfskeep.com/~popiel/spambayes/clt

- Alex