[Spambayes] max word size

T. Alexander Popiel popiel@wolfskeep.com
Tue Oct 29 18:54:22 2002


Changing the max word size (for generating skip tokens)
doesn't seem to have much effect on my data.
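
For anyone unfamiliar with skip tokens, here is a rough sketch of the idea, not the actual tokenizer code. The function name, the default of 12, the minimum length of 3, and the exact token format are all assumptions made for illustration: words up to the maximum size are kept as-is, and longer ones collapse into a single token recording the first character and the approximate length.

```python
def tokenize_word(word, max_word_size=12):
    """Yield tokens for one word (hypothetical, simplified sketch)."""
    n = len(word)
    if 3 <= n <= max_word_size:
        # Ordinary word: keep it as its own token.
        yield word
    elif n > max_word_size:
        # Long word: emit one "skip" token with the first character
        # and the length rounded down to a multiple of 10.
        yield "skip:%c %d" % (word[0], n // 10 * 10)
    # Words shorter than 3 characters produce no token in this sketch.


if __name__ == "__main__":
    for size in (10, 12, 50):
        print(size, list(tokenize_word("supercalifragilistic", size)))
```

Raising the maximum just moves more long words from the "skip" bucket into ordinary word tokens, which is the knob the runs in this message vary.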

Here's the table; it pretty much says it all.

-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
[...]
filename:   skip10  skip11  skip12  skip13  skip14  skip20  skip50
ham:spam:  2000:2000       2000:2000       2000:2000       2000:2000
                   2000:2000       2000:2000       2000:2000      
fp total:        4       3       3       3       3       4       4
fp %:         0.20    0.15    0.15    0.15    0.15    0.20    0.20
fn total:       12      10      12      11      12      10      10
fn %:         0.60    0.50    0.60    0.55    0.60    0.50    0.50
unsure t:       52      55      53      55      53      52      54
unsure %:     1.30    1.38    1.32    1.38    1.32    1.30    1.35
real cost:  $62.40  $51.00  $52.60  $52.00  $52.60  $60.40  $60.80
best cost:  $49.20  $49.00  $48.20  $48.40  $48.40  $49.40  $50.00
h mean:       0.42    0.41    0.40    0.40    0.38    0.39    0.39
h sdev:       5.47    5.42    5.39    5.35    5.22    5.30    5.22
s mean:      98.44   98.45   98.45   98.46   98.46   98.48   98.48
s sdev:       9.87    9.79    9.76    9.72    9.75    9.71    9.69
mean diff:   98.02   98.04   98.05   98.06   98.08   98.09   98.09
k:            6.39    6.45    6.47    6.51    6.55    6.53    6.58

None of the differences look significant to me, even at the
extreme sizes (10 and 50)...
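
As a cross-check on how the derived rows relate to the raw counts, here is a small sketch. The weights ($10 per fp, $1 per fn, $0.20 per unsure) and the formula for k (mean diff divided by the sum of the two sdevs) are assumptions on my part; they are consistent with the "real cost" and "k" rows above for the columns checked, but the function names are made up for illustration.

```python
# Assumed cost weights; they are consistent with the "real cost" row.
FP_COST = 10.00      # dollars per false positive
FN_COST = 1.00       # dollars per false negative
UNSURE_COST = 0.20   # dollars per unsure message


def real_cost(fp, fn, unsure):
    """Total cost of a run from its raw error counts."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Assumed definition of k: gap between means over summed sdevs."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)


# (fp total, fn total, unsure t) copied from the table.
counts = {
    "skip10": (4, 12, 52),
    "skip11": (3, 10, 55),
    "skip12": (3, 12, 53),
    "skip13": (3, 11, 55),
    "skip14": (3, 12, 53),
    "skip20": (4, 10, 52),
    "skip50": (4, 10, 54),
}

if __name__ == "__main__":
    for name, (fp, fn, unsure) in counts.items():
        print("%s: $%.2f" % (name, real_cost(fp, fn, unsure)))
    # k for skip10, from the rounded means and sdevs in the table.
    print("k(skip10) = %.2f" % separation_k(0.42, 5.47, 98.44, 9.87))
```

Seen this way, the $10 swing in real cost between columns is a single false positive (3 vs. 4), which is why it is hard to read much into it.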

- Alex