[Spambayes] max word size
T. Alexander Popiel
popiel@wolfskeep.com
Tue Oct 29 18:54:22 2002
Changing the max word size (for generating skip tokens)
doesn't seem to have much effect on my data.
Here's the table; it pretty much says it all.
-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
[...]
filename:     skip10    skip11    skip12    skip13    skip14    skip20    skip50
ham:spam:  2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:          4         3         3         3         3         4         4
fp %:           0.20      0.15      0.15      0.15      0.15      0.20      0.20
fn total:         12        10        12        11        12        10        10
fn %:           0.60      0.50      0.60      0.55      0.60      0.50      0.50
unsure t:         52        55        53        55        53        52        54
unsure %:       1.30      1.38      1.32      1.38      1.32      1.30      1.35
real cost:    $62.40    $51.00    $52.60    $52.00    $52.60    $60.40    $60.80
best cost:    $49.20    $49.00    $48.20    $48.40    $48.40    $49.40    $50.00
h mean:         0.42      0.41      0.40      0.40      0.38      0.39      0.39
h sdev:         5.47      5.42      5.39      5.35      5.22      5.30      5.22
s mean:        98.44     98.45     98.45     98.46     98.46     98.48     98.48
s sdev:         9.87      9.79      9.76      9.72      9.75      9.71      9.69
mean diff:     98.02     98.04     98.05     98.06     98.08     98.09     98.09
k:              6.39      6.45      6.47      6.51      6.55      6.53      6.58
None of the differences look significant, even at the extreme
sizes: across all seven runs fp only ranges from 3 to 4 and fn
from 10 to 12.
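
For anyone who wants to check the derived rows, here is a rough sketch that recomputes "real cost" and "k" from the counts in the table. The weights ($10 per fp, $1 per fn, $0.20 per unsure) and the formula k = mean diff / (h sdev + s sdev) are assumptions inferred from the numbers in the table, not something stated in this post. They do reproduce every "real cost" entry exactly, and every "k" entry to within rounding.

```python
# Values copied from the table (columns skip10 .. skip50).
names = ["skip10", "skip11", "skip12", "skip13", "skip14", "skip20", "skip50"]
fp = [4, 3, 3, 3, 3, 4, 4]
fn = [12, 10, 12, 11, 12, 10, 10]
unsure = [52, 55, 53, 55, 53, 52, 54]
h_sdev = [5.47, 5.42, 5.39, 5.35, 5.22, 5.30, 5.22]
s_sdev = [9.87, 9.79, 9.76, 9.72, 9.75, 9.71, 9.69]
mean_diff = [98.02, 98.04, 98.05, 98.06, 98.08, 98.09, 98.09]

# ASSUMPTION: per-message weights inferred from the table, not from the post.
FP_COST, FN_COST, UNSURE_COST = 10.0, 1.0, 0.2


def real_cost(n_fp, n_fn, n_unsure):
    """Weighted cost of one run, in dollars, rounded to cents."""
    return round(n_fp * FP_COST + n_fn * FN_COST + n_unsure * UNSURE_COST, 2)


def separation_k(diff, ham_sdev, spam_sdev):
    """ASSUMED definition: gap between the means, in units of the summed sdevs."""
    return diff / (ham_sdev + spam_sdev)


costs = [real_cost(a, b, c) for a, b, c in zip(fp, fn, unsure)]
ks = [separation_k(d, h, s) for d, h, s in zip(mean_diff, h_sdev, s_sdev)]

for name, cost, k in zip(names, costs, ks):
    print(f"{name}: real cost ${cost:.2f}, k {k:.2f}")
```

Read this way, the spread in "real cost" comes almost entirely from the single extra fp in skip10, skip20 and skip50, since one fp is weighted as heavily as ten fns.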
- Alex