[spambayes-dev] untested idea for calculating message lengths
Matthew Dixon Cowles
matt at mondoinfo.com
Mon Jul 26 22:44:45 CEST 2004
> One of the problems Spambayes seems to still have on occasion is
> properly classifying very short messages. I came up with the
> attached simple way to compute a message's effective length, but
> have yet to test it.
With all options at their defaults, it gives me a tiny improvement:
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
filename:      normal      skip
ham:spam:   1000:1000 1000:1000
fp total:           1         1
fp %:            0.10      0.10
fn total:           5         5
fn %:            0.50      0.50
unsure t:          60        58
unsure %:        3.00      2.90
real cost:     $27.00    $26.60
best cost:     $16.60    $16.40
h mean:          0.32      0.32
h sdev:          3.78      3.80
s mean:         97.33     97.37
s sdev:         11.14     11.12
mean diff:      97.01     97.05
k:               6.50      6.50
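For anyone checking the arithmetic, the "real cost" row can be reproduced from the error counts. This is only a sketch: the weights ($10 per false positive, $1 per false negative, $0.20 per unsure) are assumed to be the usual test-harness defaults, and the names `FP_WEIGHT`, `FN_WEIGHT`, `UNSURE_WEIGHT` and `real_cost` are made up for illustration. The assumed weights do agree with the figures in the table.

```python
# Assumed weights (not stated in the message): $10 per false positive,
# $1 per false negative, $0.20 per unsure.
FP_WEIGHT = 10.00
FN_WEIGHT = 1.00
UNSURE_WEIGHT = 0.20


def real_cost(fp, fn, unsure):
    """Return the total cost in dollars for the given error counts."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


# Counts taken from the "fp total", "fn total" and "unsure t" rows.
normal = real_cost(fp=1, fn=5, unsure=60)
skip = real_cost(fp=1, fn=5, unsure=58)

print(f"normal: ${normal:.2f}")
print(f"skip:   ${skip:.2f}")
```

Under these weights the whole difference between the two columns comes from the two fewer unsures, which is why the gain is only $0.40.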
But once I turn on mine_received_headers, the improvement is lost:
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
filename:    mine-received  mine-received-skip
ham:spam:        1000:1000           1000:1000
fp total:                0                   0
fp %:                 0.00                0.00
fn total:                4                   4
fn %:                 0.40                0.40
unsure t:               52                  52
unsure %:             2.60                2.60
real cost:          $14.40              $14.40
best cost:           $8.80               $9.20
h mean:               0.26                0.26
h sdev:               3.16                3.21
s mean:              98.15               98.15
s sdev:               9.02                9.04
mean diff:           97.89               97.89
k:                    8.04                7.99
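The "k" row appears to be the gap between the spam and ham score means divided by the sum of the two standard deviations, so a larger k means the two populations are better separated. The sketch below assumes that definition (it reproduces the figures above, but the function name `separation_k` is hypothetical and not taken from the test harness).

```python
def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Assumed definition: mean difference over the sum of the sdevs."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)


# Values taken from the mine-received and mine-received-skip columns.
k_received = separation_k(h_mean=0.26, h_sdev=3.16, s_mean=98.15, s_sdev=9.02)
k_received_skip = separation_k(h_mean=0.26, h_sdev=3.21, s_mean=98.15, s_sdev=9.04)

print(f"mine-received:      {k_received:.2f}")
print(f"mine-received-skip: {k_received_skip:.2f}")
```

Since the means are identical in both columns, the small drop in k comes entirely from the slightly larger standard deviations in the skip run.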
Regards,
Matt