[spambayes-dev] untested idea for calculating message lengths

Matthew Dixon Cowles matt at mondoinfo.com
Mon Jul 26 22:44:45 CEST 2004


> One of the problems Spambayes seems to still have on occasion is
> properly classifying very short messages.  I came up with the
> attached simple way to compute a message's effective length, but
> have yet to test it.

Against a run with all defaults, it gives me a tiny improvement (two
fewer unsures; fp and fn are unchanged):

-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
filename:       normal        skip
ham:spam:    1000:1000   1000:1000
fp total:            1           1
fp %:             0.10        0.10
fn total:            5           5
fn %:             0.50        0.50
unsure t:           60          58
unsure %:         3.00        2.90
real cost:      $27.00      $26.60
best cost:      $16.60      $16.40
h mean:           0.32        0.32
h sdev:           3.78        3.80
s mean:          97.33       97.37
s sdev:          11.14       11.12
mean diff:       97.01       97.05
k:                6.50        6.50
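
For anyone checking the arithmetic, here is a minimal sketch of how the
"real cost" and percentage rows follow from the counts. The weights
($10 per fp, $1 per fn, $0.20 per unsure) are an assumption inferred from
the figures in the table, not read from the options file, and the
function names are purely illustrative:

```python
# Assumed cost weights, chosen because they reproduce the table's figures.
FP_COST = 10.00      # dollars per false positive
FN_COST = 1.00       # dollars per false negative
UNSURE_COST = 0.20   # dollars per unsure message


def real_cost(fp, fn, unsure):
    """Total cost in dollars for the given error counts."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


def rates(fp, fn, unsure, n_ham, n_spam):
    """Return (fp %, fn %, unsure %).

    fp is taken relative to ham, fn relative to spam, and unsure
    relative to all messages tested.
    """
    return (100.0 * fp / n_ham,
            100.0 * fn / n_spam,
            100.0 * unsure / (n_ham + n_spam))


if __name__ == "__main__":
    for name, (fp, fn, unsure) in [("normal", (1, 5, 60)),
                                   ("skip", (1, 5, 58))]:
        pcts = rates(fp, fn, unsure, 1000, 1000)
        print("%-8s cost $%.2f  fp %.2f%%  fn %.2f%%  unsure %.2f%%"
              % ((name, real_cost(fp, fn, unsure)) + pcts))
```

Under those weights the whole difference between the two columns is the
two unsures, which is 40 cents.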

But once I turn on mine_received_headers, the improvement is lost:

-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
-> <stat> tested 200 hams & 200 spams against 800 hams & 800 spams
filename:  mine-received mine-received-skip
ham:spam:    1000:1000   1000:1000
fp total:            0           0
fp %:             0.00        0.00
fn total:            4           4
fn %:             0.40        0.40
unsure t:           52          52
unsure %:         2.60        2.60
real cost:      $14.40      $14.40
best cost:       $8.80       $9.20
h mean:           0.26        0.26
h sdev:           3.16        3.21
s mean:          98.15       98.15
s sdev:           9.02        9.04
mean diff:       97.89       97.89
k:                8.04        7.99

Regards,
Matt


