[spambayes-dev] Deprecated options

Tony Meyer ta-meyer at ihug.co.nz
Thu Aug 5 04:07:39 CEST 2004


[Tony]
> [Classifier] 
> x-experimental_ham_spam_imbalance_adjustment - the code for
> this is gone already; it's just the option that's left.
> [Tokenizer] x-extract_dow
> [Tokenizer] x-generate_time_buckets

[Skip]
> Definitely zap the above.

Done.

[Tony]
> [Classifier] x-use_bigrams - becomes a regular 
> option (defaulting to False?)

[Skip]
> False would be best.  We already have people complaining 
> about the size of their databases.

Unless anyone speaks up in the next couple of days, I'll remove the "x-"
from the option, the "EXPERIMENTAL" from the description, and leave it set
to False by default.
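
For anyone who already has the experimental option in a config file, the rename could look something like the sketch below. It assumes the option simply loses its "x-" prefix and becomes "use_bigrams", as proposed above. The rename_option helper is hypothetical and uses the standard library's configparser, not the SpamBayes options code.

```python
import configparser

# A user's existing config, using the experimental name.
OLD_CONFIG = """\
[Classifier]
x-use_bigrams: True
"""

def rename_option(text, section, old, new):
    """Parse config text and return a parser with `old` renamed to `new`.

    Hypothetical helper for illustration only; not part of SpamBayes.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if parser.has_option(section, old):
        value = parser.get(section, old)
        parser.remove_option(section, old)
        parser.set(section, new, value)
    return parser

migrated = rename_option(OLD_CONFIG, "Classifier",
                         "x-use_bigrams", "use_bigrams")
print(dict(migrated["Classifier"]))
```

Anyone who has never set the option would be unaffected, since the proposed default stays False.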

[Skip]
> Are the habeas headers a dead-end in the wider world that 
> most Spambayes users simply don't use?  If they are spoofed 
> they should be a fairly good spam clue.  I'm not sure I'd 
> delete them yet.

I'm not certain.  I very rarely see mail with them (I have an Outlook thingy
that puts a little *H* next to mail that has them, so I do notice when it
turns up), with the exception of one source (TidBITS/TidBITS-Talk).  For a
while I saw spam with them, too, but even that seems to have stopped.  I
wonder whether perhaps the experiment failed, and they simply don't get used
any more.

I'm happy to leave them for the moment - it would certainly be interesting
to see results from anyone who does get habeas-marked mail (good or bad).
It's been a while since I did any testing with it, so I reran it with my
current testing corpora and got one indifferent result and one loss:

(first column is all defaults, second is searching for habeas headers, third
is reducing habeas headers to a single token)

-> <stat> tested 280 hams & 131 spams against 1111 hams & 512 spams
[...]
filename:    exchanges exchange_habeass
                                   exchange_habeas_reduces
ham:spam:     1391:643    1391:643    1391:643
fp total:            0           0           0
fp %:             0.00        0.00        0.00
fn total:           35          35          35
fn %:             5.44        5.44        5.44
unsure t:           83          82          82
unsure %:         4.08        4.03        4.03
real cost:      $51.60      $51.40      $51.40
best cost:      $33.80      $33.20      $33.20
h mean:           0.10        0.09        0.09
h sdev:           1.72        1.60        1.60
s mean:          89.34       89.33       89.33
s sdev:          25.65       25.64       25.64
mean diff:       89.24       89.24       89.24
k:                3.26        3.28        3.28

-> <stat> tested 4690 hams & 384 spams against 18764 hams & 1539 spams
[...]
filename:        ihugs ihug_habeass
                                   ihug_habeas_reduces
ham:spam:   23454:1923  23454:1923  23454:1923
fp total:            1           5           5
fp %:             0.00        0.02        0.02
fn total:           23          20          20
fn %:             1.20        1.04        1.04
unsure t:          169         151         154
unsure %:         0.67        0.60        0.61
real cost:      $66.80     $100.20     $100.80
best cost:      $57.00      $84.20      $83.00
h mean:           0.09        0.12        0.12
h sdev:           1.89        2.36        2.38
s mean:          95.86       96.42       96.43
s sdev:          14.99       14.20       14.17
mean diff:       95.77       96.30       96.31
k:                5.67        5.82        5.82
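
The "real cost" rows can be reproduced from the fp, fn and unsure totals. The sketch below assumes weights of $10 per false positive, $1 per false negative and $0.20 per unsure; those weights are inferred from the figures above rather than read out of the test scripts, and the run labels are just descriptive names for the three columns.

```python
# Assumed weights, inferred from the tables: they reproduce every
# "real cost" figure reported above.
FP_WEIGHT = 10.00
FN_WEIGHT = 1.00
UNSURE_WEIGHT = 0.20

def real_cost(fp, fn, unsure):
    """Weighted cost of a run from its fp, fn and unsure totals."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT

# (fp total, fn total, unsure total) copied from the two tables.
runs = {
    "exchange defaults": (0, 35, 83),
    "exchange habeas": (0, 35, 82),
    "exchange habeas reduced": (0, 35, 82),
    "ihug defaults": (1, 23, 169),
    "ihug habeas": (5, 20, 151),
    "ihug habeas reduced": (5, 20, 154),
}

for name, counts in runs.items():
    print(f"{name:26s} ${real_cost(*counts):.2f}")
```

On the ihug corpus the four extra false positives add $40, which swamps the savings from three fewer false negatives and 15 to 18 fewer unsures. That is why that run counts as a loss even though fn and unsure both improve.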

=Tony Meyer


