[Spambayes] Something to test

Anthony Baxter anthony@interlink.com.au
Mon Nov 4 06:27:47 2002


>>> Tim Peters wrote
> This little patch arranges to create "noheader:HEADERNAME" tokens for
> headers in options.safe_headers that *don't* appear in a msg's headers.  On
> my fat c.l.py test it's a small theoretical improvement:  best-cost falls
> from $26.80 to $22.00, by knocking down the score of the second-worst
> hopeless FP just enough so that redeeming it *could* be traded away for an
> increase in the Unsure rate.  That's not realistic, though (the spam_cutoff
> value needed to redeem that FP is no longer insane, but is still
> *unreasonably* high).
> 
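
To make the quoted description concrete, here is a minimal sketch of how such "noheader:" tokens could be generated. It is not Tim's actual patch: the SAFE_HEADERS list is a hypothetical stand-in for options.safe_headers, and noheader_tokens is an illustrative name.

```python
from email.message import Message

# Hypothetical stand-in for options.safe_headers; the real list is not
# shown in this thread.
SAFE_HEADERS = ["subject", "to", "mime-version", "errors-to"]


def noheader_tokens(msg, safe_headers=SAFE_HEADERS):
    """Yield 'noheader:NAME' for each safe header that msg lacks."""
    for name in safe_headers:
        # Message header lookups are case-insensitive; a missing header
        # returns None.
        if msg.get(name) is None:
            yield "noheader:" + name.lower()


msg = Message()
msg["Subject"] = "hello"
msg["To"] = "someone@example.com"
tokens = list(noheader_tokens(msg))
print(tokens)
```

A message that carries Subject and To but nothing else would pick up the mime-version and errors-to tokens under this sketch.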

filename:    before  after
ham:spam:  11192:1826  11192:1826
fp total:        0       1  
fp %:         0.00    0.01  
fn total:        7       8 
fn %:         0.38    0.44 
unsure t:      106     107 
unsure %:     0.81    0.82 
real cost:  $28.20  $39.40 
best cost:  $28.20  $30.40 
h mean:       0.63    0.42 
h sdev:       4.19    4.19 
s mean:      98.68   98.63 
s sdev:       7.74    7.95 
mean diff:   98.05   98.21 
k:            8.22    8.09 
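
The "real cost" row can be reproduced from the fp, fn and unsure counts. The sketch below assumes weights of $10 per fp, $1 per fn and $0.20 per unsure; those weights are not stated in this message, but they match both columns of the table.

```python
# Assumed cost weights (not stated in the message; chosen because they
# reproduce the "real cost" row above).
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20


def real_cost(fp, fn, unsure):
    """Total cost of a run given its error and unsure counts."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST


before = real_cost(fp=0, fn=7, unsure=106)
after = real_cost(fp=1, fn=8, unsure=107)
print("before: $%.2f  after: $%.2f" % (before, after))
```

Under these weights the single extra fp accounts for $10.00 of the $11.20 increase in real cost.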

The additional fp was a mail-out from Nettwerk (one that I've signed up
for, but their mailings are _incredibly_ spammy) that went from 0.956 to
0.964, where my spam cutoff is 0.96. The 'noheader:errors-to' token was
the killer clue that pushed it over the edge. The spam situation is
considerably worse. The additional false negative was a spam that went
from 0.467 (unsure) to 0.431 (ham_cutoff 0.45). The damage came from
  prob('noheader:mime-version') = 0.245329
(It was a very short spam)
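
How the two cutoffs turn those small score shifts into a new fp and a new fn can be sketched as follows. The cutoff values come from the text above; the classify helper and its boundary handling (< versus <=) are assumptions for illustration.

```python
HAM_CUTOFF = 0.45   # ham_cutoff quoted above
SPAM_CUTOFF = 0.96  # spam cutoff quoted above


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    # Which side of each boundary is inclusive is an assumption here.
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# The Nettwerk mail-out (a ham): unsure before, spam after, i.e. a new fp.
print(classify(0.956), "->", classify(0.964))
# The very short spam: unsure before, ham after, i.e. a new fn.
print(classify(0.467), "->", classify(0.431))
```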

One existing fn dropped from 0.27 to 0.029, due to:
  prob('noheader:subject') = 0.0042591
  prob('noheader:to') = 0.0652536
  prob('noheader:mime-version') = 0.245329

The patch made pretty much all of my fn's at least slightly worse, if
not much worse.


For what it's worth, the "Iron Citadel" comp.lang.python spam is
currently scoring 0.0057 (solidly ham), with prob('*H*')=1 and
prob('*S*')=0.0115174.
This is far and away the worst spam I've seen for some time.
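
The 0.0057 figure is consistent with the two pseudo-clues if the final score is taken as (S - H + 1) / 2, which is how the chi-squared combining scheme is assumed to work here. A minimal check, with combined_score as an illustrative name:

```python
def combined_score(H, S):
    """Final score from the ham (*H*) and spam (*S*) indicators.

    Assumes the chi-squared combining rule (S - H + 1) / 2.
    """
    return (S - H + 1.0) / 2.0


score = combined_score(H=1.0, S=0.0115174)
print("%.4f" % score)
```

With *H* pinned at 1, the score can never exceed 0.5 however strong the spam clues are, which is why this message lands so deep in ham territory.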

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.