[Spambayes] Seeking a giant idle machine w/ a miserable corpus

T. Alexander Popiel popiel@wolfskeep.com
Mon Nov 18 02:24:57 2002


In message:  <LNBBLJKPBEHFEDALKOLCAEDKCMAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>
>[Tim]
>> ...
>> The "missing test" here is exact bigrams (no hash convolutions).  I'll
>> try that later; may not have enough RAM for that, but should.

I haven't been able to do a big run of this, but here are my
results:

filename:      org  orgbix
ham:spam: 1000:1000 1000:1000
fp total:        3       2
fp %:         0.30    0.20
fn total:       10       7
fn %:         1.00    0.70
unsure t:       27      28
unsure %:     1.35    1.40
real cost:  $45.40  $32.60
best cost:  $24.00  $24.20
h mean:       0.43    0.50
h sdev:       5.64    5.95
s mean:      97.94   98.28
s sdev:      11.59   10.45
mean diff:   97.51   97.78
k:            5.66    5.96

This is from a five-fold cross-validation run.  Looks very nice.
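
For anyone checking the derived rows by hand, here is a minimal sketch of how they appear to follow from the raw counts. The cost weights ($10 per false positive, $1 per false negative, $0.20 per unsure) and the formula k = (s mean - h mean) / (h sdev + s sdev) are assumptions on my part; they are not stated in this message, although they do reproduce the figures in the table. The function names are made up for illustration.

```python
# Assumed cost weights (not stated in the message; chosen because they
# reproduce the "real cost" row of the table).
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20


def summarize(n_ham, n_spam, fp, fn, unsure):
    """Recompute the percentage and cost rows from raw counts."""
    return {
        # False positives are ham misfiled as spam, so divide by ham count.
        "fp %": round(100.0 * fp / n_ham, 2),
        # False negatives are spam misfiled as ham, so divide by spam count.
        "fn %": round(100.0 * fn / n_spam, 2),
        # Unsures are counted over all messages.
        "unsure %": round(100.0 * unsure / (n_ham + n_spam), 2),
        "real cost": round(
            fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST, 2
        ),
    }


def separation(h_mean, h_sdev, s_mean, s_sdev):
    """Assumed definition of k: mean difference over the summed sdevs."""
    return round((s_mean - h_mean) / (h_sdev + s_sdev), 2)


org = summarize(1000, 1000, fp=3, fn=10, unsure=27)
orgbix = summarize(1000, 1000, fp=2, fn=7, unsure=28)
k_org = separation(0.43, 5.64, 97.94, 11.59)
k_orgbix = separation(0.50, 5.95, 98.28, 10.45)

print(org, k_org)
print(orgbix, k_orgbix)
```

Under those assumptions the sketch gives real costs of $45.40 and $32.60 and k values of 5.66 and 5.96, matching the table. Most of the cost difference comes from the one fewer false positive, since a false positive is weighted ten times a false negative.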

- Alex


