[Spambayes] Seeking a giant idle machine w/ a miserable corpus
T. Alexander Popiel
popiel@wolfskeep.com
Mon Nov 18 02:24:57 2002
In message: <LNBBLJKPBEHFEDALKOLCAEDKCMAB.tim.one@comcast.net>
Tim Peters <tim.one@comcast.net> writes:
>
>[Tim]
>> ...
>> The "missing test" here is exact bigrams (no hash convolutions). I'll
>> try that later; may not have enough RAM for that, but should.
I haven't been able to do a big run of this, but here are my
results:
filename:         org    orgbix
ham:spam:   1000:1000 1000:1000
fp total:           3         2
fp %:            0.30      0.20
fn total:          10         7
fn %:            1.00      0.70
unsure t:          27        28
unsure %:        1.35      1.40
real cost:     $45.40    $32.60
best cost:     $24.00    $24.20
h mean:          0.43      0.50
h sdev:          5.64      5.95
s mean:         97.94     98.28
s sdev:         11.59     10.45
mean diff:      97.51     97.78
k:               5.66      5.96
This is from a five-fold cross-validation run. Looks very nice: orgbix
has fewer fp and fn than org and a lower real cost, at the price of one
more unsure.
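
For anyone wanting to sanity-check the derived rows, here is a minimal sketch that recomputes the percentages, "real cost" and "k" from the raw counts and score statistics in the table. The cost weights ($10 per fp, $1 per fn, $0.20 per unsure) and the formula k = mean diff / (h sdev + s sdev) are assumptions, not stated in this message; they were picked because they reproduce the reported figures. The names summarize() and separation() are made up for illustration. "best cost" is not recomputed, since it depends on a cutoff search over the individual scores.

```python
# Raw counts and score statistics copied from the table in this message.
runs = {
    "org":    dict(ham=1000, spam=1000, fp=3, fn=10, unsure=27,
                   h_mean=0.43, h_sdev=5.64, s_mean=97.94, s_sdev=11.59),
    "orgbix": dict(ham=1000, spam=1000, fp=2, fn=7, unsure=28,
                   h_mean=0.50, h_sdev=5.95, s_mean=98.28, s_sdev=10.45),
}

# ASSUMED weights (not given in the message): dollars per fp, fn, unsure.
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20


def summarize(r):
    """Recompute the rate and cost rows from the raw counts."""
    total = r["ham"] + r["spam"]
    return {
        "fp_pct": 100.0 * r["fp"] / r["ham"],       # fp are misjudged ham
        "fn_pct": 100.0 * r["fn"] / r["spam"],      # fn are misjudged spam
        "unsure_pct": 100.0 * r["unsure"] / total,  # unsure is over all msgs
        "real_cost": (r["fp"] * FP_COST + r["fn"] * FN_COST
                      + r["unsure"] * UNSURE_COST),
    }


def separation(r):
    """ASSUMED definition of k: mean difference over the sum of the sdevs."""
    return (r["s_mean"] - r["h_mean"]) / (r["h_sdev"] + r["s_sdev"])


for name, r in runs.items():
    s = summarize(r)
    print("%-7s fp %.2f%%  fn %.2f%%  unsure %.2f%%  cost $%.2f  k %.2f" % (
        name, s["fp_pct"], s["fn_pct"], s["unsure_pct"],
        s["real_cost"], separation(r)))
```

Under those assumptions the output matches the fp %, fn %, unsure %, real cost and k rows for both columns.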
- Alex