[Spambayes] CL2 test part II

Brad Clements bkc@murkworks.com
Sun, 06 Oct 2002 12:34:07 -0400


In my earlier CL2 and CL3 tests, I trained on the 2nd half of my corpus and tested on the 
first half.

Now, I'm training on the first half and testing the 2nd half. 

The first run of CL2 uncovered more misclassifications (which probably affected the 
training in my first test).

I'm temporarily "borrowing" a client's dual Xeon machine. I'm still only using one processor, 
of course, but it seems a lot faster than my PIII-933.

In any case, here are the CL2 results, training on the first half and testing on the second half.

-> <stat> Ham scores for all runs: 6500 items; mean 0.94; sdev 7.21
-> <stat> min 0; median 0; max 100
* = 105 items
  0 6384 *************************************************************
 25   87 *
 50   21 *
 75    8 *

-> <stat> Spam scores for all runs: 6500 items; mean 99.32; sdev 5.94
-> <stat> min 0; median 100; max 100
* = 106 items
  0    3 *
 25   15 *
 50   68 *
 75 6414 *************************************************************
-> best cutoff for all runs: 0.5
->     with weighted total 1*29 fp + 18 fn = 47
->     fp rate 0.446%  fn rate 0.277%
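
As a cross-check, the fp/fn counts and rates reported above can be recomputed from the histogram buckets. This is only a minimal sketch, not the test driver's own code: the names ham_buckets, spam_buckets and error_counts are made up for illustration, and it assumes that a score landing exactly on the cutoff counts as spam and that the cutoff falls on a bucket boundary (true here, with nbuckets: 4 and a cutoff of 0.5).

```python
# Bucket counts copied from the histograms above. With nbuckets: 4,
# each bucket spans 25 points on the 0-100 score scale and is keyed
# here by its lower bound.
ham_buckets = {0: 6384, 25: 87, 50: 21, 75: 8}
spam_buckets = {0: 3, 25: 15, 50: 68, 75: 6414}


def error_counts(ham, spam, cutoff):
    """Return (fp, fn) for a cutoff that lies on a bucket boundary.

    Assumption: scores >= cutoff are called spam, scores below it ham.
    """
    fp = sum(n for lo, n in ham.items() if lo >= cutoff)   # ham called spam
    fn = sum(n for lo, n in spam.items() if lo < cutoff)   # spam called ham
    return fp, fn


fp, fn = error_counts(ham_buckets, spam_buckets, cutoff=50)
n_ham = sum(ham_buckets.values())
n_spam = sum(spam_buckets.values())
fp_rate = 100.0 * fp / n_ham
fn_rate = 100.0 * fn / n_spam

print(f"1*{fp} fp + {fn} fn = {fp + fn}")
print(f"fp rate {fp_rate:.3f}%  fn rate {fn_rate:.3f}%")
```

With the bucket counts above this gives 29 fp and 18 fn out of 6500 messages each, which agrees with the "best cutoff" lines.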

[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit2 = True
use_central_limit3 = False
zscore_ratio_cutoff: 1.9

[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4

show_spam_lo: 0.0
show_spam_hi: 0.45

save_trained_pickles: True
save_histogram_pickles: True
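
The settings above mix "name: value" and "name = value" lines. As a rough illustration that both forms read the same way in an ini-style file, here is a sketch using Python's standard configparser module. It is not necessarily how Spambayes itself loads its options, and CONFIG_TEXT is just the settings above pasted into a string.

```python
import configparser

# The settings from this message, pasted verbatim (hypothetical variable name).
CONFIG_TEXT = """\
[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit2 = True
use_central_limit3 = False
zscore_ratio_cutoff: 1.9

[TestDriver]
spam_cutoff: 0.50
show_false_negatives: True
nbuckets: 4

show_spam_lo: 0.0
show_spam_hi: 0.45

save_trained_pickles: True
save_histogram_pickles: True
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Both ':' and '=' are accepted as delimiters by default.
use_cl2 = parser.getboolean("Classifier", "use_central_limit2")
use_cl3 = parser.getboolean("Classifier", "use_central_limit3")
zscore_cutoff = parser.getfloat("Classifier", "zscore_ratio_cutoff")
spam_cutoff = parser.getfloat("TestDriver", "spam_cutoff")
nbuckets = parser.getint("TestDriver", "nbuckets")

print(use_cl2, use_cl3, zscore_cutoff, spam_cutoff, nbuckets)
```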




Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements