[Spambayes] Chi True results

Brad Clements bkc@murkworks.com
Sat, 12 Oct 2002 15:07:50 -0400


I ran this twice: first to get the recommended spam cutoff, then a 2nd time
with the recommended cutoff set in the .ini.

Then I compared it against the tim_combine_true test I ran previously.

In this message:  .ini, cmp.py results, histograms from chi true run.

[Tokenizer]
mine_received_headers: True

[Classifier]
use_central_limit = False
use_central_limit2 = False
use_central_limit3 = False
use_tim_combining: False
use_chi_squared_combining: True

[TestDriver]
spam_cutoff: 0.98
show_false_negatives: True
show_false_positives: True
nbuckets: 200
best_cutoff_fp_weight: 10

show_spam_lo: 0.4
show_spam_hi: 0.80
show_ham_lo = 0.40
show_ham_hi = 0.80
show_charlimit: 10000

save_trained_pickles: True
save_histogram_pickles: True
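
For anyone who wants to read these settings outside the test driver, here is a
minimal sketch using Python's standard configparser module. It is not
Spambayes' own option loader; INI_TEXT is a trimmed copy of the settings above,
and the variable names are made up for illustration. configparser accepts both
":" and "=" as delimiters by default, so the mixed style above parses as-is.

```python
import configparser

# Trimmed copy of the .ini above (illustrative subset, not the full file).
INI_TEXT = """
[Classifier]
use_central_limit = False
use_tim_combining: False
use_chi_squared_combining: True

[TestDriver]
spam_cutoff: 0.98
nbuckets: 200
best_cutoff_fp_weight: 10
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Both "key: value" and "key = value" are accepted by default.
use_chi = parser.getboolean("Classifier", "use_chi_squared_combining")
use_clt = parser.getboolean("Classifier", "use_central_limit")
spam_cutoff = parser.getfloat("TestDriver", "spam_cutoff")
nbuckets = parser.getint("TestDriver", "nbuckets")

# With scores on a 0-100 scale, 200 buckets gives 0.5-wide histogram rows.
bucket_width = 100.0 / nbuckets

print(use_chi, use_clt, spam_cutoff, nbuckets, bucket_width)
```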



results/timcombinetrues.txt -> results/chitrues.txt
-> <stat> tested 1300 hams & 1300 spams against 11700 hams & 11700 spams

false positive percentages
    1.077  0.154  won    -85.70%
    0.769  0.231  won    -69.96%
    0.769  0.077  won    -89.99%
    0.923  0.154  won    -83.32%
    0.769  0.154  won    -79.97%
    0.538  0.077  won    -85.69%
    0.538  0.077  won    -85.69%
    0.692  0.000  won   -100.00%
    0.769  0.231  won    -69.96%
    0.692  0.000  won   -100.00%

won  10 times
tied  0 times
lost  0 times

total unique fp went from 98 to 15 won    -84.69%
mean fp % went from 0.753846153846 to 0.115384615385 won    -84.69%

false negative percentages
    0.154  0.846  lost  +449.35%
    0.154  1.231  lost  +699.35%
    0.231  1.154  lost  +399.57%
    0.077  0.615  lost  +698.70%
    0.000  0.923  lost  +(was 0)
    0.231  1.308  lost  +466.23%
    0.231  0.692  lost  +199.57%
    0.077  1.077  lost  +1298.70%
    0.154  1.231  lost  +699.35%
    0.231  1.231  lost  +432.90%

won   0 times
tied  0 times
lost 10 times

total unique fn went from 20 to 134 lost  +570.00%
mean fn % went from 0.153846153846 to 1.03076923077 lost  +570.00%
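
The percentages in the right-hand column are plain relative changes from the
first run to the second. A minimal sketch of that arithmetic, assuming nothing
about how cmp.py is actually written (the function names here are
hypothetical):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        # Undefined; the output above shows "+(was 0)" for this case.
        return None
    return (after - before) / before * 100.0


def verdict(before, after):
    """For error rates, lower is better."""
    if after < before:
        return "won"
    if after > before:
        return "lost"
    return "tied"


# Totals reported above: unique fp 98 -> 15, unique fn 20 -> 134.
fp_change = percent_change(98, 15)
fn_change = percent_change(20, 134)

print("fp %s %+.2f%%" % (verdict(98, 15), fp_change))
print("fn %s %+.2f%%" % (verdict(20, 134), fn_change))
```

The mean-percentage lines show the same relative change as the totals because
every run tested the same number of messages (1300 hams and 1300 spams).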

ham mean                     ham sdev
  12.23    1.40  -88.55%        9.02    8.67   -3.88%
  12.04    1.12  -90.70%        8.57    8.09   -5.60%
  12.08    1.12  -90.73%        8.44    8.02   -4.98%
  12.21    1.26  -89.68%        8.65    8.62   -0.35%
  11.98    1.06  -91.15%        8.40    8.03   -4.40%
  12.20    1.01  -91.72%        8.16    6.87  -15.81%
  11.69    0.85  -92.73%        7.80    6.57  -15.77%
  11.61    0.96  -91.73%        7.91    7.06  -10.75%
  11.63    1.15  -90.11%        8.31    8.38   +0.84%
  11.60    1.01  -91.29%        7.94    7.62   -4.03%

ham mean and sdev for all runs
  11.93    1.09  -90.86%        8.33    7.83   -6.00%

spam mean                    spam sdev
  90.31   99.74  +10.44%        7.59    3.59  -52.70%
  90.59   99.67  +10.02%        7.68    4.17  -45.70%
  90.72   99.68   +9.88%        7.40    4.12  -44.32%
  90.91   99.83   +9.81%        7.16    2.68  -62.57%
  90.54   99.84  +10.27%        6.93    2.20  -68.25%
  90.68   99.66   +9.90%        7.23    4.29  -40.66%
  90.49   99.67  +10.14%        7.25    4.68  -35.45%
  90.61   99.79  +10.13%        7.29    2.98  -59.12%
  90.93   99.75   +9.70%        7.21    3.24  -55.06%
  90.40   99.54  +10.11%        7.80    5.07  -35.00%

spam mean and sdev for all runs
  90.62   99.72  +10.04%        7.36    3.80  -48.37%

ham/spam mean difference: 78.69 98.63 +19.94


--

histogram from chi: true

-> <stat> Ham scores for all runs: 13000 items; mean 1.09; sdev 7.83
-> <stat> min -2.66454e-13; median 2.85882e-12; max 100
* = 204 items
 0.0 12433 *************************************************************
 0.5    71 *
 1.0    43 *
 1.5    33 *
 2.0    14 *
 2.5    15 *
 3.0    12 *
 3.5     5 *
 4.0    14 *
 4.5    11 *
 5.0     6 *
 5.5     9 *
 6.0     9 *
 6.5     5 *
 7.0     6 *
 7.5     3 *
 8.0     7 *
 8.5     2 *
 9.0     5 *
 9.5     5 *
10.0     5 *
10.5     5 *
11.0     3 *
11.5     4 *
12.0     7 *
12.5     2 *
13.0     3 *
13.5     2 *
14.0     3 *
14.5     4 *
15.0     3 *
15.5     3 *
16.0     0 
16.5     3 *
17.0     2 *
17.5     1 *
18.0     0 
18.5     5 *
19.0     3 *
19.5     1 *
20.0     1 *
20.5     3 *
21.0     0 
21.5     1 *
22.0     1 *
22.5     2 *
23.0     1 *
23.5     2 *
24.0     2 *
24.5     0 
25.0     0 
25.5     3 *
26.0     2 *
26.5     2 *
27.0     1 *
27.5     1 *
28.0     2 *
28.5     3 *
29.0     2 *
29.5     2 *
30.0     1 *
30.5     3 *
31.0     1 *
31.5     1 *
32.0     4 *
32.5     2 *
33.0     2 *
33.5     3 *
34.0     1 *
34.5     3 *
35.0     1 *
35.5     3 *
36.0     5 *
36.5     4 *
37.0     0 
37.5     3 *
38.0     1 *
38.5     1 *
39.0     0 
39.5     2 *
40.0     2 *
40.5     3 *
41.0     2 *
41.5     1 *
42.0     1 *
42.5     3 *
43.0     2 *
43.5     1 *
44.0     2 *
44.5     3 *
45.0     3 *
45.5     5 *
46.0     1 *
46.5     3 *
47.0     1 *
47.5     5 *
48.0     1 *
48.5     3 *
49.0     9 *
49.5    11 *
50.0     8 *
50.5     1 *
51.0     3 *
51.5     1 *
52.0     7 *
52.5     3 *
53.0     2 *
53.5     1 *
54.0     0 
54.5     1 *
55.0     2 *
55.5     0 
56.0     3 *
56.5     0 
57.0     0 
57.5     1 *
58.0     2 *
58.5     0 
59.0     0 
59.5     1 *
60.0     1 *
60.5     1 *
61.0     0 
61.5     0 
62.0     0 
62.5     0 
63.0     2 *
63.5     0 
64.0     0 
64.5     0 
65.0     0 
65.5     1 *
66.0     0 
66.5     0 
67.0     0 
67.5     0 
68.0     0 
68.5     2 *
69.0     0 
69.5     1 *
70.0     1 *
70.5     0 
71.0     0 
71.5     1 *
72.0     0 
72.5     1 *
73.0     0 
73.5     0 
74.0     1 *
74.5     0 
75.0     0 
75.5     0 
76.0     2 *
76.5     0 
77.0     0 
77.5     0 
78.0     0 
78.5     1 *
79.0     0 
79.5     0 
80.0     1 *
80.5     1 *
81.0     1 *
81.5     0 
82.0     2 *
82.5     0 
83.0     0 
83.5     1 *
84.0     0 
84.5     3 *
85.0     0 
85.5     1 *
86.0     1 *
86.5     0 
87.0     1 *
87.5     1 *
88.0     2 *
88.5     1 *
89.0     0 
89.5     0 
90.0     2 * 
90.5     0 
91.0     0 
91.5     1 *   
92.0     0 
92.5     1 *
93.0     1 *
93.5     0 
94.0     2 *
94.5     1 *
95.0     1 *
95.5     2 *
96.0     1 *
96.5     2 *
97.0     1 *
97.5     2 *
98.0     0 
98.5     0 
99.0     3 *
99.5    12 *  thanks for joining paypal, ETrade news, HP Symposium, Registration ack from Cingular,
      EDN renewal, X10 newsletter (argh!), FAFSA US Dept Education renewal :-(,
      United Connection, Network Computing Renewal, Infotel Distributing
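
A note on reading the bars: the "* = N items" line is the scale, so one star can
stand for anything from 1 to N messages. The sketch below is a guess at the
scaling, not the test driver's actual code. The width of 61 columns is an
assumption, picked only because it reproduces both scale lines in this post
("* = 204 items" for ham and "* = 210 items" for spam); the function names are
hypothetical.

```python
def items_per_star(biggest, width=61):
    """Smallest whole number of items per star so the tallest bar fits in `width`."""
    return -(-biggest // width)  # ceiling division


def bar(count, unit):
    """Stars for one bucket; any non-empty bucket gets at least one star."""
    return "*" * -(-count // unit)  # ceiling division again


ham_unit = items_per_star(12433)   # tallest ham bucket (0.0)
spam_unit = items_per_star(12807)  # tallest spam bucket (99.5)

print("* =", ham_unit, "items")
print("* =", spam_unit, "items")
print(" 0.5 %5d %s" % (71, bar(71, ham_unit)))
print("16.0 %5d %s" % (0, bar(0, ham_unit)))
```

Under this scaling a bucket holding a single message gets the same one-star bar
as a bucket holding 71, so the counts column is the thing to read in the tails.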

-> <stat> Spam scores for all runs: 13000 items; mean 99.72; sdev 3.80

This histogram seems broken: I have 4 or 5 spams with prob < 0.05

> Survey on Software Reuse Views and Activity

> You are invited to participate in my Dissertation research on the topic of
> Software Reuse.

(naw)

VoIP solutions for providers

HP Enterprise Technical Symposium (oops, this should be ham; I guess I got sick
of getting these)

-> <stat> min 0.000127988; median 100; max 100
* = 210 items
 0.0     1 * ***New SAP Opportunities*** Client interviewing now!!
 0.5     1 * Certified IT professional with over 6 years of Experience on Design
        and Coding.
 1.0     0 
 1.5     1 * Senior Consultant with Experience on JD Edwards, ONE WORLD, XE, CNC,
        AS/400 is available
 2.0     0 
 2.5     1 * Fax / Copier Sales / service call 2078787
 3.0     1 * Development Services on Telecom/Datacom Protocols
 3.5     0 
 4.0     0 
 4.5     0 
 5.0     0 
 5.5     0 
 6.0     0 
 6.5     0 
 7.0     1 * Certified IT professional with over 6 years of Experience on Design
        and Coding.
 7.5     0 
 8.0     0 
 8.5     1 *
 9.0     0 
 9.5     0 
10.0     0 
10.5     0 
11.0     0 
11.5     0 
12.0     0 
12.5     0 
13.0     0 
13.5     0 
14.0     0 
14.5     0 
15.0     0 
15.5     0 
16.0     1 * Use the Session Scheduler to personalize your training (HP, probably misclassified; I guess I did get sick of them)
16.5     1 * VoIP solutions for providers
17.0     0 
17.5     0 
18.0     0 
18.5     0 
19.0     0 
19.5     0 
20.0     0 
20.5     1 *
21.0     0 
21.5     0 
22.0     2 *
22.5     0 
23.0     0 
23.5     0 
24.0     0 
24.5     1 *
25.0     0 
25.5     0 
26.0     0 
26.5     0 
27.0     0 
27.5     0 
28.0     0 
28.5     0 
29.0     0 
29.5     0 
30.0     0 
30.5     0 
31.0     1 *
31.5     0 
32.0     0 
32.5     0 
33.0     0 
33.5     0 
34.0     0 
34.5     0 
35.0     0 
35.5     0 
36.0     0 
36.5     0 
37.0     0 
37.5     0 
38.0     0 
38.5     1 *
39.0     0 
39.5     0 
40.0     0 
40.5     0 
41.0     0 
41.5     0 
42.0     0 
42.5     0 
43.0     0 
43.5     0 
44.0     1 *
44.5     2 *
45.0     0 
45.5     0 
46.0     0 
46.5     0 
47.0     0 
47.5     0 
48.0     0 
48.5     1 *
49.0     0 
49.5     1 *
50.0     9 *
50.5     0 
51.0     2 *
51.5     0 
52.0     1 *
52.5     0 
53.0     1 *
53.5     0 
54.0     0 
54.5     0 
55.0     0 
55.5     2 *
56.0     1 *
56.5     0 
57.0     1 *
57.5     0 
58.0     0 
58.5     0 
59.0     0 
59.5     0 
60.0     0 
60.5     0 
61.0     0 
61.5     0 
62.0     0 
62.5     2 *
63.0     0 
63.5     0 
64.0     0 
64.5     2 *
65.0     0 
65.5     1 *
66.0     1 *
66.5     0 
67.0     0 
67.5     0 
68.0     0 
68.5     1 *
69.0     0 
69.5     0 
70.0     0 
70.5     0 
71.0     0 
71.5     0 
72.0     0 
72.5     1 *
73.0     0 
73.5     1 *
74.0     0 
74.5     0 
75.0     0 
75.5     0 
76.0     5 *
76.5     0 
77.0     1 *
77.5     2 *
78.0     2 *
78.5     1 *
79.0     2 *
79.5     2 *
80.0     1 *
80.5     0 
81.0     1 *
81.5     0 
82.0     1 *
82.5     1 *
83.0     2 *
83.5     1 *
84.0     3 *
84.5     0 
85.0     1 *
85.5     1 *
86.0     2 *
86.5     1 *
87.0     0 
87.5     0 
88.0     2 *
88.5     0 
89.0     1 *
89.5     1 *
90.0     2 *
90.5     5 *
91.0     0 
91.5     4 *
92.0     3 *
92.5     2 *
93.0     1 *
93.5     3 *
94.0     2 *
94.5     5 *
95.0     3 *
95.5     4 *
96.0     5 *
96.5     6 *
97.0     5 *
97.5     4 *
98.0    10 *
98.5    16 *
99.0    33 *
99.5 12807 *************************************************************
-> best cutoff for all runs: 0.98
->     with weighted total 10*15 fp + 134 fn = 284
->     fp rate 0.115%  fn rate 1.03%
    saving ham histogram pickle to class_hamhist.pik
    saving spam histogram pickle to class_spamhist.pik
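
The "weighted total" in the best-cutoff lines can be checked by hand. A minimal
sketch, assuming the cost is simply fp_weight * fp + fn with the weight taken
from best_cutoff_fp_weight in the .ini (the function name is hypothetical, and
this is not the test driver's code):

```python
def weighted_cost(n_fp, n_fn, fp_weight=10):
    """Total cost when each false positive counts `fp_weight` times a false negative."""
    return fp_weight * n_fp + n_fn


n_ham = n_spam = 13000  # 10 runs x 1300 messages of each kind
n_fp, n_fn = 15, 134    # totals at spam_cutoff 0.98

cost = weighted_cost(n_fp, n_fn)
fp_rate = 100.0 * n_fp / n_ham
fn_rate = 100.0 * n_fn / n_spam

print("weighted total:", cost)
print("fp rate %.3f%%  fn rate %.2f%%" % (fp_rate, fn_rate))
```

With a weight of 10, trading the 83 fewer false positives for 114 more false
negatives still lowers the total, which is why the chi-squared run prefers a
cutoff this high.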



Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements