[Spambayes] RE: spam detection via probability - actual results!
Sjoerd Mullender
sjoerd@acm.org
Fri, 20 Sep 2002 11:29:19 +0200
On Fri, Sep 20 2002 Tim Peters wrote:
> [Classifier]
> use_robinson_probability: True
> max_discriminators: 150
> hambias: 1.0
> [TestDriver]
> spam_cutoff: 0.50
Here are my results. In addition to the settings quoted above, I also have
[Tokenizer]
count_all_header_lines: True
mine_received_headers: True
in both runs.
run1s -> run2s
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
-> <stat> tested 100 hams & 100 spams against 700 hams & 700 spams
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 1.000 lost +(was 0)
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 7 times
lost 1 times
total unique fp went from 0 to 1 lost +(was 0)
mean fp % went from 0.0 to 0.125 lost +(was 0)
false negative percentages
2.000 1.000 won -50.00%
3.000 2.000 won -33.33%
0.000 0.000 tied
1.000 1.000 tied
4.000 2.000 won -50.00%
0.000 0.000 tied
1.000 0.000 won -100.00%
1.000 1.000 tied
won 4 times
tied 4 times
lost 0 times
total unique fn went from 12 to 7 won -41.67%
mean fn % went from 1.5 to 0.875 won -41.67%
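For anyone who wants to check the arithmetic behind the "won -41.67%" figure, here is a minimal sketch. It is not the comparison script itself: the function name relative_change and the variable names are made up for illustration, and the only inputs are the per-run false negative percentages listed above.

```python
def relative_change(before, after):
    """Percentage change from before to after; None when before is 0."""
    if before == 0:
        return None  # undefined, which is why the report prints "(was 0)"
    return (after - before) / before * 100.0

# Per-run false negative percentages, copied from the table above.
fn_before = [2.0, 3.0, 0.0, 1.0, 4.0, 0.0, 1.0, 1.0]
fn_after = [1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0]

# A lower error rate is better, so a run is "won" when the rate dropped.
won = sum(1 for b, a in zip(fn_before, fn_after) if a < b)
tied = sum(1 for b, a in zip(fn_before, fn_after) if a == b)
lost = sum(1 for b, a in zip(fn_before, fn_after) if a > b)

mean_before = sum(fn_before) / len(fn_before)
mean_after = sum(fn_after) / len(fn_after)
change = relative_change(mean_before, mean_after)

print("won %d, tied %d, lost %d" % (won, tied, lost))
print("mean fn %% went from %s to %s (%+.2f%%)" % (mean_before, mean_after, change))
```

The same helper returns None for the false positive row that went from 0.000 to 1.000, since a relative change from zero is not defined.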
The histograms before the change:
Ham distribution for all runs:
* = 14 items
0.00 800 **********************************************************
2.50 0
[ deleted because all 0 ]
Spam distribution for all runs:
* = 14 items
0.00 11 *
2.50 0
[ deleted because all 0 ]
82.50 0
85.00 1 *
87.50 0
90.00 1 *
92.50 0
95.00 1 *
97.50 786 *********************************************************
And after the change:
Ham distribution for all runs:
* = 3 items
0.00 68 ***********************
2.50 16 ******
5.00 13 *****
7.50 20 *******
10.00 32 ***********
12.50 99 *********************************
15.00 85 *****************************
17.50 121 *****************************************
20.00 107 ************************************
22.50 81 ***************************
25.00 47 ****************
27.50 40 **************
30.00 18 ******
32.50 20 *******
35.00 15 *****
37.50 6 **
40.00 4 **
42.50 1 *
45.00 1 *
47.50 5 **
50.00 0
52.50 1 *
55.00 0
57.50 0
60.00 0
62.50 0
65.00 0
67.50 0
70.00 0
72.50 0
75.00 0
77.50 0
80.00 0
82.50 0
85.00 0
87.50 0
90.00 0
92.50 0
95.00 0
97.50 0
Spam distribution for all runs:
* = 2 items
0.00 0
2.50 0
5.00 0
7.50 0
10.00 0
12.50 0
15.00 0
17.50 0
20.00 0
22.50 0
25.00 0
27.50 0
30.00 0
32.50 0
35.00 0
37.50 1 *
40.00 1 *
42.50 0
45.00 2 *
47.50 3 **
50.00 5 ***
52.50 4 **
55.00 7 ****
57.50 9 *****
60.00 25 *************
62.50 44 **********************
65.00 57 *****************************
67.50 74 *************************************
70.00 69 ***********************************
72.50 78 ***************************************
75.00 52 **************************
77.50 59 ******************************
80.00 50 *************************
82.50 40 ********************
85.00 40 ********************
87.50 30 ***************
90.00 18 *********
92.50 17 *********
95.00 10 *****
97.50 105 *****************************************************
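The error counts reported earlier can be read straight off these two histograms. The sketch below is only an illustration and is not part of the test driver. It assumes that a bucket labelled x holds scores from x up to (but not including) x + 2.5, and that every message in a bucket at or above the cutoff is called spam. The dictionary names are hypothetical; the counts are the non-empty buckets above.

```python
CUTOFF = 50.0  # spam_cutoff: 0.50 on the histogram's 0-100 scale

# Non-empty buckets only, keyed by the bucket's lower edge (assumed meaning).
ham = {
    0.0: 68, 2.5: 16, 5.0: 13, 7.5: 20, 10.0: 32, 12.5: 99, 15.0: 85,
    17.5: 121, 20.0: 107, 22.5: 81, 25.0: 47, 27.5: 40, 30.0: 18,
    32.5: 20, 35.0: 15, 37.5: 6, 40.0: 4, 42.5: 1, 45.0: 1, 47.5: 5,
    52.5: 1,
}
spam = {
    37.5: 1, 40.0: 1, 45.0: 2, 47.5: 3, 50.0: 5, 52.5: 4, 55.0: 7,
    57.5: 9, 60.0: 25, 62.5: 44, 65.0: 57, 67.5: 74, 70.0: 69, 72.5: 78,
    75.0: 52, 77.5: 59, 80.0: 50, 82.5: 40, 85.0: 40, 87.5: 30, 90.0: 18,
    92.5: 17, 95.0: 10, 97.5: 105,
}

# Ham scoring at or above the cutoff is a false positive;
# spam scoring below the cutoff is a false negative.
false_pos = sum(n for edge, n in ham.items() if edge >= CUTOFF)
false_neg = sum(n for edge, n in spam.items() if edge < CUTOFF)

total_ham = sum(ham.values())
total_spam = sum(spam.values())

print("fp: %d of %d (%.3f%%)" % (false_pos, total_ham, 100.0 * false_pos / total_ham))
print("fn: %d of %d (%.3f%%)" % (false_neg, total_spam, 100.0 * false_neg / total_spam))
```

Under those assumptions the counts agree with the totals reported above: 1 false positive and 7 false negatives out of 800 messages each.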
Here are the results if I keep hambias at 2.0 instead of lowering it to 1.0:
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 8 times
lost 0 times
total unique fp went from 0 to 0 tied
mean fp % went from 0.0 to 0.0 tied
false negative percentages
2.000 3.000 lost +50.00%
3.000 4.000 lost +33.33%
0.000 0.000 tied
1.000 1.000 tied
4.000 5.000 lost +25.00%
0.000 0.000 tied
1.000 3.000 lost +200.00%
1.000 2.000 lost +100.00%
won 0 times
tied 3 times
lost 5 times
total unique fn went from 12 to 18 lost +50.00%
mean fn % went from 1.5 to 2.25 lost +50.00%
Only the "after" histograms this time:
Ham distribution for all runs:
* = 2 items
0.00 92 **********************************************
2.50 47 ************************
5.00 38 *******************
7.50 59 ******************************
10.00 93 ***********************************************
12.50 90 *********************************************
15.00 119 ************************************************************
17.50 108 ******************************************************
20.00 57 *****************************
22.50 32 ****************
25.00 24 ************
27.50 15 ********
30.00 11 ******
32.50 6 ***
35.00 3 **
37.50 2 *
40.00 2 *
42.50 2 *
45.00 0
47.50 0
50.00 0
52.50 0
55.00 0
57.50 0
60.00 0
62.50 0
65.00 0
67.50 0
70.00 0
72.50 0
75.00 0
77.50 0
80.00 0
82.50 0
85.00 0
87.50 0
90.00 0
92.50 0
95.00 0
97.50 0
Spam distribution for all runs:
* = 2 items
0.00 0
2.50 0
5.00 0
7.50 0
10.00 0
12.50 0
15.00 0
17.50 0
20.00 0
22.50 0
25.00 0
27.50 0
30.00 3 **
32.50 0
35.00 0
37.50 1 *
40.00 2 *
42.50 2 *
45.00 6 ***
47.50 4 **
50.00 13 *******
52.50 17 *********
55.00 29 ***************
57.50 45 ***********************
60.00 56 ****************************
62.50 76 **************************************
65.00 60 ******************************
67.50 62 *******************************
70.00 54 ***************************
72.50 59 ******************************
75.00 53 ***************************
77.50 37 *******************
80.00 44 **********************
82.50 20 **********
85.00 20 **********
87.50 13 *******
90.00 10 *****
92.50 8 ****
95.00 4 **
97.50 102 ***************************************************
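Since no ham scores above the 42.50 bucket here, the bucket counts also allow a rough what-if on spam_cutoff. The sketch below is hypothetical and was not run through the test driver. It only uses the tails of the two histograms above (ham buckets from 35.00 up, spam buckets below 50.00), so it is only meaningful for cutoffs between 35.0 and 50.0 that fall on a bucket edge. It assumes a bucket labelled x starts at score x, and the names ham_tail, spam_tail and errors_at are invented for illustration.

```python
# Tails of the hambias 2.0 histograms, keyed by the bucket's lower edge.
# Only valid for cutoffs in the range 35.0 to 50.0 (see the note above).
ham_tail = {35.0: 3, 37.5: 2, 40.0: 2, 42.5: 2}
spam_tail = {30.0: 3, 37.5: 1, 40.0: 2, 42.5: 2, 45.0: 6, 47.5: 4}

def errors_at(cutoff):
    """Return (false positives, false negatives) for a bucket-edge cutoff."""
    fp = sum(n for edge, n in ham_tail.items() if edge >= cutoff)
    fn = sum(n for edge, n in spam_tail.items() if edge < cutoff)
    return fp, fn

for cutoff in (40.0, 42.5, 45.0, 47.5, 50.0):
    fp, fn = errors_at(cutoff)
    print("cutoff %.2f: %d fp, %d fn" % (cutoff / 100.0, fp, fn))
```

On these counts a cutoff of 0.45 would still give 0 false positives and would cut the false negatives from 18 to 8. That is a reading of this one histogram, not a measured result, and it says nothing about other corpora.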
-- Sjoerd Mullender <sjoerd@acm.org>