[Spambayes] Two Scheme Enter, One Scheme Leave.
Anthony Baxter
anthony@interlink.com.au
Thu, 26 Sep 2002 01:18:11 +1000
Part the second...
A brief side trip into fiddling with robinson_probability_x showed that
setting it to 0.4 and 0.6 (instead of the default 0.5) had no real
effect on fp/fn numbers, but resulted in average ham and spam numbers
being around 1% lower and higher, respectively.
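For anyone following along without the classifier source open, here is a
minimal sketch of where x enters a word's score. It assumes the
Robinson-style adjustment f(w) = (a*x + n*p(w)) / (a + n), where p(w) is
the raw spam probability of a word and n is how often the word has been
seen. The function name and the sample values are illustrative, not
taken from the spambayes code.

```python
def robinson_adjust(p, n, a=0.1, x=0.5):
    """Pull a raw word probability p (word seen n times) toward the prior x.

    Assumed form of the adjustment: (a*x + n*p) / (a + n).
    """
    return (a * x + n * p) / (a + n)


# Compare a never-seen word (n=0) with a well-attested one (n=20)
# for the three x values tried in the experiment.
for x in (0.4, 0.5, 0.6):
    unseen = robinson_adjust(0.9, 0, x=x)
    common = robinson_adjust(0.9, 20, x=x)
    print(f"x={x}: unseen={unseen:.4f} common={common:.4f}")
```

Under this form an unseen word scores exactly x, while a word seen 20
times barely moves, so x mostly matters for rare words.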
min_prob_strength is next, carrying over the best settings so far
(cutoff=0.6, a=0.1, x=0.5):
min_prob_strength   fp   fn  fp+fn
0.00                 7   50     57
0.05                 8   23     31
0.08                 9   21     30
0.09                 9   21     30
0.10                 9   21     30
0.11                12   20     32
0.12                12   20     32
0.15                13   19     32
0.20                23   19     42
0.25                23   18     41
0.30                28   15     43
0.35                29   17     46
0.40                36   17     53
0.45                51   17     68
0.49                75   32    107
The "best" cutoff numbers for the different min_prob_strength settings:
min_prob_strength   fp   fn  fp+fn  cutoff
0.00                13   24     37  0.575
0.05                 8   23     31  0.6
0.08                 9   21     30  0.6
0.09                 9   21     30  0.6
0.10                 9   21     30  0.6
0.11                12   20     32  0.6
0.12                12   20     32  0.6
0.15                13   19     32  0.6
0.20                13   26     39  0.625
0.25                23   18     41  0.6
0.30                28   15     43  0.6
0.35                23   21     44  0.625
0.40                27   18     45  0.625
0.45                40   27     67  0.625
0.49                77   26    103  0.575
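A quick way to pick the winner out of a table like this, sketched with
the rows typed in by hand. The tuple layout and variable names are my
own, not anything the test driver produces.

```python
# (min_prob_strength, fp, fn, cutoff), copied from the table above.
rows = [
    (0.00, 13, 24, 0.575),
    (0.05, 8, 23, 0.6),
    (0.08, 9, 21, 0.6),
    (0.09, 9, 21, 0.6),
    (0.10, 9, 21, 0.6),
    (0.11, 12, 20, 0.6),
    (0.12, 12, 20, 0.6),
    (0.15, 13, 19, 0.6),
    (0.20, 13, 26, 0.625),
    (0.25, 23, 18, 0.6),
    (0.30, 28, 15, 0.6),
    (0.35, 23, 21, 0.625),
    (0.40, 27, 18, 0.625),
    (0.45, 40, 27, 0.625),
    (0.49, 77, 26, 0.575),
]


def total_errors(row):
    """fp + fn for one table row."""
    return row[1] + row[2]


lowest = min(total_errors(row) for row in rows)
# Report every setting that reaches the minimum, not just the first one.
tied = [row[0] for row in rows if total_errors(row) == lowest]
print(lowest, tied)
```

On these rows the minimum is 30 errors, shared by 0.08, 0.09 and 0.10.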
That's it for tonight. If people (well, ok, Tim) want more detail,
let me know, and let me know what you want to see. All up, just the
test_foo_2s.txt summary files alone are about 4M of data (about 35
test runs). If the tram ride to work tomorrow is slow, I might write
something to run through all the data files and try to load it all
up into some sort of 4d array or something, and see if anything
interesting shows up...
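One possible shape for that "4d array", sketched as a plain dict keyed
on the four parameters. It assumes the (fp, fn) counts have already been
pulled out of the summary files; the parsing is left out because it
depends on the file layout. The names AXES and slice_along are
hypothetical, and the sample entries are rows from the tables in this
message.

```python
# Key: (cutoff, a, x, min_prob_strength) -> value: (fp, fn).
# For the cutoffs other than 0.6, a and x are assumed to be unchanged.
results = {
    (0.575, 0.1, 0.5, 0.00): (13, 24),
    (0.6, 0.1, 0.5, 0.05): (8, 23),
    (0.6, 0.1, 0.5, 0.09): (9, 21),
    (0.625, 0.1, 0.5, 0.20): (13, 26),
}

AXES = ("cutoff", "a", "x", "min_prob_strength")


def slice_along(results, axis, **fixed):
    """Return {value on axis: (fp, fn)} for runs matching the fixed parameters."""
    axis_index = AXES.index(axis)
    selected = {}
    for key, counts in results.items():
        if all(key[AXES.index(name)] == value for name, value in fixed.items()):
            selected[key[axis_index]] = counts
    return selected


# Hold cutoff at 0.6 and look along the min_prob_strength axis.
print(slice_along(results, "min_prob_strength", cutoff=0.6))
```

A dict keeps sparse parameter grids simple; a real array would only pay
off if every combination of the four settings had actually been run.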
Tomorrow, I'll try the current "best settings"
(cutoff=0.6, a=0.1, x=0.5, min_prob_strength=0.09)
with a few different seeds, compared to Graham, and also try with
different spam/ham corpus sizes.
Anthony