[spambayes-dev] Very small change for composite word tokenizing.
Meyer, Tony
T.A.Meyer at massey.ac.nz
Tue Aug 5 14:05:02 EDT 2003
OK, for those interested in testing this out, there are *two* changes to
make to the code that Sean posted. The first is to change the regex to
include '0', and the second is to yield w rather than word. Sean made these
changes and said that his positive results disappeared, but mine
didn't:
[the third and fourth columns are the old, inaccurate results, included
for reference]
filename:   august_no_seans  accurate_seans  august_no_seans  august_seans
ham:spam:   7900:15260       7900:15260      7900:15260       7900:15260
fp total:   2                2               2                2
fp %:       0.03             0.03            0.03             0.03
fn total:   176              175             176              172
fn %:       1.15             1.15            1.15             1.13
unsure t:   501              495             501              499
unsure %:   2.16             2.14            2.16             2.15
real cost:  $296.20          $294.00         $296.20          $291.80
best cost:  $489.60          $488.80         $489.60          $488.80
h mean:     0.63             0.60            0.63             0.62
h sdev:     4.84             4.75            4.84             4.81
s mean:     94.52            94.49           94.52            94.57
s sdev:     18.67            18.70           18.67            18.56
mean diff:  93.89            93.89           93.89            93.95
k:          3.99             4.00            3.99             4.02
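As a sanity check on the derived rows, here is a small Python sketch that recomputes them from the raw counts. It assumes real cost is $10 per fp, $1 per fn and $0.20 per unsure, and that k is mean diff divided by (h sdev + s sdev). Both assumptions reproduce the figures above, but they are inferred from the numbers rather than taken from the test driver. The Run class and its methods are hypothetical helpers, not part of spambayes.

```python
# Recompute the derived rows of the table from its raw counts.
# Assumption: real cost = $10 per fp + $1 per fn + $0.20 per unsure,
# and k = mean diff / (h sdev + s sdev). Both reproduce the table exactly.
from dataclasses import dataclass

HAM, SPAM = 7900, 15260
FP_COST, FN_COST, UNSURE_COST = 10.00, 1.00, 0.20


@dataclass
class Run:  # hypothetical helper, not a spambayes class
    name: str
    fp: int
    fn: int
    unsure: int
    h_mean: float
    h_sdev: float
    s_mean: float
    s_sdev: float

    def real_cost(self) -> float:
        return self.fp * FP_COST + self.fn * FN_COST + self.unsure * UNSURE_COST

    def mean_diff(self) -> float:
        return self.s_mean - self.h_mean

    def k(self) -> float:
        # Gap between the spam and ham score means, in units of summed sdevs
        return self.mean_diff() / (self.h_sdev + self.s_sdev)


# One entry per table column, in the same left-to-right order
runs = [
    Run("august_no_seans", 2, 176, 501, 0.63, 4.84, 94.52, 18.67),
    Run("accurate_seans", 2, 175, 495, 0.60, 4.75, 94.49, 18.70),
    Run("august_no_seans", 2, 176, 501, 0.63, 4.84, 94.52, 18.67),  # old
    Run("august_seans", 2, 172, 499, 0.62, 4.81, 94.57, 18.56),  # old
]

for r in runs:
    print(f"{r.name:16} fp% {100 * r.fp / HAM:.2f}  fn% {100 * r.fn / SPAM:.2f}  "
          f"unsure% {100 * r.unsure / (HAM + SPAM):.2f}  "
          f"cost ${r.real_cost():.2f}  diff {r.mean_diff():.2f}  k {r.k():.2f}")
```

Running it prints one line per column, and the recomputed percentages, costs, mean diffs and k values match the table to two decimal places. best cost isn't recomputed here, since it can't be derived from the other rows.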
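So with the corrected code my fn didn't go down nearly as much (176 to 175,
versus 176 to 172 in the old run), but my unsures went down more (501 to
495, versus 501 to 499).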
=Tony Meyer