[Spambayes-checkins] spambayes tokenizer.py,1.16,1.17
Anthony Baxter
anthony@interlink.com.au
Thu, 12 Sep 2002 17:13:20 +1000
>>> "Tim Peters" wrote
> Modified Files:
> tokenizer.py
> Log Message:
> Added code to strip uuencoded sections. As reported on the mailing list,
> this has no effect on my results, except that one spam is now judged as
> ham by all the other training sets. It shrinks the database size by a
> few percent, so that makes it a tiny win. If Anthony Baxter doesn't
> report better results on his data, I'll be sorely tempted to throw this
> out again.
I'd say nuke it:
anthony_tok1.16s -> anthony_tok1.17s
false positive percentages
0.778 0.778 tied
0.834 0.778 won -6.71%
0.890 0.890 tied
0.667 0.611 won -8.40%
1.112 1.112 tied
0.834 0.834 tied
0.723 0.723 tied
0.667 0.611 won -8.40%
1.167 1.167 tied
1.001 1.001 tied
0.779 0.779 tied
0.667 0.611 won -8.40%
0.778 0.778 tied
0.778 0.778 tied
0.556 0.556 tied
0.778 0.723 won -7.07%
0.611 0.611 tied
0.778 0.778 tied
0.723 0.723 tied
0.667 0.667 tied
won 5 times
tied 15 times
lost 0 times
total unique fp went from 143 to 141 won -1.40%
false negative percentages
0.646 0.646 tied
0.904 0.904 tied
0.517 0.581 lost +12.38%
1.229 1.229 tied
0.840 0.840 tied
1.033 1.033 tied
0.711 0.775 lost +9.00%
1.164 1.164 tied
0.646 0.646 tied
0.711 0.711 tied
0.646 0.711 lost +10.06%
0.517 0.517 tied
0.776 0.776 tied
0.646 0.646 tied
0.904 0.904 tied
1.035 1.035 tied
0.582 0.582 tied
0.581 0.581 tied
0.775 0.775 tied
0.646 0.646 tied
won 0 times
tied 17 times
lost 3 times
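
For anyone wanting to check the won/tied/lost tags and the percentage changes above by hand, here is a minimal sketch of how such a comparison could be computed. It is not the project's actual comparison script; the names compare and tally are hypothetical, and it assumes the percentage is the change relative to the "before" rate.

```python
def compare(before, after):
    """Tag each before/after error-rate pair as won, tied, or lost.

    Hypothetical helper: a lower "after" rate counts as a win, and the
    change is reported as a percentage of the "before" rate (an assumption).
    """
    rows = []
    for b, a in zip(before, after):
        if a < b:
            tag = "won"
        elif a > b:
            tag = "lost"
        else:
            tag = "tied"
        # Guard against a zero "before" rate to avoid dividing by zero.
        change = (a - b) / b * 100.0 if b else 0.0
        rows.append((b, a, tag, change))
    return rows


def tally(rows):
    """Count how many runs were won, tied, and lost."""
    counts = {"won": 0, "tied": 0, "lost": 0}
    for _, _, tag, _ in rows:
        counts[tag] += 1
    return counts


# A few rate pairs copied from the tables above (two fp runs, one fn run).
sample = compare([0.834, 0.778, 0.517], [0.778, 0.778, 0.581])
for b, a, tag, change in sample:
    if tag == "tied":
        print(f"{b:.3f} {a:.3f} {tag}")
    else:
        print(f"{b:.3f} {a:.3f} {tag} {change:+.2f}%")
print(tally(sample))
```

Feeding in all 20 pairs from each table should give the same counts as reported above, which makes it a quick sanity check on the summary lines.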