[Spambayes] More experiments with weaktest.py

T. Alexander Popiel popiel@wolfskeep.com
Mon Nov 11 06:11:25 2002


In message:  <LNBBLJKPBEHFEDALKOLCMECDCJAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>
>I've been running weakloop.py over two sets of my c.l.py data while typing

I've now run weakloop.py over three sets of my private data;
that's 3*200 ham and 3*200 spam, for a total of 1200 messages.

The best few it came up with were:

Trained on 39 ham and 61 spam
fp: 4 fn: 3
Total cost: $61.60
Flex cost: $189.7713
x=0.5040 p=0.1040 s=0.4400 sc=0.902 hc=0.204 189.77

Trained on 38 ham and 61 spam
fp: 4 fn: 2
Total cost: $60.60
Flex cost: $189.9767
x=0.5060 p=0.1060 s=0.4300 sc=0.903 hc=0.206 189.98

Trained on 37 ham and 61 spam
fp: 4 fn: 2
Total cost: $60.40
Flex cost: $189.2842
x=0.5054 p=0.0980 s=0.4436 sc=0.905 hc=0.209 189.28

Trained on 37 ham and 61 spam
fp: 4 fn: 2
Total cost: $60.40
Flex cost: $189.8255
x=0.5033 p=0.0981 s=0.4456 sc=0.903 hc=0.206 189.83

Trained on 37 ham and 61 spam
fp: 4 fn: 2
Total cost: $60.40
Flex cost: $189.8260
x=0.5026 p=0.1000 s=0.4458 sc=0.902 hc=0.207 189.83
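For reference, the "Total cost" figures above look consistent with the usual spambayes cost model of $10 per false positive, $1 per false negative, and $0.20 per unsure message. A minimal sketch, assuming those weights (the function name, and the unsure count in the example, are my own inferences rather than numbers from the run output):

```python
# Hedged sketch of the cost model assumed to be behind the "Total cost"
# lines: $10 per false positive, $1 per false negative, $0.20 per unsure.
def total_cost(fp, fn, unsure, fp_cost=10.0, fn_cost=1.0, unsure_cost=0.2):
    """Dollar cost of a test run under the assumed weights."""
    return fp * fp_cost + fn * fn_cost + unsure * unsure_cost

# With fp=4 and fn=3 as in the first run, a $61.60 total would imply
# (61.60 - 43.00) / 0.20 = 93 unsure messages -- an inference from the
# assumed weights, not a figure reported in the runs above.
print(total_cost(4, 3, 93))
```

Under those weights, a single false positive costs as much as ten missed spams, which is why runs with the same fp count separate mainly on fn and unsure counts.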

There were a few runs that trained on slightly more or fewer ham and
spam... but I had to go hunting for them.  I find it quite interesting
that my ham:spam training ratio here (about 2:3, right where all my
ratio tests have been pointing as a sweet spot) is significantly
different from the ratios reported by others (which have been much
closer to 1:1, or have favored more ham than spam).  I guess my corpus
really is unusual.

FWIW, I'm running it again with all 10 of my sets (4000 messages
total) overnight.

- Alex
