[Spambayes-checkins] spambayes timtest.py,1.1,1.2
Tim Peters
tim_one@users.sourceforge.net
Thu, 05 Sep 2002 16:51:34 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13855
Modified Files:
timtest.py
Log Message:
Pure win for the f-n rate: take X-Mailer into account.
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.050 0.075 lost
0.000 0.000 tied
0.025 0.025 tied
0.025 0.025 tied
0.050 0.050 tied
0.025 0.025 tied
0.025 0.025 tied
0.050 0.050 tied
0.075 0.075 tied
0.025 0.025 tied
0.025 0.025 tied
0.025 0.025 tied
0.025 0.025 tied
0.025 0.025 tied
0.025 0.025 tied
0.000 0.000 tied
0.025 0.025 tied
0.050 0.050 tied
won 0 times
tied 19 times
lost 1 times
total unique fp went from 8 to 8
false negative percentages
0.691 0.582 won
0.655 0.618 won
0.945 0.836 won
1.309 1.236 won
1.164 1.018 won
0.800 0.764 won
0.763 0.691 won
1.163 1.054 won
1.345 1.236 won
1.127 1.018 won
1.345 1.236 won
1.490 1.418 won
0.909 0.764 won
0.582 0.473 won
0.691 0.509 won
1.163 0.945 won
1.018 0.945 won
0.873 0.727 won
0.909 0.764 won
1.127 0.981 won
won 20 times
tied 0 times
lost 0 times
total unique fn went from 249 to 226
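
The won/tied/lost tallies above compare each run's error rate before and after the change, where a lower rate is better. A minimal sketch of that comparison follows. The helper name `tally` is hypothetical and is not the project's own comparison script; the input pairs are copied from the tables above.

```python
def tally(pairs):
    """Count how often the 'after' rate won, tied, or lost against 'before'.

    Hypothetical helper for illustration only; lower error rate is better.
    """
    won = tied = lost = 0
    for before, after in pairs:
        if after < before:
            won += 1
        elif after > before:
            lost += 1
        else:
            tied += 1
    return won, tied, lost


# First three false-negative rows from the table above.
fn_pairs = [(0.691, 0.582), (0.655, 0.618), (0.945, 0.836)]
print(tally(fn_pairs))

# Relative drop in total unique false negatives (249 before, 226 after),
# as a percentage rounded to one decimal place.
fn_drop_percent = round(100 * (249 - 226) / 249, 1)
print(fn_drop_percent)
```

By the same arithmetic, the single lost false-positive run (0.050 to 0.075) counts as one loss, while the total of unique false positives stays at 8.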
Index: timtest.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/timtest.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** timtest.py 5 Sep 2002 16:16:43 -0000 1.1
--- timtest.py 5 Sep 2002 23:51:32 -0000 1.2
***************
*** 508,513 ****
      # From:
      # Reply-To:
!     # X-Mailer:
!     for field in ('from',):# 'reply-to', 'x-mailer',):
          prefix = field + ':'
          subj = msg.get(field, '-None-')
--- 508,512 ----
      # From:
      # Reply-To:
!     for field in ('from',):# 'reply-to',):
          prefix = field + ':'
          subj = msg.get(field, '-None-')
***************
*** 515,518 ****
--- 514,526 ----
                  for t in tokenize_word(w):
                      yield prefix + t
+ 
+     # These headers seem to work best if they're not tokenized: just
+     # normalize case and whitespace.
+     # X-Mailer: This is a pure and significant win for the f-n rate; f-p
+     # rate isn't affected.
+     for field in ('x-mailer',):
+         prefix = field + ':'
+         subj = msg.get(field, '-None-')
+         yield prefix + ' '.join(subj.lower().split())
  
      # Organization:
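
The added lines in the diff turn the whole X-Mailer value into one token instead of splitting it into words. A minimal standalone sketch of that normalization follows. The function name `xmailer_token` and the sample header value are assumptions made for illustration; only the `'-None-'` placeholder and the lower/split/join expression come from the diff.

```python
def xmailer_token(value=None):
    """Build a single token from an X-Mailer header value.

    Hypothetical helper: mirrors the expression in the diff, but takes the
    header value directly instead of a message object.
    """
    if value is None:
        value = '-None-'  # placeholder the diff uses when the header is absent
    # Lowercase, then collapse every run of whitespace to one space.
    return 'x-mailer:' + ' '.join(value.lower().split())


# Illustrative header value (not taken from the test corpus).
print(xmailer_token('Microsoft  Outlook Express\t6.00.2600.0000'))
# A missing header still yields a token; the placeholder is lowercased too.
print(xmailer_token())
```

Because the value is kept whole, two mailer strings that differ only in case or spacing map to the same token, while different version strings stay distinct.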