[Spambayes-checkins] spambayes timtest.py,1.1,1.2

Tim Peters tim_one@users.sourceforge.net
Thu, 05 Sep 2002 16:51:34 -0700


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv13855

Modified Files:
	timtest.py 
Log Message:
Pure win for the f-n rate:  take X-Mailer into account.

false positive percentages (before, after this change)
    0.000  0.000  tied
    0.000  0.000  tied
    0.050  0.075  lost
    0.000  0.000  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.050  0.050  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.050  0.050  tied
    0.075  0.075  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.025  0.025  tied
    0.000  0.000  tied
    0.025  0.025  tied
    0.050  0.050  tied

won   0 times
tied 19 times
lost  1 times

total unique fp went from 8 to 8

false negative percentages (before, after this change)
    0.691  0.582  won
    0.655  0.618  won
    0.945  0.836  won
    1.309  1.236  won
    1.164  1.018  won
    0.800  0.764  won
    0.763  0.691  won
    1.163  1.054  won
    1.345  1.236  won
    1.127  1.018  won
    1.345  1.236  won
    1.490  1.418  won
    0.909  0.764  won
    0.582  0.473  won
    0.691  0.509  won
    1.163  0.945  won
    1.018  0.945  won
    0.873  0.727  won
    0.909  0.764  won
    1.127  0.981  won

won  20 times
tied  0 times
lost  0 times

total unique fn went from 249 to 226
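The won/tied/lost lines summarize the per-run comparisons. Judging by the rows, a run counts as "won" when the new error rate is lower than the old one, "lost" when it is higher, and "tied" when the two are equal. Below is a minimal sketch of that tally in Python, re-keying the 20 (old, new) pairs from each table above. The names FP, FN and tally() are hypothetical, and this is not the script that produced the report. The "total unique" message counts can't be recovered from the percentages alone, so the sketch doesn't attempt them.

```python
# (old, new) false-positive percentages for the 20 runs, copied from the table.
FP = [
    (0.000, 0.000), (0.000, 0.000), (0.050, 0.075), (0.000, 0.000),
    (0.025, 0.025), (0.025, 0.025), (0.050, 0.050), (0.025, 0.025),
    (0.025, 0.025), (0.050, 0.050), (0.075, 0.075), (0.025, 0.025),
    (0.025, 0.025), (0.025, 0.025), (0.025, 0.025), (0.025, 0.025),
    (0.025, 0.025), (0.000, 0.000), (0.025, 0.025), (0.050, 0.050),
]

# (old, new) false-negative percentages for the same 20 runs.
FN = [
    (0.691, 0.582), (0.655, 0.618), (0.945, 0.836), (1.309, 1.236),
    (1.164, 1.018), (0.800, 0.764), (0.763, 0.691), (1.163, 1.054),
    (1.345, 1.236), (1.127, 1.018), (1.345, 1.236), (1.490, 1.418),
    (0.909, 0.764), (0.582, 0.473), (0.691, 0.509), (1.163, 0.945),
    (1.018, 0.945), (0.873, 0.727), (0.909, 0.764), (1.127, 0.981),
]


def tally(pairs):
    """Return (won, tied, lost): runs where the new error rate is lower,
    equal, or higher than the old one. Lower is better for both f-p and f-n."""
    won = sum(1 for old, new in pairs if new < old)
    tied = sum(1 for old, new in pairs if new == old)
    lost = sum(1 for old, new in pairs if new > old)
    return won, tied, lost
```

Here tally(FP) gives (0, 19, 1) and tally(FN) gives (20, 0, 0), matching the summary lines under each table.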


Index: timtest.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/timtest.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** timtest.py	5 Sep 2002 16:16:43 -0000	1.1
--- timtest.py	5 Sep 2002 23:51:32 -0000	1.2
***************
*** 508,513 ****
      # From:
      # Reply-To:
!     # X-Mailer:
!     for field in ('from',):# 'reply-to', 'x-mailer',):
          prefix = field + ':'
          subj = msg.get(field, '-None-')
--- 508,512 ----
      # From:
      # Reply-To:
!     for field in ('from',):# 'reply-to',):
          prefix = field + ':'
          subj = msg.get(field, '-None-')
***************
*** 515,518 ****
--- 514,526 ----
              for t in tokenize_word(w):
                  yield prefix + t
+ 
+     # These headers seem to work best if they're not tokenized:  just
+     # normalize case and whitespace.
+     # X-Mailer:  This is a pure and significant win for the f-n rate; f-p
+     #            rate isn't affected.
+     for field in ('x-mailer',):
+         prefix = field + ':'
+         subj = msg.get(field, '-None-')
+         yield prefix + ' '.join(subj.lower().split())
  
      # Organization:
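The new X-Mailer code emits the whole header as a single token instead of running it through tokenize_word: it only lowercases the value and collapses runs of whitespace to single spaces. Below is a minimal sketch of that loop wrapped in a standalone generator. It assumes the message object behaves like the standard library's email.message.Message, whose .get() looks headers up case-insensitively. The function name xmailer_tokens is hypothetical and does not appear in timtest.py.

```python
from email.message import Message


def xmailer_tokens(msg):
    """Yield one untokenized X-Mailer token (hypothetical wrapper around the
    loop added in timtest.py 1.2): lowercase the value, normalize whitespace,
    and prefix it with the header name."""
    for field in ('x-mailer',):
        prefix = field + ':'
        # Missing header falls back to '-None-', which is lowercased below too.
        subj = msg.get(field, '-None-')
        # split() with no argument splits on any whitespace run and drops
        # leading/trailing whitespace, so join() yields single spaces.
        yield prefix + ' '.join(subj.lower().split())
```

For example, a header of "Microsoft Outlook  Express 6.00.2600.0000" yields the single token "x-mailer:microsoft outlook express 6.00.2600.0000". A message with no X-Mailer header still produces a token, "x-mailer:-none-", because the "-None-" default is lowercased as well, so presumably the header's absence also becomes a learnable clue.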