[Spambayes-checkins] spambayes/scripts sb_filter.py, 1.6, 1.7 sb_mboxtrain.py, 1.6, 1.7

Neale Pickett npickett at users.sourceforge.net
Mon Nov 17 16:47:49 EST 2003


Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1:/tmp/cvs-serv15297/scripts

Modified Files:
	sb_filter.py sb_mboxtrain.py 
Log Message:
* s/hammie/sb_filter/
* Contrib file cleanup (more bomb-proof)
* sb_filter options cleanup
* sb_mboxtrain bombproofing (won't try to write out messages if it can't
  parse them)
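
For readers on a current Python, the sb_mboxtrain bombproofing described above can be sketched roughly as follows. This is a hedged illustration only, not the committed code: it assumes Python 3, where the module paths are email.message and email.errors, and it adds a bytes branch that the original does not have. The modern email parser usually records defects rather than raising, so the None return is mostly a safeguard here.

```python
import email
import email.errors
from email.message import Message


def get_message(obj):
    """Return an email Message, or None if the input cannot be parsed.

    Sketch of the pattern in this commit: on a parse error, return
    None instead of a stripped-down message, so the caller can skip
    the file rather than write a headerless message back out.
    """
    if isinstance(obj, Message):
        return obj
    # Accept file-like objects as well as raw text or bytes.
    if hasattr(obj, "read"):
        obj = obj.read()
    try:
        if isinstance(obj, bytes):
            # Assumption: files opened in binary mode yield bytes.
            return email.message_from_bytes(obj)
        return email.message_from_string(obj)
    except email.errors.MessageParseError:
        return None
```

A caller would then test the result and skip (Maildir, MH) or abort (mbox) when it is None, as the hunks below do.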


Index: sb_filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_filter.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** sb_filter.py	12 Nov 2003 22:02:55 -0000	1.6
--- sb_filter.py	17 Nov 2003 21:47:47 -0000	1.7
***************
*** 28,40 ****
      -n
          create a new database
! *+  -f
          filter (default if no processing options are given)
! *+  -t
!         [EXPERIMENTAL] filter and train based on the result (you must
!         make sure to untrain all mistakes later)
! *+  -g
          [EXPERIMENTAL] (re)train as a good (ham) message
! *+  -s
          [EXPERIMENTAL] (re)train as a bad (spam) message
  *   -G
          [EXPERIMENTAL] untrain ham (only use if you've already trained
--- 28,40 ----
      -n
          create a new database
! *   -f
          filter (default if no processing options are given)
! *   -g
          [EXPERIMENTAL] (re)train as a good (ham) message
! *   -s
          [EXPERIMENTAL] (re)train as a bad (spam) message
+ *   -t
+         [EXPERIMENTAL] filter and train based on the result -- you must
+         make sure to untrain all mistakes later.  Not recommended.
  *   -G
          [EXPERIMENTAL] untrain ham (only use if you've already trained
***************
*** 47,52 ****
          set [section, option] in the options database to value
  
! All options marked with '*' operate on stdin.  Only those processing options
! marked with '+' send a modified message to stdout.
  
  If no filenames are given on the command line, standard input will be
--- 47,52 ----
          set [section, option] in the options database to value
  
! All options marked with '*' operate on stdin, and write the resultant
! message to stdout.
  
  If no filenames are given on the command line, standard input will be

Index: sb_mboxtrain.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** sb_mboxtrain.py	26 Sep 2003 18:17:44 -0000	1.6
--- sb_mboxtrain.py	17 Nov 2003 21:47:47 -0000	1.7
***************
*** 33,37 ****
      -q
          quiet mode; no output
!         
      -n  train mail residing in "new" directory, in addition to "cur"
          directory, which is always trained (Maildir only)
--- 33,37 ----
      -q
          quiet mode; no output
! 
      -n  train mail residing in "new" directory, in addition to "cur"
          directory, which is always trained (Maildir only)
***************
*** 46,51 ****
      True, False = 1, 0
  
! import sys, os, getopt
! from spambayes import hammie, mboxutils
  from spambayes.Options import options
  
--- 46,51 ----
      True, False = 1, 0
  
! import sys, os, getopt, email
! from spambayes import hammie
  from spambayes.Options import options
  
***************
*** 53,56 ****
--- 53,76 ----
  loud = True
  
+ def get_message(obj):
+     """Return an email Message object.
+ 
+     This works like mboxutils.get_message, except it doesn't junk the
+     headers if there's an error.  Doing so would cause a headerless
+     message to be written back out!
+ 
+     """
+ 
+     if isinstance(obj, email.Message.Message):
+         return obj
+     # Create an email Message object.
+     if hasattr(obj, "read"):
+         obj = obj.read()
+     try:
+         msg = email.message_from_string(obj)
+     except email.Errors.MessageParseError:
+         msg = None
+     return msg
+ 
  def msg_train(h, msg, is_spam, force):
      """Train bayes with a single message."""
***************
*** 110,115 ****
              sys.stdout.flush()
          f = file(cfn, "rb")
!         msg = mboxutils.get_message(f)
          f.close()
          if not msg_train(h, msg, is_spam, force):
              continue
--- 130,138 ----
              sys.stdout.flush()
          f = file(cfn, "rb")
!         msg = get_message(f)
          f.close()
+         if not msg:
+             print "Malformed message: %s.  Skipping..." % cfn
+             continue
          if not msg_train(h, msg, is_spam, force):
              continue
***************
*** 143,147 ****
      f = file(path, "r+b")
      fcntl.flock(f, fcntl.LOCK_EX)
!     mbox = mailbox.PortableUnixMailbox(f, mboxutils.get_message)
  
      outf = os.tmpfile()
--- 166,170 ----
      f = file(path, "r+b")
      fcntl.flock(f, fcntl.LOCK_EX)
!     mbox = mailbox.PortableUnixMailbox(f, get_message)
  
      outf = os.tmpfile()
***************
*** 150,153 ****
--- 173,179 ----
  
      for msg in mbox:
+         if not msg:
+             print "Malformed message number %d.  I can't train on this mbox, sorry." % counter
+             return
          counter += 1
          if loud and counter % 10 == 0:
***************
*** 204,209 ****
              sys.stdout.flush()
          f = file(fn, "rb")
!         msg = mboxutils.get_message(f)
          f.close()
          msg_train(h, msg, is_spam, force)
          trained += 1
--- 230,238 ----
              sys.stdout.flush()
          f = file(fn, "rb")
!         msg = get_message(f)
          f.close()
+         if not msg:
+             print "Malformed message: %s.  Skipping..." % cfn
+             continue
          msg_train(h, msg, is_spam, force)
          trained += 1




