[Spambayes-checkins] spambayes/scripts sb_filter.py, 1.6, 1.7 sb_mboxtrain.py, 1.6, 1.7
Neale Pickett
npickett at users.sourceforge.net
Mon Nov 17 16:47:49 EST 2003
Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1:/tmp/cvs-serv15297/scripts
Modified Files:
sb_filter.py sb_mboxtrain.py
Log Message:
* s/hammie/sb_filter/
* Contrib file cleanup (more bomb-proof)
* sb_filter options cleanup
* sb_mboxtrain bombproofing (won't try to write out messages if it can't
parse them)
Index: sb_filter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_filter.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** sb_filter.py 12 Nov 2003 22:02:55 -0000 1.6
--- sb_filter.py 17 Nov 2003 21:47:47 -0000 1.7
***************
*** 28,40 ****
-n
create a new database
! *+ -f
filter (default if no processing options are given)
! *+ -t
! [EXPERIMENTAL] filter and train based on the result (you must
! make sure to untrain all mistakes later)
! *+ -g
[EXPERIMENTAL] (re)train as a good (ham) message
! *+ -s
[EXPERIMENTAL] (re)train as a bad (spam) message
* -G
[EXPERIMENTAL] untrain ham (only use if you've already trained
--- 28,40 ----
-n
create a new database
! * -f
filter (default if no processing options are given)
! * -g
[EXPERIMENTAL] (re)train as a good (ham) message
! * -s
[EXPERIMENTAL] (re)train as a bad (spam) message
+ * -t
+ [EXPERIMENTAL] filter and train based on the result -- you must
+ make sure to untrain all mistakes later. Not recommended.
* -G
[EXPERIMENTAL] untrain ham (only use if you've already trained
***************
*** 47,52 ****
set [section, option] in the options database to value
! All options marked with '*' operate on stdin. Only those processing options
! marked with '+' send a modified message to stdout.
If no filenames are given on the command line, standard input will be
--- 47,52 ----
set [section, option] in the options database to value
! All options marked with '*' operate on stdin, and write the resultant
! message to stdout.
If no filenames are given on the command line, standard input will be
Index: sb_mboxtrain.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_mboxtrain.py,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** sb_mboxtrain.py 26 Sep 2003 18:17:44 -0000 1.6
--- sb_mboxtrain.py 17 Nov 2003 21:47:47 -0000 1.7
***************
*** 33,37 ****
-q
quiet mode; no output
!
-n train mail residing in "new" directory, in addition to "cur"
directory, which is always trained (Maildir only)
--- 33,37 ----
-q
quiet mode; no output
!
-n train mail residing in "new" directory, in addition to "cur"
directory, which is always trained (Maildir only)
***************
*** 46,51 ****
True, False = 1, 0
! import sys, os, getopt
! from spambayes import hammie, mboxutils
from spambayes.Options import options
--- 46,51 ----
True, False = 1, 0
! import sys, os, getopt, email
! from spambayes import hammie
from spambayes.Options import options
***************
*** 53,56 ****
--- 53,76 ----
loud = True
+ def get_message(obj):
+ """Return an email Message object.
+
+ This works like mboxutils.get_message, except it doesn't junk the
+ headers if there's an error. Doing so would cause a headerless
+ message to be written back out!
+
+ """
+
+ if isinstance(obj, email.Message.Message):
+ return obj
+ # Create an email Message object.
+ if hasattr(obj, "read"):
+ obj = obj.read()
+ try:
+ msg = email.message_from_string(obj)
+ except email.Errors.MessageParseError:
+ msg = None
+ return msg
+
def msg_train(h, msg, is_spam, force):
"""Train bayes with a single message."""
***************
*** 110,115 ****
sys.stdout.flush()
f = file(cfn, "rb")
! msg = mboxutils.get_message(f)
f.close()
if not msg_train(h, msg, is_spam, force):
continue
--- 130,138 ----
sys.stdout.flush()
f = file(cfn, "rb")
! msg = get_message(f)
f.close()
+ if not msg:
+ print "Malformed message: %s. Skipping..." % cfn
+ continue
if not msg_train(h, msg, is_spam, force):
continue
***************
*** 143,147 ****
f = file(path, "r+b")
fcntl.flock(f, fcntl.LOCK_EX)
! mbox = mailbox.PortableUnixMailbox(f, mboxutils.get_message)
outf = os.tmpfile()
--- 166,170 ----
f = file(path, "r+b")
fcntl.flock(f, fcntl.LOCK_EX)
! mbox = mailbox.PortableUnixMailbox(f, get_message)
outf = os.tmpfile()
***************
*** 150,153 ****
--- 173,179 ----
for msg in mbox:
+ if not msg:
+ print "Malformed message number %d. I can't train on this mbox, sorry." % counter
+ return
counter += 1
if loud and counter % 10 == 0:
***************
*** 204,209 ****
sys.stdout.flush()
f = file(fn, "rb")
! msg = mboxutils.get_message(f)
f.close()
msg_train(h, msg, is_spam, force)
trained += 1
--- 230,238 ----
sys.stdout.flush()
f = file(fn, "rb")
! msg = get_message(f)
f.close()
+ if not msg:
+ print "Malformed message: %s. Skipping..." % fn
+ continue
msg_train(h, msg, is_spam, force)
trained += 1
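
The get_message change above can be sketched in modern Python as follows. This is a minimal illustration, not the committed code: the `parse` parameter and the `train_all` helper are hypothetical names added here so the failure path can be exercised. Note also that on current Python versions the default stdlib parser tends to record problems in `msg.defects` rather than raise, so the except branch may rarely fire in practice.

```python
import email
import email.errors
from email.message import Message


def get_message(obj, parse=email.message_from_string):
    """Return a Message, or None if the input cannot be parsed.

    `parse` is a hypothetical hook (not in the original commit); by
    default it is the stdlib string parser.
    """
    if isinstance(obj, Message):
        return obj
    if hasattr(obj, "read"):
        obj = obj.read()
    try:
        return parse(obj)
    except email.errors.MessageParseError:
        # Signal failure instead of fabricating a headerless message
        # that a caller might later write back to disk.
        return None


def train_all(sources, train, parse=email.message_from_string):
    """Train on each (name, text) pair, skipping unparseable messages.

    Hypothetical helper; returns (trained_count, skipped_names).
    """
    trained, skipped = 0, []
    for name, text in sources:
        msg = get_message(text, parse=parse)
        # Compare with None explicitly: a Message with no headers has
        # len() == 0 and would be falsy under a plain `if not msg`.
        if msg is None:
            skipped.append(name)
            continue
        train(msg)
        trained += 1
    return trained, skipped
```

Returning None lets each caller choose its own policy, as the diff does: the per-file loops skip a single bad file, while the mbox path gives up on the whole mailbox because it rewrites the mailbox from the parsed messages.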