[spambayes-dev] RE: [Spambayes] spambayes-1.0a6 bug: sb_mboxtrain.py fails to mark mail data as X-Spambayes-Trained

Alan W. Irwin airwin at users.sourceforge.net
Mon Oct 13 11:45:30 EDT 2003


On 2003-10-13 17:22+1300 Tony Meyer wrote:

> > I have chosen a two-message mbox folder called libtool as an
> > example, but I get the same result with larger folders as well.
> [...]
> > and no extra mail header line referring to X-Spambayes-Trained
>
> I believe if you add
> """
>             if is_spam:
>                 spamtxt = options["Headers", "header_spam_string"]
>             else:
>                 spamtxt = options["Headers", "header_ham_string"]
>             msg.add_header(options["Headers", "trained_header_name"],
>                            spamtxt)
> """
> At line 160 of sb_mboxtrain.py, this will have the desired effect.
>
> Spambayes-dev people - is this a bug?  The maildir_train() function adds
> this header, but the mbox_train() function doesn't, although it looks like
> it is meant to.
>
> =Tony Meyer
>

Actually, that fix won't work in the 1.0a6 version of the code, since
the add_header call is made by msg_train, which is called by _both_
maildir_train and mbox_train.  However, in researching this further I found
the actual source of the problem, and here is a simple patch to fix it.

--- sb_mboxtrain.py_original	Fri Oct 10 19:55:06 2003
+++ sb_mboxtrain.py	Mon Oct 13 08:18:48 2003
@@ -157,11 +157,11 @@
             sys.stdout.flush()
         if msg_train(h, msg, is_spam, force):
             trained += 1
-        if not options["Headers", "include_trained"]:
+        if options["Headers", "include_trained"]:
             # Write it out with the Unix "From " line
             outf.write(msg.as_string(True))

-    if not options["Headers", "include_trained"]:
+    if options["Headers", "include_trained"]:
         outf.seek(0)
         try:
             os.ftruncate(f.fileno(), 0)

The problem is that the original code for mbox_train tests the
include_trained flag with the wrong sense: it rewrites the mbox only when
the flag is off.  (The equivalent code in maildir_train does a continue,
i.e. skips the write part of the loop, when the flag is off.  That is the
correct sense of the flag, so no changes are needed in that case.)

I have tested the new code for the same simple case:

irwin at starling> python2.3 /usr/local/bin/sb_mboxtrain.py -d $HOME/.spambayes/hammie.dbm -g /home/irwin/cdburn0/Mail/libtool
Training ham (/home/irwin/cdburn0/Mail/libtool):
  Reading as Unix mbox
  Trained 2 out of 2 messages
irwin at starling> python2.3 /usr/local/bin/sb_mboxtrain.py -d $HOME/.spambayes/hammie.dbm -g /home/irwin/cdburn0/Mail/libtool
Training ham (/home/irwin/cdburn0/Mail/libtool):
  Reading as Unix mbox
  Trained 0 out of 2 messages

Note that on the second run the number of trained messages is zero, as it
should be with the revised code.  With the original code it is erroneously
2.  Also, inspection of the libtool mbox shows that the header has now been
written (the original code did not change libtool in the slightest)
using the

X-Spambayes-Trained: ham

header line.
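
For anyone who wants to see the intended behaviour in isolation, here is a
rough, self-contained sketch.  It is not the sb_mboxtrain.py code: the
function name train_messages and its arguments are made up for illustration,
no classifier is involved, and it simply assumes that a message already
carrying the matching X-Spambayes-Trained header is skipped (which is what
the "Trained 0 out of 2" result above suggests).

```python
import email

TRAINED_HEADER = "X-Spambayes-Trained"  # header name from the report above


def train_messages(raw_messages, is_spam, include_trained):
    """Hypothetical stand-in for mbox_train.

    Returns (trained_count, rewritten_messages).  rewritten_messages is
    None when include_trained is false, mirroring the patched code, which
    only rewrites the folder when the flag is set.
    """
    label = "spam" if is_spam else "ham"
    trained = 0
    out = [] if include_trained else None
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        if msg.get(TRAINED_HEADER) != label:
            # A real trainer would update the classifier here.
            del msg[TRAINED_HEADER]  # no error if the header is absent
            msg[TRAINED_HEADER] = label
            trained += 1
        if include_trained:
            # Corrected sense: write the message back only when the flag is on.
            out.append(msg.as_string())
    return trained, out


if __name__ == "__main__":
    folder = ["Subject: one\n\nbody\n", "Subject: two\n\nbody\n"]
    first, folder = train_messages(folder, is_spam=False, include_trained=True)
    second, folder = train_messages(folder, is_spam=False, include_trained=True)
    print(first, second)
```

With the flag on, the second pass sees the headers written by the first and
trains nothing.  With the flag off, nothing is written back, so every pass
over the untouched folder trains all messages again, which is the symptom
the original code showed.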

To the developers here:

Please accept the above patch as a fix for bug # 821808, and close the bug.

Tony, will you please make sure this gets posted to spambayes-dev?  I
have not subscribed to that list, so it will presumably take moderator
approval.

Alan
__________________________
Alan W. Irwin
email: irwin at beluga.phys.uvic.ca
phone: 250-727-2902

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the PLplot scientific plotting software
package (plplot.org), the Yorick front-end to PLplot (yplot.sf.net), the
Loads of Linux Links project (loll.sf.net), and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________


