[Spambayes] new python error in sbfilter.py

Skip Montanaro skip.montanaro at gmail.com
Fri Mar 10 18:02:16 EST 2017


I would avoid training on every message in your procmailrc file, and only
use the mutt macros to train on misses and unsures. I would only use a
procmail recipe to score incoming messages.
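
In case it helps to see what the assertion at the end of your traceback is
checking, here is a minimal sketch in Python. The function name and the dict
layout are made up for illustration; this is not SpamBayes code, only the same
consistency rule (no token may be counted in more ham or spam messages than
were trained in that category) applied to toy data.

```python
def find_inconsistent_tokens(token_counts, nham, nspam):
    """Return the tokens whose counts break the consistency rule.

    token_counts maps token -> (hamcount, spamcount); this layout is a
    hypothetical stand-in, not the real SpamBayes database format.
    nham and nspam are the numbers of ham and spam messages trained.
    """
    bad = []
    for token, (hamcount, spamcount) in sorted(token_counts.items()):
        # A token cannot appear in more messages of a category than
        # were ever trained in that category.
        if hamcount > nham or spamcount > nspam:
            bad.append(token)
    return bad


# Toy data: "meeting" claims 5 ham messages although only 4 were trained.
counts = {"viagra": (0, 3), "meeting": (5, 0), "hello": (2, 2)}
print(find_inconsistent_tokens(counts, nham=4, nspam=3))
```

With nham=4 the only token flagged is "meeting", which is the same condition
that makes the real classifier raise "Token seen in more ham than ham trained."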

Skip

On Mar 9, 2017 9:35 PM, "Fred Smith" <fredex at fcshome.stoneham.ma.us> wrote:

> On Thu, Mar 09, 2017 at 01:57:27PM -0500, Fred Smith wrote:
> later....
> see below
> > On Thu, Mar 09, 2017 at 07:23:38AM -0600, Skip Montanaro wrote:
> > >    Fred,
> > >    It looks like your training database is corrupt. At the very end of the
> > >    long traceback, the message indicates that the count of messages (ham
> > >    or spam) in which a particular word appears is greater than the number
> > >    of messages in that particular category. I think you should be able to
> > >    just retrain from scratch on your existing database.
> > >    Skip
> >
> > Sigh.
> >
> > That worked, for a little while. Then it started doing it again.
> >
> > I've recently started using these macros in mutt:
> >
> >       macro index S "|sb_filter.py -s -f | procmail\&\nd"
> >       macro pager S "|sb_filter.py -s -f | procmail\&\nd"
> >       macro index H "|sb_filter.py -g -f | procmail\&\nd"
> >       macro pager H "|sb_filter.py -g -f | procmail\&\nd"
> >
> > and in procmail there are these rules:
> >
> >       :0 fw:hamlock
> >       | /usr/bin/sb_filter.py -f -d $HOME/.hammiedb
>
> Ah HA! BINGO!
> that's the problem right there... the macros (above) train on the mail
> then hand it to procmail. Procmail trains it AGAIN, thereby doubling up
> every mail that gets trained that way in the database.
>
> Those macros are a really HANDY way to fix an incorrect training
> while putting it in the right folder. Is there a way anyone can think
> of that avoids the double training?
>
> thanks in advance!
>
> >       # then filter out spam and unsure stuff....
> >       :0
> >       * ^X-Spambayes-Classification: spam
> >       $HOME/Mail/trained.spam
> >
> >       :0
> >       * ^X-Spambayes-Classification: unsure
> >       $HOME/Mail/unsure
> >
> > I don't see why those macros would cause such a problem, but it
> > has started only since I started using them (of course, I also blew
> > away the ancient hammie db and started over with a small corpus of
> > known ham and spam, at the same time).
> >
> > Prior to that I would just save mis-filed mails in either trained.spam
> > or trained.ham and trust that the nightly retraining would do the right
> > thing.
> >
> > any further ideas?
> >
> > thanks in advance!
> >
> > Fred
> >
> > >
> > >    On Mar 8, 2017 7:11 PM, "Fred Smith" <[1]fredex at fcshome.stoneham.ma.us>
> > >    wrote:
> > >
> > >      Hi
> > >      All of a sudden this past week I'm getting this whenever a message
> > >      is sent to sb_filter to be retrained:
> > >        File "/usr/bin/sb_filter.py", line 5, in <module>
> > >          pkg_resources.run_script('spambayes==1.1a6', 'sb_filter.py')
> > >        File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
> > >          self.require(requires)[0].run_script(script_name, ns)
> > >        File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
> > >          execfile(script_filename, namespace, namespace)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/EGG-INFO/scripts/sb_filter.py", line 277, in <module>
> > >          main()
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/EGG-INFO/scripts/sb_filter.py", line 268, in main
> > >          action(msg)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/EGG-INFO/scripts/sb_filter.py", line 186, in filter
> > >          return self.h.filter(msg)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/hammie.py", line 149, in filter
> > >          debug, train)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/hammie.py", line 104, in score_and_filter
> > >          prob, clues = self._scoremsg(msg, True)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/hammie.py", line 33, in _scoremsg
> > >          return self.bayes.spamprob(tokenize(msg), evidence)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/classifier.py", line 169, in chi2_spamprob
> > >          clues = self._getclues(wordstream)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/classifier.py", line 472, in _getclues
> > >          tup = self._worddistanceget(word)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/classifier.py", line 487, in _worddistanceget
> > >          prob = self.probability(record)
> > >        File "/usr/lib/python2.7/site-packages/spambayes-1.1a6-py2.7.egg/spambayes/classifier.py", line 287, in probability
> > >          assert hamcount <= nham, "Token seen in more ham than ham trained."
> > >      AssertionError: Token seen in more ham than ham trained.
> > >      It is possible I got a python update, but I wasn't paying attention,
> > >      so I'm not at all sure.
> > >      I'm NOT a python guru, so I'd appreciate any guidance any of you can
> > >      provide.
> > >      thanks in advance!
> > >      Fred
> > >      --
> > >      ---- Fred Smith -- [2]fredex at fcshome.stoneham.ma.us -----------------------------
> > >                          The Lord detests the way of the wicked
> > >                        but he loves those who pursue righteousness.
> > >      ----------------------------- Proverbs 15:9 (niv) -----------------------------
> > >      _______________________________________________
> > >      [3]SpamBayes at python.org
> > >      [4]https://mail.python.org/mailman/listinfo/spambayes
> > >      Info/Unsubscribe: [5]http://mail.python.org/mailman/listinfo/spambayes
> > >      Check the FAQ before asking: [6]http://spambayes.sf.net/faq.html
> > >
> > > References
> > >
> > >    1. mailto:fredex at fcshome.stoneham.ma.us
> > >    2. mailto:fredex at fcshome.stoneham.ma.us
> > >    3. mailto:SpamBayes at python.org
> > >    4. https://mail.python.org/mailman/listinfo/spambayes
> > >    5. http://mail.python.org/mailman/listinfo/spambayes
> > >    6. http://spambayes.sf.net/faq.html
> >
> > --
> > ---- Fred Smith -- fredex at fcshome.stoneham.ma.us -----------------------------
> >                         The Lord is like a strong tower.
> >              Those who do what is right can run to him for safety.
> > --------------------------- Proverbs 18:10 (niv) -----------------------------
>
> --
> -------------------------------------------------------------------------------
>  .----    Fred Smith   /
> ( /__  ,__.   __   __ /  __   : /
>  /    /  /   /__) /  /  /__) .+'           Home: fredex at fcshome.stoneham.ma.us
> /    /  (__ (___ (__(_ (___ / :__                                 781-438-5471
> -------------------------------- Jude 1:24,25 ---------------------------------