From skip at pobox.com Sat Jan 24 22:27:42 2009
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 24 Jan 2009 15:27:42 -0600
Subject: [spambayes-dev] sb_filter/lockfile interaction problem
In-Reply-To: <873af88mbp.fsf@mcbain.luannocracy.com>
References: <873af88mbp.fsf@mcbain.luannocracy.com>
Message-ID: <18811.34766.195055.871637@montanaro.dyndns.org>

Dave> I'm not missing anything here, am I?

No. Probably a copy/paste from a section of code using get_ident() to one
where I wanted to use get_name().

I just registered version 0.7 with PyPI. Please give that a try.

From dave at boostpro.com Sat Jan 24 21:51:12 2009
From: dave at boostpro.com (David Abrahams)
Date: Sat, 24 Jan 2009 15:51:12 -0500
Subject: [spambayes-dev] Proposed tte extension
Message-ID: <87wsck77i7.fsf@mcbain.luannocracy.com>

I haven't coded it up yet, but usually when it seems that spambayes'
performance is degraded, it means I've misclassified something. These
messages tend to stay misclassified when training to exhaustion, even
after 9 or 10 rounds. I'm contemplating an extension to tte.py that will
throw them back into my unsure folder and start over if training doesn't
stop on its own (due to everything being properly classified) before N
rounds.

Thoughts?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From dave at boostpro.com Sat Jan 24 23:01:26 2009
From: dave at boostpro.com (David Abrahams)
Date: Sat, 24 Jan 2009 17:01:26 -0500
Subject: [spambayes-dev] sb_filter/lockfile interaction problem
In-Reply-To: <18811.34766.195055.871637@montanaro.dyndns.org> (skip@pobox.com's message of "Sat, 24 Jan 2009 15:27:42 -0600")
References: <873af88mbp.fsf@mcbain.luannocracy.com> <18811.34766.195055.871637@montanaro.dyndns.org>
Message-ID: <87wsck5pop.fsf@mcbain.luannocracy.com>

on Sat Jan 24 2009, skip-AT-pobox.com wrote:

> Dave> I'm not missing anything here, am I?
>
> No.
> Probably a copy/paste from a section of code using get_ident() to
> one where I wanted to use get_name().
>
> I just registered version 0.7 with PyPI. Please give that a try.

works!

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From dave at boostpro.com Sun Jan 25 00:45:57 2009
From: dave at boostpro.com (David Abrahams)
Date: Sat, 24 Jan 2009 18:45:57 -0500
Subject: [spambayes-dev] Broken reference to oe_mailbox?
Message-ID: <87k58k5kui.fsf@mcbain.luannocracy.com>

$ sb_imapfilter.py
opened existing cache with 238 A records and 0 PTR records
Traceback (most recent call last):
  File "/usr/local/bin/sb_imapfilter.py", line 5, in <module>
    pkg_resources.run_script('spambayes==1.1b1', 'sb_imapfilter.py')
  File "build/bdist.freebsd-6.2-RELEASE-amd64/egg/pkg_resources.py", line 448, in run_script
  File "build/bdist.freebsd-6.2-RELEASE-amd64/egg/pkg_resources.py", line 1166, in run_script
  File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/EGG-INFO/scripts/sb_imapfilter.py", line 138, in <module>
    from spambayes.UserInterface import UserInterfaceServer
  File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/spambayes/UserInterface.py", line 78, in <module>
    from spambayes import oe_mailbox
  File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/spambayes/oe_mailbox.py", line 53, in <module>
    from spambayes import oe_mailbox
ImportError: cannot import name oe_mailbox

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From dave at boostpro.com Sun Jan 25 02:41:00 2009
From: dave at boostpro.com (David Abrahams)
Date: Sat, 24 Jan 2009 20:41:00 -0500
Subject: [spambayes-dev] Broken reference to oe_mailbox?
References: <87k58k5kui.fsf@mcbain.luannocracy.com>
Message-ID: <87eiys5fir.fsf@mcbain.luannocracy.com>

on Sat Jan 24 2009, David Abrahams wrote:

> $ sb_imapfilter.py
> opened existing cache with 238 A records and 0 PTR records
> Traceback (most recent call last):
>   File "/usr/local/bin/sb_imapfilter.py", line 5, in <module>
>     pkg_resources.run_script('spambayes==1.1b1', 'sb_imapfilter.py')
>   File "build/bdist.freebsd-6.2-RELEASE-amd64/egg/pkg_resources.py", line 448, in run_script
>   File "build/bdist.freebsd-6.2-RELEASE-amd64/egg/pkg_resources.py", line 1166, in run_script
>   File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/EGG-INFO/scripts/sb_imapfilter.py", line 138, in <module>
>     from spambayes.UserInterface import UserInterfaceServer
>   File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/spambayes/UserInterface.py", line 78, in <module>
>     from spambayes import oe_mailbox
>   File "/usr/local/lib/python2.5/site-packages/spambayes-1.1b1-py2.5.egg/spambayes/oe_mailbox.py", line 53, in <module>
>     from spambayes import oe_mailbox
> ImportError: cannot import name oe_mailbox

Looks to me like line 43 of oe_mailbox.py tries to load itself, and just
needs to be deleted.

I'm very grateful for everything that the SpamBayes project has done for
me, but finding these two things today does make me wonder whether the
procedure for checkins and/or testing could be improved.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From skip at pobox.com Sun Jan 25 16:22:06 2009
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 25 Jan 2009 09:22:06 -0600
Subject: [spambayes-dev] Broken reference to oe_mailbox?
In-Reply-To: <87eiys5fir.fsf@mcbain.luannocracy.com>
References: <87k58k5kui.fsf@mcbain.luannocracy.com> <87eiys5fir.fsf@mcbain.luannocracy.com>
Message-ID: <18812.33694.788024.829055@montanaro.dyndns.org>

Dave> I'm very grateful for everything that the SpamBayes project has
Dave> done for me, but finding these two things today does make me
Dave> wonder whether the procedure for checkins and/or testing could be
Dave> improved.

At this point SpamBayes is a fairly large collection of separate
applications. Most people use just one, the Outlook plugin. Almost nobody
uses more than one. Writing test cases to properly test the POP3 proxy,
IMAP filter and the Notes filter (should that simply be deleted?) would
require a fair amount of scaffolding. At minimum you'd need stub
implementations of the various kinds of protocol servers.

The core classifier should be fairly well-tested by the daily use it
gets.

It was easier to assume that bugs got squashed quickly when there were
several active developers who could operate from the Subversion
repository. With basically just me as the lone person making any code
changes and running from Subversion there is much less likelihood of
flushing out dumb bugs.

That said, I agree that SpamBayes needs some formal test framework.

Skip

From dave at boostpro.com Sun Jan 25 16:34:44 2009
From: dave at boostpro.com (David Abrahams)
Date: Sun, 25 Jan 2009 10:34:44 -0500
Subject: [spambayes-dev] Broken reference to oe_mailbox?
In-Reply-To: <18812.33694.788024.829055@montanaro.dyndns.org> (skip@pobox.com's message of "Sun, 25 Jan 2009 09:22:06 -0600")
References: <87k58k5kui.fsf@mcbain.luannocracy.com> <87eiys5fir.fsf@mcbain.luannocracy.com> <18812.33694.788024.829055@montanaro.dyndns.org>
Message-ID: <87ljszflgr.fsf@mcbain.luannocracy.com>

on Sun Jan 25 2009, skip-AT-pobox.com wrote:

> Dave> I'm very grateful for everything that the SpamBayes project has
> Dave> done for me, but finding these two things today does make me
> Dave> wonder whether the procedure for checkins and/or testing could be
> Dave> improved.
>
> At this point SpamBayes is a fairly large collection of separate
> applications. Most people use just one, the Outlook plugin. Almost nobody
> uses more than one. Writing test cases to properly test the POP3 proxy,
> IMAP filter and the Notes filter (should that simply be deleted?) would
> require a fair amount of scaffolding. At minimum you'd need stub
> implementations of the various kinds of protocol servers.

You could always test them with real servers. It's imperfect but
workable.

> The core classifier should be fairly well-tested by the daily use it
> gets.

Except that the %x bug somehow got loose.

> It was easier to assume that bugs got squashed quickly when there were
> several active developers who could operate from the Subversion
> repository. With basically just me as the lone person making any code
> changes and running from Subversion there is much less likelihood of
> flushing out dumb bugs.

If I'd had commit access I'd have been able to nail those two. Probably
wouldn't have stopped me from posting as well, though. :-)

My SF userid is david_abrahams, should you decide to add me. I can think
of one other person who might also be willing to help.

> That said, I agree that SpamBayes needs some formal test framework.
Well, you could start by just trying to load all the modules ;-)

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From dave at boostpro.com Sun Jan 25 20:44:17 2009
From: dave at boostpro.com (David Abrahams)
Date: Sun, 25 Jan 2009 14:44:17 -0500
Subject: [spambayes-dev] Updated Server-Side Recipe
Message-ID: <877i4jdvce.fsf@mcbain.luannocracy.com>

Hi,

I've published an updated guide to doing IMAP-Server-Side spam filtering
(my way) at http://wiki.github.com/techarcana/server-side-spambayes.

You might want to remove the reference to
http://www.boost-consulting.com/writing/server-side.html from
http://spambayes.sourceforge.net/server_side.html and point to the github
site instead.

Incidentally, I don't see a way to navigate to
http://spambayes.sourceforge.net/server_side.html from the SB home page.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From skip at pobox.com Tue Jan 27 04:08:56 2009
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 26 Jan 2009 21:08:56 -0600
Subject: [spambayes-dev] Broken reference to oe_mailbox?
In-Reply-To: <87ljszflgr.fsf@mcbain.luannocracy.com>
References: <87k58k5kui.fsf@mcbain.luannocracy.com> <87eiys5fir.fsf@mcbain.luannocracy.com> <18812.33694.788024.829055@montanaro.dyndns.org> <87ljszflgr.fsf@mcbain.luannocracy.com>
Message-ID: <18814.31432.144920.584818@montanaro.dyndns.org>

Dave> You could always test them with real servers. It's imperfect but
Dave> workable.

Richie Hindle contacted me offline and indicated that there are some
server-based tests. They've not been run in ages, so there are currently
tons of failures. Rather than inundate the list with the current
massively failing output of

    BAYESCUSTOMIZE= nosetests .

I stuck it on my website:

    http://smontanaro.dyndns.org/python/failing-sb-tests.txt

Hopefully some of the failures are shallow bugs, but I never got involved
with the imap filter and pop3 proxy so I'm not sure.
>> The core classifier should be fairly well-tested by the daily use it
>> gets.

Dave> Except that the %x bug somehow got loose.

Yeah, haven't figured that out yet.

>> It was easier to assume that bugs got squashed quickly when there
>> were several active developers who could operate from the Subversion
>> repository. With basically just me as the lone person making any
>> code changes and running from Subversion there is much less
>> likelihood of flushing out dumb bugs.

Dave> If I'd had commit access I'd have been able to nail those two.
Dave> Probably wouldn't have stopped me from posting as well,
Dave> though. :-)

Dave> My SF userid is david_abrahams, should you decide to add me. I
Dave> can think of one other person who might also be willing to help.

Done. Welcome to the pool.

>> That said, I agree that SpamBayes needs some formal test framework.

Dave> Well, you could start by just trying to load all the modules ;-)

Checked in as .../spambayes/test/test_basic_import.py

I also have a truckload of small modifications in my sandbox which I hope
to start checking in in the next few days/weeks. I believe all (or almost
all) are related to dumping Python 2.2 support and creating a central
module to gather version- and platform-dependent imports.

Skip

From skip at pobox.com Tue Jan 27 12:32:00 2009
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 27 Jan 2009 05:32:00 -0600 (CST)
Subject: [spambayes-dev] Big old checkin - Hello Python 2.4, Goodbye Python 2.[23]
Message-ID: <20090127113200.961E1D82949@montanaro.dyndns.org>

If you have a chance (even just a few minutes) and are comfortable with
running or at least testing SpamBayes from Subversion it would help if
you could svn up and try the current state of affairs. You *will* need
Python 2.4 or later. I run routinely with Python 2.7a0.

I just ripped out all code which attempted to support Python 2.2 and 2.3.
All sorts of stuff was previously not assumed to exist:

    textwrap, cStringIO, __file__, True/False, set, heapq, csv,
    enumerate, reversed

Those are all assumed to just be there now.

Support for Berkeley DB v1.85 databases is completely gone. If you were
still using it you'll have to choose something else and retrain from
scratch.

Finally, and perhaps most significantly, I checked in a new file at the
top level, failing-unit-tests.txt. It's the output of me running

    BAYESCUSTOMIZE= nosetests --verbose . 2>&1 | tee failing-sb-tests.txt

There is lots of work to do to get this stuff passing. Lots of file
cleanup as well, but that's secondary. If you can see how to make any
failing tests pass that would be a huge boon. Of course, adding new test
cases is always welcome.

Skip
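Dave's proposed tte.py extension in the "Proposed tte extension" message (cap training to exhaustion at N rounds and, if it has not converged by then, throw the still-misclassified messages back into the unsure folder and start over) could be sketched roughly as follows. This is a hypothetical illustration in Python 3, not tte.py's code or SpamBayes' API: the classify/train methods, the make_classifier factory and the ToyClassifier (one weight per token, spam if the sum is positive) are stand-ins assumed for the example.

```python
def train_rounds(clf, messages, max_rounds):
    """Train to exhaustion, but give up after max_rounds passes.

    messages is a list of (msg, is_spam) pairs carrying the user's labels.
    Returns True if some pass made no mistakes (training converged).
    """
    for _ in range(max_rounds):
        mistakes = 0
        for msg, is_spam in messages:
            if clf.classify(msg) != is_spam:
                clf.train(msg, is_spam)  # only train on mistakes
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def tte_with_unsure(messages, make_classifier, max_rounds):
    """Hypothetical extension: if training has not converged within
    max_rounds, set aside the messages that are still misclassified (for
    the user to re-review in an "unsure" folder) and start over from a
    fresh classifier trained on the rest.

    Returns (classifier, list_of_set_aside_messages).
    """
    clf = make_classifier()
    if train_rounds(clf, messages, max_rounds):
        return clf, []
    stubborn = {i for i, (msg, is_spam) in enumerate(messages)
                if clf.classify(msg) != is_spam}
    if not stubborn:
        return clf, []  # the last pass happened to fix everything
    keep = [pair for i, pair in enumerate(messages) if i not in stubborn]
    clf = make_classifier()
    train_rounds(clf, keep, max_rounds)
    return clf, [messages[i][0] for i in sorted(stubborn)]


class ToyClassifier:
    """Stand-in classifier: one weight per token, spam if the sum is > 0."""

    def __init__(self, tokens_of):
        self.tokens_of = tokens_of  # maps message id -> list of tokens
        self.weights = {}

    def classify(self, msg):
        return sum(self.weights.get(t, 0) for t in self.tokens_of[msg]) > 0

    def train(self, msg, is_spam):
        delta = 1 if is_spam else -1
        for t in self.tokens_of[msg]:
            self.weights[t] = self.weights.get(t, 0) + delta


if __name__ == "__main__":
    tokens = {"m1": ["cheap", "pills"], "m2": ["cheap", "pills"],
              "m3": ["hello", "friend"]}
    # m2 carries the same tokens as m1 but the opposite label.
    labeled = [("m1", True), ("m2", False), ("m3", False)]
    clf, unsure = tte_with_unsure(labeled, lambda: ToyClassifier(tokens), 10)
    print(unsure)  # ['m1']
```

In the toy run, m1 and m2 have identical tokens but opposite labels, so no classifier can get both right; after the cap, m1 is left misclassified and set aside, and retraining on the rest converges immediately. The sketch cannot tell which message of a contradictory pair was mislabeled (here it sets aside the correctly labeled one), so the set-aside messages still need a human to review their labels.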
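Skip's remark that testing the POP3 proxy or IMAP filter would need "stub implementations of the various kinds of protocol servers" can be illustrated with a minimal sketch: a throwaway POP3 server, built on Python 3's standard socketserver module, that speaks just enough of RFC 1939 (USER, PASS, STAT, RETR, QUIT) for the standard-library poplib client to talk to it. The handler class, the start_stub_server helper and the canned message are hypothetical; SpamBayes' own server-based tests are not shown in this thread, and a real proxy test would point the proxy at such a stub rather than connecting poplib to it directly.

```python
import poplib
import socketserver
import threading


class StubPOP3Handler(socketserver.StreamRequestHandler):
    """Hypothetical test stub: answers just enough POP3 (RFC 1939) for a
    client smoke test and serves a fixed list of canned messages."""

    messages = [b"Subject: hello\r\n\r\nJust a test.\r\n"]

    def send(self, line):
        self.wfile.write(line + b"\r\n")

    def handle(self):
        self.send(b"+OK stub POP3 server ready")
        for raw in self.rfile:  # one client command per line
            parts = raw.split()
            if not parts:
                continue
            cmd, args = parts[0].upper(), parts[1:]
            if cmd in (b"USER", b"PASS"):
                self.send(b"+OK")  # accept any credentials
            elif cmd == b"STAT":
                total = sum(len(m) for m in self.messages)
                self.send(b"+OK %d %d" % (len(self.messages), total))
            elif cmd == b"RETR" and args:
                msg = self.messages[int(args[0]) - 1]
                self.send(b"+OK %d octets" % len(msg))
                for line in msg.split(b"\r\n")[:-1]:
                    # Byte-stuff lines beginning with "." as RFC 1939 requires.
                    self.send(b"." + line if line.startswith(b".") else line)
                self.send(b".")  # end of multi-line response
            elif cmd == b"QUIT":
                self.send(b"+OK bye")
                return
            else:
                self.send(b"-ERR not supported by stub")


def start_stub_server():
    """Start the stub on a free localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), StubPOP3Handler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_stub_server()
    host, port = server.server_address
    client = poplib.POP3(host, port, timeout=5)
    client.user("tester")
    client.pass_("secret")
    print(client.stat())  # (1, 32)
    print(client.retr(1)[1])
    client.quit()
    server.shutdown()
    server.server_close()
```

Binding to port 0 lets the operating system pick a free port, so concurrent test runs don't collide. Compared with Dave's suggestion of testing against real servers, a stub is faster and deterministic but only exercises the commands it implements.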
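Dave's suggestion to "start by just trying to load all the modules" is easy to mechanize. The following is a minimal sketch using only Python 3's standard importlib, pkgutil and traceback modules; it is not the contents of Skip's test_basic_import.py, which the thread does not show. It walks a package, imports every submodule, and collects a traceback for each one that fails, so a module that cannot be imported at all (a missing dependency, a syntax error, a bad import line) shows up in a single run.

```python
import importlib
import pkgutil
import traceback


def import_all(package_name):
    """Import package_name and every module beneath it.

    Returns a dict mapping each module that failed to import to its
    formatted traceback; an empty dict means everything loaded.
    """
    failures = {}
    try:
        package = importlib.import_module(package_name)
    except Exception:
        failures[package_name] = traceback.format_exc()
        return failures

    search_path = getattr(package, "__path__", None)
    if search_path is None:  # a plain module, not a package
        return failures

    def record_error(name):
        # walk_packages calls this from inside its except clause when a
        # subpackage fails to import, so format_exc() sees that error.
        failures[name] = traceback.format_exc()

    for info in pkgutil.walk_packages(search_path, package_name + ".",
                                      onerror=record_error):
        if info.name.rsplit(".", 1)[-1] == "__main__":
            continue  # importing it would run the package's command line
        try:
            importlib.import_module(info.name)
        except Exception:
            failures[info.name] = traceback.format_exc()
    return failures
```

A test could then simply assert that import_all("spambayes") returns an empty dict. Importing a module runs its top-level code, so platform-specific modules would likely need an explicit skip list.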
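The "central module to gather version- and platform-dependent imports" Skip mentions is a common pattern: one module tries the preferred import, falls back on ImportError, and every other module imports from it. The sketch below is hypothetical (the file name compat.py and the particular names it exports are illustrative, not SpamBayes' actual module) and shows the try/except ImportError idiom in a form that runs on both Python 2 and Python 3.

```python
"""compat.py: hypothetical central home for version- and
platform-dependent imports. Other modules would do
``from compat import StringIO, pickle, md5, IS_WINDOWS``
instead of repeating the fallbacks."""
import sys

try:
    from cStringIO import StringIO  # Python 2: fast C version
except ImportError:
    from io import StringIO  # Python 3: cStringIO no longer exists

try:
    import cPickle as pickle  # Python 2: C-accelerated pickle
except ImportError:
    import pickle  # Python 3: C accelerator is used automatically

try:
    from hashlib import md5  # Python 2.5 and later
except ImportError:
    from md5 import new as md5  # Python 2.4 and earlier

# Platform flag, computed once so callers don't each test sys.platform.
IS_WINDOWS = sys.platform.startswith("win")
```

Keeping the fallbacks in one file means that dropping support for an old Python version, as described in the "Big old checkin" message, mostly comes down to deleting branches there.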