From nas at python.ca Mon Jan 6 16:07:15 2003
From: nas at python.ca (Neil Schemenauer)
Date: Mon Jan 6 19:03:15 2003
Subject: [Spambayes] Two amusing spam clues
Message-ID: <20030107000715.GA19250@glacier.arctrix.com>

For an email I just received:

'header:Reply-To:1' 0.815893146633
'message-id:@murphy.debian.org' 0.997094899935

The first one surprised me. It looks like most spam provides a reply-to header that is the same as the from header. I have no idea why they do that. The second one is spammy because I get a fair amount of spam through my debian.org address. I guess a lot of spam doesn't have a message ID so the Debian mail server adds one.

The moral of the story is that statistical filters are good at picking up on clues that humans might miss. I don't know about other people on this list but my spambayes filter is kicking spammer ass. I very rarely see a FN and even more rarely see a FP. Props to everyone who helped out with development.

Neil

From richie at entrian.com Tue Jan 7 10:49:48 2003
From: richie at entrian.com (richie@entrian.com)
Date: Tue Jan 7 05:50:23 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID:

[Neil]
> I don't know about other people on this
> list but my spambayes filter is kicking spammer ass. I very rarely see
> a FN and even more rarely see a FP. Props to everyone who helped out
> with development.

Me too. I came back from the Christmas break to find 315 messages waiting for me. My spambayes system had only been trained on 900 messages, but it correctly classified 262 spams and 48 hams, and was unsure about just 5 messages (2 hams, 3 spams). No FPs, no FNs. Very impressive.

I'm now working on pulling the HTML code out of pop3proxy.py (where all the HTML is mixed in with the Python) into a separate HTML file, which is viewable and editable, and making the web UI pull the HTML components out of there at run time.
That should let me unify the pop3proxy web UI with Tim Stone's OptionsConfig.py, and should let John Draper integrate his proposed web-based Spam Management System much more easily.

I also have a fix for the memory-usage problems posted by Rob B back in December - I'll check that in with the HTML edits.

--
Richie Hindle
richie@entrian.com

From skip at pobox.com Tue Jan 7 09:08:06 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 7 10:54:30 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <20030107000715.GA19250@glacier.arctrix.com>
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <15898.60758.122222.194424@montanaro.dyndns.org>

Neil> 'message-id:@murphy.debian.org' 0.997094899935

Neil> I guess a lot of spam doesn't have a message ID so the Debian mail
Neil> server adds one.

Yeah, seems rather odd, doesn't it? It's not like they are hard to generate or anything. For me message-id:@pobox.com and message-id:@manatee.mojam.com are killer clues for the same reason.

Neil> I don't know about other people on this list but my spambayes filter is
Neil> kicking spammer ass. I very rarely see a FN and even more rarely
Neil> see a FP. Props to everyone who helped out with development.

Same here on all counts. I have fairly conservative cutoffs (0.25 and 0.8), so I get a handful of unsures per day. I can't remember the last time I got a false positive. I haven't trained in a while either. The last time I trained was before Christmas.

I have mods to pop3proxy to allow startup of another program before making connections (this allows you to do things like tunnel pop3 over ssh), but I've been too chicken to try it out, fearing I'd lose email. I'll get to it one of these days and then check in the changes.
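The ham and spam cutoffs Skip mentions split the score range into three bands: below the ham cutoff a message is ham, at or above the spam cutoff it is spam, and anything in between is unsure. A minimal sketch in Python, assuming scores run from 0.0 to 1.0. The function name `classify` is hypothetical, and the handling of a score exactly equal to a cutoff is an assumption, not a description of the spambayes code:

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0.0, 1.0] to "ham", "unsure" or "spam".

    Hypothetical helper: the defaults mirror the ham_cutoff and
    spam_cutoff values quoted on this list; nothing is read from
    Options.py. A score exactly at ham_cutoff counts as unsure and a
    score exactly at spam_cutoff counts as spam (an assumption).
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0.0 <= ham_cutoff <= spam_cutoff <= 1.0")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# Skip's conservative cutoffs compared with the 0.20/0.90 defaults.
print(classify(0.50, 0.25, 0.8))  # unsure
print(classify(0.85, 0.25, 0.8))  # spam
print(classify(0.85))             # unsure
```

Lowering the spam cutoff moves messages out of the unsure band and into spam. This is the trade-off Rob describes below when he compares his own cutoff with the 0.90 default.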
Skip

From lists at morpheus.demon.co.uk Tue Jan 7 20:18:46 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Tue Jan 7 15:46:38 2003
Subject: [Spambayes] Two amusing spam clues
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID:

Neil Schemenauer writes:

> I very rarely see a FN and even more rarely see a FP. Props to
> everyone who helped out with development.

I agree entirely. These days, spam simply isn't anything like the problem it used to be, and that's entirely down to spambayes.

Paul.
--
This signature intentionally left blank

From lists at morpheus.demon.co.uk Tue Jan 7 20:28:47 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Tue Jan 7 15:46:39 2003
Subject: [Spambayes] Outlook addin is slow shutting down
Message-ID:

I've noticed that these days, Outlook is very slow in shutting down, sometimes taking 2 or 3 minutes after the UI has gone before the process terminates. This is starting to be a problem for me, as my end of day routine is to shut down Outlook, then the PC. I now have to wait and check that Outlook has really gone before I start shutting down the PC (I don't want my spambayes database corrupted because I shut the PC down before the pickle was written out).

I assume that the delay is caused by the large pickle getting written to disk. (Can I check this assumption in any way?) Is that probable, and if so is anyone looking at addressing the issue? I'm not sure what might be done - we can't easily switch to DBM files without hitting the issue that the Windows distribution of Python 2.2 doesn't have a (non-broken) DBM alternative, and I don't think we'd want the Outlook client to gain a dependency on bsddb :-(

Any thoughts?

Paul.
--
This signature intentionally left blank

From rob at hooft.net Tue Jan 7 09:18:44 2003
From: rob at hooft.net (Rob W.W.
Hooft)
Date: Wed Jan 8 00:37:25 2003
Subject: [Spambayes] Two amusing spam clues
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <3E1A8D64.40107@hooft.net>

Neil Schemenauer wrote:
> For an email I just received:
>
> 'header:Reply-To:1' 0.815893146633
> 'message-id:@murphy.debian.org' 0.997094899935
>
> The first one surprised me. It looks like most spam provides a reply-to
> header that is the same as the from header. I have no idea why they do
> that. The second one is spammy because I get a fair amount of spam
> through my debian.org address. I guess a lot of spam doesn't have a
> message ID so the Debian mail server adds one.
>
> The moral of the story is that statistical filters are good at picking
> up on clues that humans might miss. I don't know about other people on this
> list but my spambayes filter is kicking spammer ass. I very rarely see
> a FN and even more rarely see a FP. Props to everyone who helped out
> with development.

I just retrained on the latest batch for me:

448 messages classified as ham, 3 of these were fn.
122 messages classified as spam, no fp.
18 messages classified as unsure, all of these were spam.

I could have reduced the number of unsures to 11 retrospectively by using the default spam cutoff of 0.90.

Lowest scoring spam: 0.11, highest scoring ham: 0.01

Spambayes really is very good now. If someone could find time, we should make a release! It has been awfully quiet on this list lately...

Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From diana at uv-ray.com Wed Jan 8 17:33:55 2003
From: diana at uv-ray.com (Diana Revencu)
Date: Wed Jan 8 12:49:26 2003
Subject: [Spambayes]
Message-ID: <012101c2b72b$5ba40db0$0100a8c0@home>

Dear Sirs,

I was having a look over your anti-spam resources, very nice!

We recently introduced a spam filter. It is available at http://www.spambully.com. Spam Bully utilizes a Bayesian filter and confirmation messages, can bounce known spams, and uses friend/spammer lists.
We would be very grateful if you would link to us. We can provide a link back to your site in our news section we are developing. We provide information on the latest developments in spam.

http://www.spambully.com/news/

Thank you,
Diana
diana@spambully.com

From anthony at interlink.com.au Thu Jan 9 13:13:03 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Jan 8 21:14:14 2003
Subject: [Spambayes]
In-Reply-To: <012101c2b72b$5ba40db0$0100a8c0@home>
Message-ID: <200301090213.h092D4F14953@localhost.localdomain>

An entry has been added to the 'related projects' page on the website.

From francois.granger at free.fr Thu Jan 9 11:24:32 2003
From: francois.granger at free.fr (François Granger)
Date: Thu Jan 9 05:24:37 2003
Subject: [Spambayes]
In-Reply-To: <200301090213.h092D4F14953@localhost.localdomain>
Message-ID:

on 9/01/03 3:13, Anthony Baxter at anthony@interlink.com.au wrote:

> An entry has been added to the 'related projects' page on the website.

On page http://spambayes.sourceforge.net/applications.html

Under title Hammie.py, you could replace
>Currently documentation focusses on Unix.
By
>Currently documentation focusses on Unix. Works on MacOS X as well

Under title pop3proxy.py, you could replace
>Should work on windows/unix/whatever... ?
By
>Should work on windows/unix/MacOS 9/whatever... ?

--
Mail is a means of communication. People should ask themselves about the political implications of the choices (or non-choices) of their tools and technologies. For clean mail:
--

From Paul.Moore at atosorigin.com Thu Jan 9 13:48:30 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Thu Jan 9 08:49:13 2003
Subject: [Spambayes] Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
Message-ID: <16E1010E4581B049ABC51D4975CEDB886199A1@UKDCX001.uk.int.atosorigin.com>

I just upgraded Spambayes to latest CVS, and it broke my Outlook setup.
I use Outlook plus MS Exchange, and with the current CVS version (addin.py revision 1.43), when I select "Open other user's calendar" I get an immediate crash (GPF, trying to read from address 0) in Outlook. I reverted just addin.py back to revision 1.42, and the crash no longer occurs.

I have no idea what in the changes might have caused this to happen. The "open other user's folder" menu item does create a new Outlook window, and that window (with the old addin.py) has a "Delete as spam" button, which doesn't make sense for a calendar entry, much less for someone else's calendar. So maybe that's relevant (I could try clicking the button, but as I'm writing this mail in Outlook, a crash would lose it, so I won't for now :-))

Sorry I can't give any more clues - if you want me to do any testing, just ask.

Paul.

From tdickenson at devmail.geminidataloggers.co.uk Thu Jan 9 21:07:59 2003
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Thu Jan 9 17:37:07 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <15898.60758.122222.194424@montanaro.dyndns.org>
References: <20030107000715.GA19250@glacier.arctrix.com> <15898.60758.122222.194424@montanaro.dyndns.org>
Message-ID: <200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>

On Tuesday 07 January 2003 3:08 pm, Skip Montanaro wrote:
> Neil> I guess a lot of spam doesn't have a message ID so the Debian
> Neil> mail server adds one.
>
> Yeah, seems rather odd, doesn't it? It's not like they are hard to
> generate or anything. For me message-id:@pobox.com and
> message-id:@manatee.mojam.com are killer clues for the same reason.

I am seeing a high proportion of spams coming through our secondary MX, so message-id:@charon.geminidataloggers.com is a surprising spam clue for me too.

> Neil> I don't know about other people on this list but my spambayes filter
> Neil> is kicking spammer ass. I very rarely see a FN and even more rarely
> Neil> see a FP.
> Neil> Props to everyone who helped out with development.
>
> Same here on all counts. I have fairly conservative cutoffs (0.25 and
> 0.8), so I get a handful of unsures per day. I can't remember the last
> time I got a false positive.

I have only had one Unsure in the last few weeks, and it had some interesting characteristics. The first half looked very spammy, but the second half was a list of three-, four-, and five-digit numbers. Apparently numbers are a strong ham clue for me, particularly 336, 603, and 320.

From rbyrnes at ozemail.com.au Fri Jan 10 10:04:59 2003
From: rbyrnes at ozemail.com.au (Rob B)
Date: Thu Jan 9 18:06:06 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To:
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <5.1.1.6.2.20030110100056.01cf80d0@127.0.0.1>

At 21:49 7/01/2003, richie@entrian.com sent this up the stick:
>I also have a fix for the memory-usage problems posted by Rob B back in
>December - I'll check that in with the HTML edits.

The 250-message limit "fix" posted to CVS (v1.8) on Jan 5 seemed to work.

cheer,
Rob
--
Let a fool hold his tongue and he will pass for a sage.
This is random quote 775 of a collection of 1273
Distance from the centre of the brewing universe: [15200.8 km (8207.8 mi), 262.8 deg](Apparent) Rennerian
Public Key fingerprint = 6219 33BD A37B 368D 29F5 19FB 945D C4D7 1F66 D9C5

From piersh at friskit.com Thu Jan 9 15:22:35 2003
From: piersh at friskit.com (Piers Haken)
Date: Thu Jan 9 18:08:09 2003
Subject: [Spambayes] Outlook addin is slow shutting down
Message-ID: <9891913C5BFE87429D71E37F08210CB9297535@zeus.sfhq.friskit.com>

You're right, it's the saving of the pickle. If you run pythonwin's debug output window then you'll see the diagnostic messages telling you what's going on.

I wonder, would it be possible to use MSDE (a single-user version of SQL Server), which ships with Office, for the Outlook plugin?

Piers.
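Paul's question about whether the pickle write is really the bottleneck can be checked by timing the dump. His worry about a half-written database is commonly handled by writing to a temporary file and renaming it over the old one only after the write has finished. The sketch below shows both ideas in present-day Python. `os.replace` needs Python 3.3 or later, so this would not run on the Python 2.2 discussed here. `save_db` and the file names are hypothetical, not the Outlook addin's actual code:

```python
import os
import pickle
import tempfile
import time


def save_db(obj, path):
    """Pickle obj to path via a temporary file; return seconds taken.

    Hypothetical helper, not the Outlook addin's code. The data goes to
    a temporary file in the same directory, is flushed to disk, and only
    then replaces the old file, so an interrupted save leaves the
    previous pickle in place instead of a truncated one.
    """
    start = time.perf_counter()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in with a single rename (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for a large training database.
    db = {"word%d" % i: (i % 7, i % 11) for i in range(100000)}
    target = os.path.join(tempfile.gettempdir(), "spambayes_test.pck")
    print("pickle written in %.2f seconds" % save_db(db, target))
```

The rename only guards against a corrupt file. It does not make the save any faster, so a slow shutdown would still call for a smaller or incrementally updated store.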
-----Original Message-----
From: Paul Moore [mailto:lists@morpheus.demon.co.uk]
Sent: Tuesday, January 07, 2003 12:29 PM
To: spambayes@python.org
Subject: [Spambayes] Outlook addin is slow shutting down

I've noticed that these days, Outlook is very slow in shutting down, sometimes taking 2 or 3 minutes after the UI has gone before the process terminates. This is starting to be a problem for me, as my end of day routine is to shut down Outlook, then the PC. I now have to wait and check that Outlook has really gone before I start shutting down the PC (I don't want my spambayes database corrupted because I shut the PC down before the pickle was written out).

I assume that the delay is caused by the large pickle getting written to disk. (Can I check this assumption in any way?) Is that probable, and if so is anyone looking at addressing the issue? I'm not sure what might be done - we can't easily switch to DBM files without hitting the issue that the Windows distribution of Python 2.2 doesn't have a (non-broken) DBM alternative, and I don't think we'd want the Outlook client to gain a dependency on bsddb :-(

Any thoughts?

Paul.
--
This signature intentionally left blank

_______________________________________________
Spambayes mailing list
Spambayes@python.org
http://mail.python.org/mailman/listinfo/spambayes

From mhammond at skippinet.com.au Fri Jan 10 11:35:55 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 9 19:36:43 2003
Subject: [Spambayes] RE: Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199A1@UKDCX001.uk.int.atosorigin.com>
Message-ID: <000301c2b840$3d23a880$530f8490@eden>

> I just upgraded Spambayes to latest CVS, and it broke my
> Outlook setup. I
> use Outlook plus MS Exchange, and with the current CVS
> version (addin.py
> revision 1.43), when I select "Open other user's calendar" I get an
> immediate crash (GPF, trying to read from address 0) in
> Outlook.
> I reverted
> just addin.py back to revision 1.42, and the crash no longer occurs.

I hope I just checked in a fix for this. It seems to happen whenever you select "Open in new window" for *any* Outlook item.

Mark.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 2456 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030110/93ba98ce/winmail.bin

From anthony at interlink.com.au Fri Jan 10 20:09:15 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 04:10:34 2003
Subject: [Spambayes] re-org - making a package &c.
Message-ID: <200301100909.h0A99G403099@localhost.localdomain>

I'm just making a reorg-branch now in CVS - I'm going to move the library code into a subdirectory 'spambayes', and then adjust things. There may be some disruption :) but this should then allow us to actually package stuff up and release things.

If anyone wants to help, let me know...

From Paul.Moore at atosorigin.com Fri Jan 10 09:19:35 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Fri Jan 10 04:20:10 2003
Subject: [Spambayes] RE: Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D82C@UKDCX001.uk.int.atosorigin.com>

From: Mark Hammond [mailto:mhammond@skippinet.com.au]
> I hope I just checked in a fix for this. It seems to happen
> whenever you select "Open in new window" for *any* Outlook item.

Yes, that fixed it. Thanks for the extremely quick response!

Paul.

From anthony at interlink.com.au Fri Jan 10 22:08:25 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 06:09:33 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301100909.h0A99G403099@localhost.localdomain>
Message-ID: <200301101108.h0AB8Q304706@localhost.localdomain>

I should probably add that once this is done, and bedded down, I'd like to propose that we make a real release - I'm thinking we do one release that's hammie and pop3proxy, and another that's the Outlook plugin.

It's not like we're still tracking a moving target here, and there's no reason we can't make this a lot easier for people than "get the CVS" :)

--
Anthony Baxter
It's never too late to have a happy childhood.

From msergeant at startechgroup.co.uk Fri Jan 10 11:15:00 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Fri Jan 10 06:13:13 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>
References: <20030107000715.GA19250@glacier.arctrix.com> <15898.60758.122222.194424@montanaro.dyndns.org> <200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>
Message-ID: <1042197300.30555.87.camel@felony.int.star.co.uk>

On Thu, 2003-01-09 at 21:07, Toby Dickenson wrote:
> On Tuesday 07 January 2003 3:08 pm, Skip Montanaro wrote:
> > Neil> I guess a lot of spam doesn't have a message ID so the Debian
> > Neil> mail server adds one.
> >
> > Yeah, seems rather odd, doesn't it? It's not like they are hard to
> > generate or anything. For me message-id:@pobox.com and
> > message-id:@manatee.mojam.com are killer clues for the same reason.
>
> I am seeing a high proportion of spams coming through our secondary MX, so
> message-id:@charon.geminidataloggers.com is a surprising spam clue for me too.

This is to bypass Postini, whose default setting is to set three (I think) MX records: two of their servers at high priority and finally yours at low priority in case Postini's servers are down.

So spammers are starting to just choose the lowest priority MX server.

Matt.
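Matt's explanation depends on how MX preference values work. A lower preference number marks a more preferred host. A conforming sender tries the hosts in ascending order and falls back to the backup only if the preferred ones cannot be reached. A small sketch with hypothetical host names and hard-coded records (no DNS lookup is performed):

```python
def delivery_order(mx_records):
    """Return MX hosts in the order a conforming sender tries them.

    mx_records is a list of (preference, host) pairs. Lower preference
    values are tried first (RFC 5321, section 5.1). The RFC asks senders
    to randomize among equal preferences; ties are broken by host name
    here only so the result is deterministic.
    """
    return [host for _pref, host in sorted(mx_records)]


# Hypothetical records for a domain behind an outside filtering service:
# two filtering hosts at preference 10, the site's own server at 50.
records = [
    (50, "mail.example.com"),
    (10, "filter1.example.net"),
    (10, "filter2.example.net"),
]

order = delivery_order(records)
first_choice = order[0]   # where a normal MTA delivers first
backup_host = order[-1]   # least preferred: the host spammers connect to directly
print(order)  # ['filter1.example.net', 'filter2.example.net', 'mail.example.com']
```

A sender that connects straight to `backup_host` bypasses the filtering hosts entirely. This matches Toby's observation that mail arriving through the secondary MX is disproportionately spam.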
From anthony at interlink.com.au Fri Jan 10 22:18:11 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 06:19:23 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <1042197300.30555.87.camel@felony.int.star.co.uk>
Message-ID: <200301101118.h0ABIBl04824@localhost.localdomain>

>>> Matt Sergeant wrote
> This is to bypass Postini, whose default setting is to set three (I
> think) MX records: two of their servers at high priority and finally
> yours at low priority in case Postini's servers are down.
>
> So spammers are starting to just choose the lowest priority MX server.

Excellent. I know many places that have their systems set up with multiple MXs; the lowest-priority one is usually the hideously overloaded mail server of the provider that supplies their bandwidth. These machines can sometimes get hours and hours behind, as they're generally not over-burdened with spare cycles. So now they'll get even more spam shite to deal with.

Wonderful :/

--
Anthony Baxter
It's never too late to have a happy childhood.

From Paul.Moore at atosorigin.com Fri Jan 10 11:19:08 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Fri Jan 10 06:19:54 2003
Subject: [Spambayes] re-org - making a package &c.
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D82E@UKDCX001.uk.int.atosorigin.com>

From: Anthony Baxter [mailto:anthony@interlink.com.au]
> I should probably add that once this is done, and bedded
> down, I'd like to propose that we make a real release - I'm
> thinking we do one release that's hammie and pop3proxy, and
> another that's the Outlook plugin.

+1

Paul

From tim at fourstonesExpressions.com Fri Jan 10 08:25:01 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 10 09:51:43 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D82E@UKDCX001.uk.int.atosorigin.com>
Message-ID:

You might want to tag the current tree...
- TimS

1/10/2003 5:19:08 AM, "Moore, Paul" wrote:

>From: Anthony Baxter [mailto:anthony@interlink.com.au]
>> I should probably add that once this is done, and bedded
>> down, I'd like to propose that we make a real release - I'm
>> thinking we do one release that's hammie and pop3proxy, and
>> another that's the Outlook plugin.
>
>+1
>
>Paul
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From papaDoc at videotron.ca Fri Jan 10 09:29:32 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Fri Jan 10 09:51:43 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301101108.h0AB8Q304706@localhost.localdomain>
References: <200301101108.h0AB8Q304706@localhost.localdomain>
Message-ID: <3E1ED8CC.1000606@videotron.ca>

Hi Anthony,

>I should probably add that once this is done, and bedded down, I'd
>like to propose that we make a real release - I'm thinking we do one
>release that's hammie and pop3proxy, and another that's the Outlook
>plugin.
>
>It's not like we're still tracking a moving target here, and there's
>no reason we can't make this a lot easier for people than "get the CVS" :)
>

I wrote some documentation for pop3proxy (some time in November) and submitted it to the list. Francois Granger added stuff for Mac OS X. I'm still rewriting and reformatting it, but it should be available soon.

papaDoc

From whisper at oz.net Fri Jan 10 10:56:33 2003
From: whisper at oz.net (David LeBlanc)
Date: Fri Jan 10 14:03:50 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To:
Message-ID:

Why split it up?
David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: spambayes-bounces@python.org
> [mailto:spambayes-bounces@python.org]On Behalf Of Tim Stone - Four
> Stones Expressions
> Sent: Friday, January 10, 2003 6:25
> To: Anthony Baxter; Moore, Paul
> Cc: spambayes@python.org
> Subject: Re: RE: [Spambayes] re-org - making a package &c.
>
>
> You might want to tag the current tree...
>
> - TimS
>
> 1/10/2003 5:19:08 AM, "Moore, Paul" wrote:
>
> >From: Anthony Baxter [mailto:anthony@interlink.com.au]
> >> I should probably add that once this is done, and bedded
> >> down, I'd like to propose that we make a real release - I'm
> >> thinking we do one release that's hammie and pop3proxy, and
> >> another that's the Outlook plugin.
> >
> >+1
> >
> >Paul
> >

From anthony at interlink.com.au Sat Jan 11 15:18:03 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 23:19:08 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To:
Message-ID: <200301110418.h0B4I4V23348@localhost.localdomain>

>>> "David LeBlanc" wrote
> Why split it up?

I'm not sure what 'it' means in this context - if you mean 'outlook plugin' and 'pop3proxy/hammie', well, they're completely separate applications, and it's unlikely that it would be of use to everyone. On the other hand, it's possibly better to have 3 packages - the base "spambayes" one, the outlook plugin, and the pop3proxy/hammie package. Not sure.

If you mean "why reorganise the files into directories" - well, as it is now, we install a large pile of packages with _very_ generic names into site-packages. This is ungood.

--
Anthony Baxter
It's never too late to have a happy childhood.

From whisper at oz.net Fri Jan 10 20:45:53 2003
From: whisper at oz.net (David LeBlanc)
Date: Fri Jan 10 23:46:06 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301110418.h0B4I4V23348@localhost.localdomain>
Message-ID:

I thought that the pop3proxy was needed for the outlook application, thus the "why split" question.

As for putting things in directories, I heartily agree/approve - people who release things that put things _in_ site-packages ought to be shot ;)

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: Anthony Baxter [mailto:anthony@interlink.com.au]
> Sent: Friday, January 10, 2003 20:18
> To: David LeBlanc
> Cc: spambayes@python.org
> Subject: Re: [Spambayes] re-org - making a package &c.
>
>
>
> >>> "David LeBlanc" wrote
> > Why split it up?
>
> I'm not sure what 'it' means in this context - if you mean
> 'outlook plugin'
> and 'pop3proxy/hammie', well, they're completely separate applications, and it's
> unlikely that it would be of use to everyone. On the other hand, it's
> possibly better to have 3 packages - the base "spambayes" one, the outlook
> plugin, and the pop3proxy/hammie package. Not sure.
>
> If you mean "why reorganise the files into directories" - well, as it is
> now, we install a large pile of packages with _very_ generic names into
> site-packages. This is ungood.
>
>
> --
> Anthony Baxter
> It's never too late to have a happy childhood.
>

From mhammond at skippinet.com.au Sun Jan 12 23:34:21 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Jan 12 07:35:01 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To:
Message-ID: <05d801c2ba36$ef200a90$530f8490@eden>

[David LeBlanc]
> I thought that the pop3proxy was needed for the outlook
> application, thus the "why split" question.

Nope - Outlook just needs the engine. I don't see a need to make a release of the core engine - it is always available from CVS, and anyone who wants just the engine and no application is going to arrange for that without problem.

I fully support the directory splits, though. I would also like to see a "test" directory.
For various reasons, I don't think we are ready for a "binary" distribution of this stuff yet - but making the first release "python source only" may appeal in terms of limiting the set of initial users to a fairly literate and sympathetic audience willing to offer valuable feedback. Especially valuable will be the initial training experiences, given that many people won't have been collecting spam when they first install this. Tim's experiments implied that without at least a few spam to start with, you had better be sympathetic for a while!

Still-working-on-the-stand-alone-DLL-tho-ly,

Mark.

From anthony at interlink.com.au Mon Jan 13 15:56:56 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Jan 12 23:58:17 2003
Subject: [Spambayes] Re: [Spambayes-checkins] website background.ht,1.4,1.5 style.css,1.2,1.3
In-Reply-To:
Message-ID: <200301130456.h0D4uuQ15598@localhost.localdomain>

>>> "Anthony Baxter" wrote
> Update of /cvsroot/spambayes/website
> In directory sc8-pr-cvs1:/tmp/cvs-serv17660
>
> Modified Files:
> background.ht style.css
> Log Message:
> updated background with some sample plots. If someone in the set of
> (Tim, Gary, Rob) could review this and point out the obvious stupids,
> that would be good. (Or anyone else who understands the math...)

Could someone who understands the math of this stuff please read over the 'background' page and point out the mistakes?

Also, if someone has a pointer to something that explains chi-squared in words that don't include phrases like "confluent hypergeometric function of the second kind", that would be good :)

Anthony

From anthony at interlink.com.au Mon Jan 13 17:55:23 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 01:56:42 2003
Subject: [Spambayes] changing various Options settings.
In-Reply-To: <200301101108.h0AB8Q304706@localhost.localdomain>
Message-ID: <200301130655.h0D6tNc16780@localhost.localdomain>

Before we do a proper release of this puppy, there are a few options-related changes I'd like to suggest:

First off, all the stuff that's currently under 'TestDriver' that gets used by real people needs to be moved. I'm looking at

ham_cutoff: 0.20
spam_cutoff: 0.90

in particular. Unfortunately, changing this will break everyone who's currently got the system deployed. Rather than doing this, I suggest we add a new section 'Categorization', and add cutoff_ham and cutoff_spam options. We can then change the code to use the new options rather than the old - it means people with the old code will get their preferences ignored until they upgrade, but the alternative is to make it break for everyone.

That way, the only options most people will want to frob are either in the 'Tokenizer' block, or the new 'Categorization' block.

Thoughts?

--
Anthony Baxter
It's never too late to have a happy childhood.

From tim at fourstonesExpressions.com Mon Jan 13 07:49:15 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 13 08:52:22 2003
Subject: [Spambayes] changing various Options settings.
Message-ID:

This is really only a semantic problem, but I agree that it needs to be moved. Unfortunately, duplicating it creates another problem. I doubt that the old one will ever really go away if we do it that way. Let's keep in mind that we're doing this for release, and just do it right the first time.

This is a special case of a general problem with Options.py. There's a ton of stuff that's only meaningful to the research phase of the project. Also, and this is a big one, there are multiple places to specify database names. To me this is a really big problem, particularly for OptionsConfig.py, which right now assumes that you're running the pop3proxy (an obviously invalid assumption).
There should only be one place to specify a database, or if we think that people might be running more than one 'subsystem', we should have a place clearly for each one. I propose we simply have a term 'database-name' or something like that, which will be used regardless of whether you're running pop3proxy or hammie or whatever else...

- TimS

1/13/2003 12:55:23 AM, Anthony Baxter wrote:

>
>Before we do a proper release of this puppy, there are a few options-related
>changes I'd like to suggest:
>
>First off, all the stuff that's currently under 'TestDriver' that gets
>used by real people needs to be moved. I'm looking at
>
>ham_cutoff: 0.20
>spam_cutoff: 0.90
>
>in particular. Unfortunately, changing this will break everyone who's
>currently got the system deployed. Rather than doing this, I suggest
>we add a new section 'Categorization', and add cutoff_ham and cutoff_spam
>options. We can then change the code to use the new options rather than
>the old - it means people with the old code will get their preferences
>ignored until they upgrade, but the alternative is to make it break
>for everyone.
>
>That way, the only options most people will want to frob are either in
>the 'Tokenizer' block, or the new 'Categorization' block.
>
>Thoughts?
>
>
>--
>Anthony Baxter
>It's never too late to have a happy childhood.
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From rob at hooft.net Mon Jan 13 15:04:33 2003
From: rob at hooft.net (Rob W.W.
Hooft)
Date: Mon Jan 13 09:05:24 2003
Subject: [Spambayes] Re: [Spambayes-checkins] website background.ht,1.4,1.5 style.css,1.2,1.3
References: <200301130456.h0D4uuQ15598@localhost.localdomain>
Message-ID: <3E22C771.9090809@hooft.net>

Anthony Baxter wrote:
>>>>"Anthony Baxter" wrote
>>>
>>Update of /cvsroot/spambayes/website
>>In directory sc8-pr-cvs1:/tmp/cvs-serv17660
>>
>>Modified Files:
>> background.ht style.css
>>Log Message:
>>updated background with some sample plots. If someone in the set of
>>(Tim, Gary, Rob) could review this and point out the obvious stupids,
>>that would be good. (Or anyone else who understands the math...)
>
>
> Could someone who understands the math of this stuff please read
> over the 'background' page and point out the mistakes?

I have one important correction to make: AFAIK I had nothing to do with the mathematical conception of the chi-squared combining method. I was working on chi-squared normalization of the other combining methods at the time the chi-squared method was proposed by Gary and implemented by Tim. My only contribution in that realm is the change from (S-H)/(S+H) to (S-H+1)/2 to get a better "unsure" classification for messages that look like neither ham nor spam.

I won't be of much help with the mathematical foundation of anything.

How about the word "definately"? I would spell it definitely.

Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From carel.fellinger at chello.nl Mon Jan 13 12:13:44 2003
From: carel.fellinger at chello.nl (Carel Fellinger)
Date: Mon Jan 13 11:52:09 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <05d801c2ba36$ef200a90$530f8490@eden>
References: <05d801c2ba36$ef200a90$530f8490@eden>
Message-ID: <20030113111344.GA12027@mail.felnet>

<\lurking-mode --now-that-the-work-is-almost-done>

On Sun, Jan 12, 2003 at 11:34:21PM +1100, Mark Hammond wrote:
...
> For various reasons, I don't think we are ready for a "binary" distribution > of this stuff yet - but making the first release "python source only" may > appeal in terms of limiting the set of initial users to a fairly literate > and sympathetic audience willing to offer valuable feedback. Especially Willing to do more than just give feedback, at least I would :) Suppose "spambayes --slash-training-styles" would run against several databases, each of those databases keeping track of the probs for a particular training style, adding extra headers indicating if and how the different databases scored this particular email. Me, I would be willing then to be careful to train according to all training style candidates simultaneously. The advantage would be that we would be comparing all training methods on the same data. -- groetjes, carel From neale at woozle.org Mon Jan 13 10:36:07 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 13:37:35 2003 Subject: [Spambayes] changing various Options settings. In-Reply-To: (Tim Stone - Four Stones Expressions's message of "Mon, 13 Jan 2003 07:49:15 -0600") References: Message-ID: Tim Stone - Four Stones Expressions writes: > I propose we simply have a term 'database-name' or something like > that, which will be used regardless of whether you're running pop3proxy > or hammie or whatever else... +1 This has been bugging me since I first wrote the dbm storage method. If we can all agree on a default database name for every platform, I'm all for standardizing it. From neale at woozle.org Mon Jan 13 10:39:19 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 13:39:22 2003 Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <20030113111344.GA12027@mail.felnet> (Carel Fellinger's message of "Mon, 13 Jan 2003 12:13:44 +0100") References: <05d801c2ba36$ef200a90$530f8490@eden> <20030113111344.GA12027@mail.felnet> Message-ID: Carel Fellinger writes: > Suppose "spambayes --slash-training-styles" would run against several > databases, each of those databases keeping track of the probs for a > particular training style, adding extra headers indicating if and how > the different databases scored this particular email. Me, I would be > willing then to be careful to train according to all training style > candidates simultaneously. That's an interesting idea. It made me wonder if it wouldn't be helpful to have some sort of application which trained on some data, then ran a few scoring runs on other data with various options set, and reported back what your "ideal" options are. Does that make sense? This could be a part of the initial training step. Neale From neale at woozle.org Mon Jan 13 10:58:06 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 13:58:10 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: <200301110418.h0B4I4V23348@localhost.localdomain> (Anthony Baxter's message of "Sat, 11 Jan 2003 15:18:03 +1100") References: <200301110418.h0B4I4V23348@localhost.localdomain> Message-ID: Anthony Baxter writes: > On the other hand, it's possibly better to have 3 packages - the base > "spambayes" one, the outlook plugin, and the pop3proxy/hammie > package. Not sure. I like the idea of splitting things into specific directories. Perhaps it's time to rename things according to what they do and move the emphasis away from testing. Here's what I propose for hammie & co:

    hammie/                        -> contrib/
    hammie.py + hammiefilter.py    -> filter.py
    mboxtrain.py + hammiebulk.py   -> bulktrain.py
    hammiesrv.py                   -> contrib/XMLRPCServer.py
    hammiecli.py                   -> contrib/XMLRPCClient.py
    hammiebatch.py                 -> deleted (or is someone using this?)
Aside from renaming things based on what they do, this would reduce hammie's littering of the top-level directory to:

    filter.py
    bulktrain.py

Plus the standard supporting modules (storage.py, dbmstorage.py, tokenizer.py, etc.) I wager it could make the options file a lot simpler, too. Shall I barge ahead with this? Neale From carel.fellinger at chello.nl Mon Jan 13 20:24:19 2003 From: carel.fellinger at chello.nl (Carel Fellinger) Date: Mon Jan 13 14:36:19 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: References: <05d801c2ba36$ef200a90$530f8490@eden> <20030113111344.GA12027@mail.felnet> Message-ID: <20030113192419.GA17717@mail.felnet> On Mon, Jan 13, 2003 at 10:39:19AM -0800, Neale Pickett wrote: ... > That's an interesting idea. It made me wonder if it wouldn't be helpful > to have some sort of application which trained on some data, then ran a > few scoring runs on other data with various options set, and reported > back what your "ideal" options are. Does that make sense? This could > be a part of the initial training step. The problem with initialisation is that there is no data to start with, so such a "default adaptor" can't come into play until you've gathered some spam and ham. But as part of this extra fine-tuning step I proposed, it sure seems interesting to try to derive good settings for the options at several moments in time and see whether they differ widely for all of us early adopters. -- groetjes, carel From neale at woozle.org Mon Jan 13 11:42:56 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 14:43:07 2003 Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <20030113192419.GA17717@mail.felnet> (Carel Fellinger's message of "Mon, 13 Jan 2003 20:24:19 +0100") References: <20030113111344.GA12027@mail.felnet> <20030113192419.GA17717@mail.felnet> Message-ID: Carel Fellinger writes: > The problem with initialisation is that there is no data to start > with, so such a "default adaptor" can't come into play until you've > gathered some spam and ham. But as part of this extra fine-tuning > step I proposed, it sure seems interesting to try to derive good > settings for the options at several moments in time and see whether > they differ widely for all of us early adopters. With my scheme at least, you are expected to at some point have some ham and some spam. Maybe not initially, but after a week or two you are supposed to be collecting examples of both. In any case, what you propose would work as a tuning tool, to be run whenever you want to tune your config. I would look at the existing test programs and try to figure out a way to combine them. I believe Tim wrote it so that one set of trained data can be used over and over for multiple types of scoring. That and the existing support modules should make it little more than chaining scoring methods together. Are you still interested in doing this, Carel? Neale From richie at entrian.com Mon Jan 13 20:42:03 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 13 15:42:25 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: References: <200301110418.h0B4I4V23348@localhost.localdomain> Message-ID: [Anthony] > If anyone wants to help, let me know... If testing counts as helping... I've tested all the pieces I use, and they're all fine on the reorg-branch. This re-organisation is a very good plan. Two questions: Should we also have a 'resources' directory, or similar? I've nearly finished splitting the HTML components out of pop3proxy.py and OptionConfig.py and into an external (viewable, editable) HTML file.
At the moment I have that living with the source code, and being found via __file__. Maybe things like the HTML (and image files and whatever else) should have their own subdirectory. It could be found by __file__ (or sys.argv[0] for some future frozen version) by default, or become a configuration option if there's ever a reason for that. Second, is it sensible to check in major edits at the moment? I guess things like that should wait until the reorg-branch is merged back onto the head? What with files being moved, CVS isn't going to be much help with the merge. Of course, if it's a dead cert that the reorg-branch will be merged back (and I can't see why we wouldn't do that) then edits could just be committed to that. [Neale] > Here's what I propose for hammie & co: That looks very sensible. I'd also suggest we move pop3graph.py into utilities - it's not important enough to live at the top level. -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Tue Jan 14 12:20:05 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Jan 13 20:20:52 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: <20030113111344.GA12027@mail.felnet> Message-ID: <03d501c2bb6b$12cc9f50$530f8490@eden> > Willing to do more than just give feedback, at least I would :) > > Suppose "spambayes --slash-training-styles" would run against several > databases, each of those databases keeping track of the probs for a > particular training style, adding extra headers indicating if and how > the different databases scored this particular email. Me, I would be > willing then to be careful to train according to all training style > candidates simultaneously. > > The advantage would be that we would be comparing all training methods > on the same data. My idea was closer to the existing test harness we have. I was thinking of somehow formalizing Tim's original hapax experiments.
From my limited playing with our test harness, it seems that we simply pick random messages from our ham and spam folders, train over these messages, then score these messages against the trained data. This hasn't been as important for a few months, as the algorithm hasn't changed in that period. What if we changed this to perform a "time ordered" selection of messages? For example, off the top of my head, I can see 2 training candidates (there would be a number more, but let's start with just 2):

* Do not start filtering until we have, say, 20 spam and 20 ham. Once we reach this threshold, we go into a little "initial training mode". This mode trains on the ham and spam, then scores the entire inbox. We continue until the user indicates there are no spams left in their inbox.
* Start filtering immediately, but only incrementally train on either incorrect or unsure classifications.

Our test harness would be designed to test multiple strategies over our standard corpora. Instead of random messages, time-ordered messages would be iterated over. Results similar to the existing ones are produced, so we can compare results over vastly different mail stores. IMO, it is far more important to know the best training strategy across vastly different mail stores than to know which strategy works best on any individual's store. I am pretty sure this is similar to your idea, but I thought it worth pointing out that we possibly already have some test framework we can leverage here. Mark. From tony at lownds.com Mon Jan 13 17:35:44 2003 From: tony at lownds.com (Tony Lownds) Date: Mon Jan 13 20:43:23 2003 Subject: [Spambayes] Using Spambayes w/ Eudora Message-ID: Hi All, I just started using Spambayes with Eudora. It works fine - fantastically in fact - for one POP account, but some limitations of Eudora are making using two POP accounts very problematic.
As far as I can tell, Eudora can have multiple POP accounts with different POP servers, but the port cannot be changed using normal means. Even through extraordinary means (installing an "Esoteric Settings" plugin), the port number is only changeable at a global level, not per-POP account. Since Spambayes listens on a different port for each proxied server, I am limited to one spam-free account right now. Has anyone had luck using Eudora with multiple POP accounts going through pop3proxy? (Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2) Thanks, -Tony From tim at fourstonesExpressions.com Mon Jan 13 19:46:40 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 13 20:47:16 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: Woah... that's a serious problem. Richie and I will have to give that one some thought... We'll get back to ya on that! - TimS 1/13/2003 7:35:44 PM, Tony Lownds wrote: >Hi All, > >I just started using Spambayes with Eudora. It works fine - >fantastically in fact - for one POP account, but some limitations of >Eudora are making using two POP accounts very problematic. > >As far as I can tell, Eudora can have multiple POP accounts with >different POP servers, but the port cannot be changed using normal >means. Even through extraordinary means (installing an "Esoteric >Settings" plugin), the port number is only changeable at a global >level, not per-POP account. > >Since Spambayes listens on a different port for each proxied server, >I am limited to one spam-free account right now. > >Has anyone had luck using Eudora with multiple POP accounts going >through pop3proxy? 
> >(Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2) > >Thanks, >-Tony > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Mon Jan 13 20:49:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Jan 13 20:50:04 2003 Subject: [Spambayes] changing various Options settings. In-Reply-To: <200301130655.h0D6tNc16780@localhost.localdomain> Message-ID: [Anthony Baxter] > Before we do a proper release of this puppy, there's a few options-related > changes I'd like to suggest: > > First off, all the stuff that's currently under 'TestDriver' that gets > used by real people needs to be moved. +1. > I'm looking at > > ham_cutoff: 0.20 > spam_cutoff: 0.90 > > in particular. Unfortunately, changing this will break everyone who's > currently got the system deployed. Rather than doing this, I suggest > we add a new section 'Categorization', and add cutoff_ham and cutoff_spam > options. We can then change the code to use the new options rather than > the old - it means people with the old code will get their preferences > ignored until they upgrade, but the alternative is to make it break > for everyone. -1. Just move it and post an announcement here. Nothing will break until somebody synchs up with CVS, and if they're pulling pre-alpha code out of CVS, they better be reading this list. Recovery is easy. From rbyrnes at ozemail.com.au Tue Jan 14 12:56:18 2003 From: rbyrnes at ozemail.com.au (Rob B) Date: Mon Jan 13 20:56:45 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1> Message-ID: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1> At 12:35 14/01/2003, Tony Lownds sent this up the stick: >As far as I can tell, Eudora can have multiple POP accounts with different >POP servers, but the port cannot be changed using normal means.
Even >through extraordinary means (installing an "Esoteric Settings" plugin), >the port number is only changeable at a global level, not per-POP account. > >Since Spambayes listens on a different port for each proxied server, I am >limited to one spam-free account right now. > >Has anyone had luck using Eudora with multiple POP accounts going through >pop3proxy? Sure have >(Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2) Dunno about Eudora on a Mac ... but on peecee if you open up the Personalities pane, you should be able to edit each account individually. If you go through the Tools menu, then this is a global change. cheers, Rob (Eudora 5.1 on Win NT 4.0 - python 2.2) -- A little madness now and then is relished by the wisest men. This is random quote 162 of a collection of 1273 Distance from the centre of the brewing universe: [15200.8 km (8207.8 mi), 262.8 deg](Apparent) Rennerian Public Key fingerprint = 6219 33BD A37B 368D 29F5 19FB 945D C4D7 1F66 D9C5 From anthony at interlink.com.au Tue Jan 14 13:16:39 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 13 21:17:57 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: Message-ID: <200301140216.h0E2Gdt25884@localhost.localdomain> >>> Neale Pickett wrote > Perhaps it's time rename things according to what they do and move the > emphasis away from testing. Here's what I propose for hammie & co: Um, don't do this! It will conflict with what I've done on the reorg-branch, already. Check it out and examine what's been moved there... > Shall I barge ahead with this? Nooooo -- Anthony Baxter It's never too late to have a happy childhood. 
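Anthony's proposal above, which adds a new 'Categorization' section with cutoff_ham and cutoff_spam in place of ham_cutoff and spam_cutoff under 'TestDriver', and Tim Peters' reply ("just move it") mostly differ in what happens to existing customisation files. A possible middle ground, which neither of them proposed, is to read the new section first and fall back to the old one. The sketch below is hypothetical. It uses modern Python's standard configparser rather than Spambayes' own Options module, and the helper get_cutoff and the LEGACY table are illustrative names only.

```python
import configparser

# Defaults quoted in the thread (ham_cutoff: 0.20, spam_cutoff: 0.90).
DEFAULTS = {"cutoff_ham": 0.20, "cutoff_spam": 0.90}

# Hypothetical table: proposed new option name -> where the old setting lived.
LEGACY = {
    "cutoff_ham": ("TestDriver", "ham_cutoff"),
    "cutoff_spam": ("TestDriver", "spam_cutoff"),
}


def get_cutoff(cfg, name):
    """Read [Categorization] first, then the old [TestDriver] option, then the default."""
    if cfg.has_option("Categorization", name):
        return cfg.getfloat("Categorization", name)
    old_section, old_name = LEGACY[name]
    if cfg.has_option(old_section, old_name):
        return cfg.getfloat(old_section, old_name)
    return DEFAULTS[name]


# An old-style customisation file and a partly migrated one.
old_style = configparser.ConfigParser()
old_style.read_string("[TestDriver]\nham_cutoff: 0.25\nspam_cutoff: 0.80\n")

new_style = configparser.ConfigParser()
new_style.read_string("[Categorization]\ncutoff_ham: 0.15\n")

print(get_cutoff(old_style, "cutoff_ham"), get_cutoff(old_style, "cutoff_spam"))  # 0.25 0.8
print(get_cutoff(new_style, "cutoff_ham"), get_cutoff(new_style, "cutoff_spam"))  # 0.15 0.9
```

With this kind of fallback, old files keep working, and nobody's preferences are silently ignored during the transition. The cost is that two spellings of the same option have to be supported until the old one is finally dropped.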
From anthony at interlink.com.au Tue Jan 14 13:22:02 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 13 21:23:11 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: <200301140222.h0E2M2P26012@localhost.localdomain> >>> Tony Lownds wrote > As far as I can tell, Eudora can have multiple POP accounts with > different POP servers, but the port cannot be changed using normal > means. Even through extraordinary means (installing an "Esoteric > Settings" plugin), the port number is only changeable at a global > level, not per-POP account. What about using multiple virtual loopback interfaces? 127.0.0.1, 127.0.0.2, &c, and making pop3proxy use getpeername() to look up what address it is you've called? -- Anthony Baxter It's never too late to have a happy childhood. From skip at pobox.com Mon Jan 13 20:27:55 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 13 21:28:07 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: References: Message-ID: <15907.30123.985476.218448@montanaro.dyndns.org> >> Has anyone had luck using Eudora with multiple POP accounts going >> through pop3proxy? Tim> Woah... that's a serious problem. Richie and I will have to give Tim> that one some thought... We'll get back to ya on that! Is there any reason that in principle pop3proxy can't multiplex the content it receives from several different servers on a single output port? Skip From anthony at interlink.com.au Tue Jan 14 13:55:55 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 13 21:57:06 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <15907.30123.985476.218448@montanaro.dyndns.org> Message-ID: <200301140255.h0E2ttG26338@localhost.localdomain> > Is there any reason that in principle pop3proxy can't multiplex the content > it receives from several different servers on a single output port? 
The problem is knowing, when it gets a connection from the mail client, which server the mail client wishes to talk to. -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Jan 14 14:00:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 13 22:01:45 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: Message-ID: <200301140300.h0E30aJ26394@localhost.localdomain> >>> Richie Hindle wrote > If testing counts as helping... I've tested all the pieces I use, and > they're all fine on the reorg-branch. This re-organisation is a very good > plan. Well, I think I'm about done with it, so I'll be merging back into the trunk shortly. You _will_ need to do a cvs up -dP to get the new directories. > Should we also have a 'resources' directory, or similar? I've nearly > finished splitting the HTML components out of pop3proxy.py and > OptionConfig.py and into an external (viewable, editable) HTML file. At > the moment I have that living with the source code, and being found via > __file__. Maybe things like the HTML (and images files and whatever else) > should have their own subdirectory. It could be found by __file__ (or > sys.argv[0] for some future frozen version) by default, or become a > configuration option if there's ever a reason for that. There's a few problems with that - the first, as you pointed out, is finding the damn files. The second is getting distutils to do the right thing with them. What we ended up doing with roundup was to bundle all of the resources up into a separate python module, and get it with 'import'. > Second, is it sensible to check in major edits at the moment? I guess > things like that should wait until the reorg-branch is merged back onto the > head? What with files being moved, CVS isn't going to be much help with > the merge. 
Of course, if it's a dead cert that the reorg-branch will be > merged back (and I can't see why we wouldn't do that) then edits could just > be committed to that. Wait til I merge the branch. Will be later this afternoon. > That looks very sensible. I'd also suggest we move pop3graph.py into > utilities - it's not important enough to live at the top level. Ok - I'll do that before the merge. -- Anthony Baxter It's never too late to have a happy childhood. From tony at lownds.com Mon Jan 13 18:45:47 2003 From: tony at lownds.com (Tony Lownds) Date: Mon Jan 13 22:13:07 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140222.h0E2M2P26012@localhost.localdomain> References: <200301140222.h0E2M2P26012@localhost.localdomain> Message-ID: At 1:22 PM +1100 1/14/03, Anthony Baxter wrote: >What about using multiple virtual loopback interfaces? 127.0.0.1, >127.0.0.2, &c, and making pop3proxy use getpeername() to look up >what address it is you've called? I will try this. Adding another loopback interface has to be done from the command line:

    sudo ifconfig lo0 inet 127.0.0.2 add

Because it's not done through the OS's configuration GUI, I'm not sure the settings will be saved after a restart. FYI, adding another interface for an Ethernet port IS easily done through the GUI. I'll see if my loopback address stays around after a restart. -Tony From tony at lownds.com Mon Jan 13 19:17:40 2003 From: tony at lownds.com (Tony Lownds) Date: Mon Jan 13 22:31:00 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1> References: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1> Message-ID: At 12:56 PM +1100 1/14/03, Rob B wrote: >>Has anyone had luck using Eudora with multiple POP accounts going >>through pop3proxy? > >Sure have It turns out that Rob's accounts were on the same server, so he was lucky enough to avoid this tar pit.
-Tony From tim at fourstonesExpressions.com Mon Jan 13 21:47:50 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 13 22:48:27 2003 Subject: [Spambayes] Using Spambayes w/ Eudora Message-ID: 1/13/2003 8:55:55 PM, Anthony Baxter wrote: > >> Is there any reason that in principle pop3proxy can't multiplex the content >> it receives from several different servers on a single output port? > >The problem is knowing, when it gets a connection from the mail client, >which server the mail client wishes to talk to. Correct. This is the problem. Nothing in the pop3 conversation gives any indication as to what server is on the other end of the line... - Tim S > >-- >Anthony Baxter >It's never too late to have a happy childhood. > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Mon Jan 13 20:19:22 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 23:19:33 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140255.h0E2ttG26338@localhost.localdomain> (Anthony Baxter's message of "Tue, 14 Jan 2003 13:55:55 +1100") References: <200301140255.h0E2ttG26338@localhost.localdomain> Message-ID: Anthony Baxter writes: >> Is there any reason that in principle pop3proxy can't multiplex the content >> it receives from several different servers on a single output port? > > The problem is knowing, when it gets a connection from the mail client, > which server the mail client wishes to talk to. Wait, why can't you just log in with a username of username@hostname ? Or username:hostname or username@@hostname or whatever. The point is, you'd send the name of the POP server you're trying to contact as part of the username. We used to do this to vhost pop accounts back at the big dot-bomb ISP where I worked.
AFAIK, it worked great. Neale From neale at woozle.org Mon Jan 13 20:23:17 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 13 23:23:21 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: <200301140216.h0E2Gdt25884@localhost.localdomain> (Anthony Baxter's message of "Tue, 14 Jan 2003 13:16:39 +1100") References: <200301140216.h0E2Gdt25884@localhost.localdomain> Message-ID: Anthony Baxter writes: > Um, don't do this! It will conflict with what I've done on the > reorg-branch, already. Check it out and examine what's been moved > there... Oh my! That's what I get for only half paying attention. Well then, I retract my proposal entirely. Good work on the reorg, I like it. Although I still think there should be a "contrib" or somesuch directory to tuck away things like hammiesrv and hammiecli, which are mostly only of academic use. (Well, there's one guy on here using hammiesrv, but he seemed nice enough not to mind being labelled "academic" :) >> Shall I barge ahead with this? > > Nooooo Right. Thanks for stopping me, Anthony. I can be a little bullheaded sometimes, so it's good for people to put down fenceposts for me to bonk my head against now and again ;) Neale From tim at fourstonesExpressions.com Mon Jan 13 22:24:11 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 13 23:24:46 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: <1UB6USFEAPLRC0C0ROTN6ZIDGFMKXU.3e2390eb@myst> 1/13/2003 10:19:22 PM, Neale Pickett wrote: >Anthony Baxter writes: > >>> Is there any reason that in principle pop3proxy can't multiplex the content >>> it receives from several different servers on a single output port? >> >> The problem is knowing, when it gets a connection from the mail client, >> which server the mail client wishes to talk to. > >Wait, why can't you just log in with a username of > > username@hostname > >? > >Or username:hostname or username@@hostname or whatever.
The point is, >you'd send the name of the POP server you're trying to contact as part >of the username. We used to do this to vhost pop accounts back at the >big dot-bomb ISP where I worked. AFAIK, it worked great. There ya go... it's a hack, but relatively elegant. pop3proxy would have to be altered to recognize the pattern, but that shouldn't be too difficult. I was thinking of a scheme where the proxy would recognize that multiple servers were being proxied on the same port, and do the LIST and RETR stuff on both, and send the stuff back on that single port, where a filter could be set up to route the incoming mail to the correct inbox based on the headers. Your idea might be a bit easier than that... - TimS > >Neale > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Mon Jan 13 22:51:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 00:05:48 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140255.h0E2ttG26338@localhost.localdomain> References: <15907.30123.985476.218448@montanaro.dyndns.org> <200301140255.h0E2ttG26338@localhost.localdomain> Message-ID: <15907.38722.282150.490057@montanaro.dyndns.org> >> Is there any reason that in principle pop3proxy can't multiplex the >> content it receives from several different servers on a single output >> port? Anthony> The problem is knowing, when it gets a connection from the mail Anthony> client, which server the mail client wishes to talk to. All of them? 
S From tim at fourstonesExpressions.com Mon Jan 13 23:09:48 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 14 00:10:24 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <15907.38722.282150.490057@montanaro.dyndns.org> Message-ID: 1/13/2003 10:51:14 PM, Skip Montanaro wrote: > > >> Is there any reason that in principle pop3proxy can't multiplex the > >> content it receives from several different servers on a single output > >> port? > > Anthony> The problem is knowing, when it gets a connection from the mail > Anthony> client, which server the mail client wishes to talk to. > >All of them? Not at all... the mail client will query specific servers on specific schedules or upon request by the user. I have three accounts configured on mine (not Eudora). On one of them, I have the client automatically check for new mail every minute, on another it checks every five minutes, and one I only check occasionally and that by manual request only (I push the check button every now and then). The proxy would normally not query all the accounts and send mail back from all of them, because the client is only expecting mail from one of them... - TimS > >S > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From anthony at interlink.com.au Tue Jan 14 16:31:16 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 00:32:37 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: <200301140531.h0E5VGj17824@localhost.localdomain> >>> Neale Pickett wrote > Wait, why can't you just log in with a username of > > username@hostname Last time I looked, Eudora ate everything to the right of an @ sign as the server name. And the field was limited to something like 14 characters. It works fine with real mailers, just not Eudora. -- Anthony Baxter It's never too late to have a happy childhood.
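Neale's username@hostname suggestion can be sketched as a small routing function. It splits the argument of the POP3 USER command at the last separator and falls back to a lookup table for clients that, as Anthony describes, will not pass an '@' through. The function name route_user, the configurable separator and the table layout (bare name -> (server, real username)) are assumptions made for illustration. None of this is existing pop3proxy code.

```python
def route_user(user_arg, user_map=None, sep="@"):
    """Map the argument of a POP3 USER command to (pop_server, real_username).

    Hypothetical sketch: "neale@pop3.example.net" -> ("pop3.example.net", "neale").
    Names without the separator are looked up in user_map instead.
    """
    # rpartition splits at the LAST separator, so the user part may itself contain one.
    user, found, host = user_arg.rpartition(sep)
    if found and user and host:
        return host, user
    if user_map and user_arg in user_map:
        return user_map[user_arg]
    raise LookupError("no POP3 server known for %r" % user_arg)


accounts = {"user1": ("pop3.example.org", "npickett")}
print(route_user("neale@pop3.example.net"))            # ('pop3.example.net', 'neale')
print(route_user("neale%pop3.example.net", sep="%"))   # ('pop3.example.net', 'neale')
print(route_user("user1", accounts))                   # ('pop3.example.org', 'npickett')
```

A proxy built this way would still have to open the connection to the chosen server itself and send USER with the real username. Because the split happens at the last separator, a login that is itself an e-mail address, such as "me@isp.example@pop3.example.net", keeps its own '@'.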
From neale at woozle.org Mon Jan 13 21:36:29 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 14 00:36:39 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140531.h0E5VGj17824@localhost.localdomain> (Anthony Baxter's message of "Tue, 14 Jan 2003 16:31:16 +1100") References: <200301140531.h0E5VGj17824@localhost.localdomain> Message-ID: Anthony Baxter writes: > Last time I looked, Eudora ate everything to the right of an @ sign > as the server name. And the field was limited to something like 14 > characters. Oh, man, that sucks! Okay then, I guess the only thing for it is to have a map in Python from usernames to host/username combinations:

    {'user1': ('pop3.bigmailhost.net', 'neale'),
     'user2': ('pop3.mediumhost.net', 'npickett')}

etc. Would that be a workable fallback for the @-impaired? Neale From anthony at interlink.com.au Tue Jan 14 16:40:39 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 00:41:51 2003 Subject: [Spambayes] merge is done. cvs up -dP time. Message-ID: <200301140540.h0E5eds17982@localhost.localdomain> The reorg-branch has been merged into the trunk. If you're running from CVS, you will need to do a cvs up -dP to get a working version. There might be a cvs commit message, but it's probable that mailman will bitch that it's too large and won't let it straight through. I'm about to make the options change - this _will_ break your existing customised options files. -- Anthony Baxter It's never too late to have a happy childhood. From tony-bayes at lownds.com Mon Jan 13 22:49:04 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Tue Jan 14 01:49:03 2003 Subject: [Spambayes] Using Spambayes w/ Eudora Message-ID: At 1:22 PM +1100 1/14/03, Anthony Baxter wrote: >What about using multiple virtual loopback interfaces? 127.0.0.1, >127.0.0.2, &c, and making pop3proxy use getpeername() to look up >what address it is you've called?
This approach is working great, and no getpeername() call is needed. I have a startup script that sets everything up, I even have the actual POP traffic tunneled over ssh. Here is my startup script:

    ------- SpamBayes.command ---------
    #!/bin/sh
    clear
    ulimit -s 2048
    cd ~/spambayes
    sudo ifconfig lo0 inet 127.0.0.2 add
    ssh -N -L 1110:127.0.0.1:110 tony@server1.com &
    ssh -N -L 1111:127.0.0.1:110 tony@server2.com &
    sudo python pop3proxy.py

And the relevant lines from bayescustomize.ini:

    pop3proxy_ports = 127.0.0.1:110, 127.0.0.2:110
    pop3proxy_servers = localhost:1110, localhost:1111

The diff to pop3proxy.py is attached. -Tony
-------------- next part --------------
A non-text attachment was scrubbed... Name: bind_address.patch Type: application/mac-binhex40 Size: 7188 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030113/b1ea0340/bind_address.bin From anthony at interlink.com.au Tue Jan 14 17:54:53 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 01:56:10 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: <200301140654.h0E6srh18800@localhost.localdomain> >>> Tony Lownds wrote > The diff to pop3proxy.py is attached. Erm. The patch came through as a "application/mac-binhex40". Could you re-send with a more... standard... format? Ta -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Jan 14 18:41:03 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 02:42:15 2003 Subject: [Spambayes] what else is needed for a first (source) release? Message-ID: <200301140741.h0E7f3R19057@localhost.localdomain> Ok, can people nominate things that they think would be good before a first release? I'd like to try and get one out before the spam conference (it's as good a date as any :) I figure we make the first release a source only one - which isn't a biggie for most people, since it's in python.
Should the release just have what's in the current setup.py (which doesn't include the Outlook2000 directory), with a separate release for the O2K plugin+core? Or should it all be in one big bundle? I'm fiddling around with the documentation on the website - trying to explain how it all works in terms that my partner can understand (my usual approach to making sure non-technical explanations are clear enough). Anthony From vanhorn at whidbey.com Mon Jan 13 23:45:06 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Tue Jan 14 02:45:10 2003 Subject: [Spambayes] Using Spambayes w/ Eudora References: <200301140531.h0E5VGj17824@localhost.localdomain> Message-ID: <3E23C002.DE810CEF@whidbey.com> Anthony Baxter wrote: > >>> Neale Pickett wrote > > Wait, why can't you just log in with a username of > > > > username@hostname > > Last time I looked, Eudora ate everything to the right of an @ sign > as the server name. And the field was limited to something like 14 > characters. I haven't run Eudora through a proxy for a long time, but I don't think that character limit you cite has been in place recently. Eudora had no problem picking up mail for vanhorn@coldwellbankerwhidbey.com or vanhorn@verbose.twistedhistory.com; I'm pretty sure those are well over 14 chars. Van (vanhorn@more.domains.than.you.can.shake.a.stick.at.org) -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From francois.granger at free.fr Tue Jan 14 11:19:39 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 14 05:23:50 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140654.h0E6srh18800@localhost.localdomain> Message-ID: on 14/01/03 7:54, Anthony Baxter at anthony@interlink.com.au wrote: > >>>> Tony Lownds wrote >> The diff to pop3proxy.py is attached. > > Erm. The patch came through as a "application/mac-binhex40". Could > you re-send with a more... standard... format? Open it with a text editor. It is pure text with unix EOL -- Mail is a means of communication. People should think about the political implications of their choice (or non-choice) of tools and technologies. For clean mail: -- From just at letterror.com Tue Jan 14 11:34:12 2003 From: just at letterror.com (Just van Rossum) Date: Tue Jan 14 05:34:19 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: François Granger wrote: > > Erm. The patch came through as a "application/mac-binhex40". Could > > you re-send with a more... standard... format? > > Open it with a text editor. It is pure text with unix EOL No it's not, it's encoded as binhex: --============_-1169595545==_============ Content-Type: application/mac-binhex40; Name="bind_address.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment (This file must be converted with BinHex 4.0) :%Q*TEQ4IB@4NFQ9cFbj`BA4MD!!!!!!!!!!!!!!!!!!8E!!!!!$$`5SU+L"`Eh! cF(*[H(NZF(N*6@pZ)%TKEL!a-b!a0$Se16Sb-L!b-$!c#LdY,5"`Eh!cF(*[H(P etc. I've attached a decoded version, reencoded as base64.
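For comparison, a patch attached as a plain-text MIME part can be read by any mail client without BinHex decoding. A minimal sketch using Python's standard `email` package; the addresses, subject and patch text are placeholders, and `text/x-diff` is one common choice of subtype rather than anything the list requires.

```python
from email.message import EmailMessage


def build_patch_mail(patch_text, filename, sender, recipient, subject):
    """Build a message with the patch attached as a text/x-diff part."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("The diff to pop3proxy.py is attached.\n")
    # A str payload is attached as text/<subtype>, so no BinHex is involved;
    # the message itself becomes multipart/mixed.
    msg.add_attachment(patch_text, subtype="x-diff", filename=filename)
    return msg


if __name__ == "__main__":
    mail = build_patch_mail("*** pop3proxy.py\n--- pop3proxy_peer.py\n",
                            "bind_address.patch",
                            "tony@example.com", "spambayes@example.com",
                            "Using Spambayes w/ Eudora")
    for part in mail.iter_attachments():
        print(part.get_content_type(), part.get_filename())
```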
Just
-------------- next part --------------
Message-ID: on 14/01/03 11:34, Just van Rossum at just@letterror.com wrote: > François Granger wrote: > >>> Erm. The patch came through as a "application/mac-binhex40". Could >>> you re-send with a more... standard... format? >> >> Open it with a text editor. It is pure text with unix EOL > > No it's not, it's encoded as binhex: Apologies, it was automatically decoded at my end. From tony at lownds.com Mon Jan 13 23:06:10 2003 From: tony at lownds.com (Tony Lownds) Date: Tue Jan 14 08:54:27 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: <200301140654.h0E6srh18800@localhost.localdomain> References: <200301140654.h0E6srh18800@localhost.localdomain> Message-ID: At 5:54 PM +1100 1/14/03, Anthony Baxter wrote: > >>> Tony Lownds wrote >> The diff to pop3proxy.py is attached. > >Erm. The patch came through as a "application/mac-binhex40". Could >you re-send with a more... standard... format? > >Ta > You mean there's something more standard than mac-binhex40? :) *** pop3proxy.py Mon Jan 13 14:59:22 2003 --- pop3proxy_peer.py Mon Jan 13 21:56:27 2003 *************** *** 157,164 **** dispatchers created by a factory callable. """ ! def __init__(self, port, factory, factoryArgs=(), listenAddress='', !
socketMap=asyncore.socket_map, ): asyncore.dispatcher.__init__(self, map=socketMap) self.socketMap = socketMap self.factory = factory *************** *** 168,175 **** self.set_socket(s, socketMap) self.set_reuse_addr() if options.verbose: ! print "%s listening on port %d." % (self.__class__.__name__, port) ! self.bind(('', port)) self.listen(5) def handle_accept(self): --- 168,175 ---- self.set_socket(s, socketMap) self.set_reuse_addr() if options.verbose: ! print "%s listening on %s port %d." % (self.__class__.__name__, listenAddress, port) ! self.bind((listenAddress, port)) self.listen(5) def handle_accept(self): *************** *** 390,397 **** def __init__(self, serverName, serverPort, proxyPort): proxyArgs = (serverName, serverPort) ! Listener.__init__(self, proxyPort, BayesProxy, proxyArgs) ! print 'Listener on port %d is proxying %s:%d' % (proxyPort, serverName, serverPort) class BayesProxy(POP3ProxyBase): --- 390,403 ---- def __init__(self, serverName, serverPort, proxyPort): proxyArgs = (serverName, serverPort) ! bindAddress, bindPort = proxyPort ! Listener.__init__( ! self, bindPort, BayesProxy, proxyArgs, ! listenAddress=bindAddress ! ) ! print 'Listener on port %s is proxying %s:%d' % ( ! _addressPortStr(proxyPort), serverName, serverPort ! ) class BayesProxy(POP3ProxyBase): *************** *** 1251,1257 **** if options.pop3proxy_ports: splitPorts = options.pop3proxy_ports.split(',') ! self.proxyPorts = map(int, map(string.strip, splitPorts)) if len(self.servers) != len(self.proxyPorts): print "pop3proxy_servers & pop3proxy_ports are different lengths!" --- 1257,1263 ---- if options.pop3proxy_ports: splitPorts = options.pop3proxy_ports.split(',') ! self.proxyPorts = map(_addressAndOrPort, map(string.strip, splitPorts)) if len(self.servers) != len(self.proxyPorts): print "pop3proxy_servers & pop3proxy_ports are different lengths!" 
*************** *** 1286,1292 **** versions of the details, for display in the Status panel.""" serverStrings = ["%s:%s" % (s, p) for s, p in self.servers] self.serversString = ', '.join(serverStrings) ! self.proxyPortsString = ', '.join(map(str, self.proxyPorts)) def createWorkers(self): """Using the options that were initialised in __init__ and then --- 1292,1298 ---- versions of the details, for display in the Status panel.""" serverStrings = ["%s:%s" % (s, p) for s, p in self.servers] self.serversString = ', '.join(serverStrings) ! self.proxyPortsString = ', '.join(map(_addressPortStr, self.proxyPorts)) def createWorkers(self): """Using the options that were initialised in __init__ and then *************** *** 1333,1340 **** --- 1339,1364 ---- self.spamCorpus.addObserver(self.spamTrainer) self.hamCorpus.addObserver(self.hamTrainer) + # helper functions + + def _addressAndOrPort(s): + if ':' in s: + addr, port = s.split(':') + return addr, int(port) + else: + return '', int(s) + + def _addressPortStr((addr, port)): + if not addr: + return str(port) + else: + return '%s:%d' % (addr, port) + + # globals + state = State() + # main program def main(servers, proxyPorts, uiPort, launchUI): """Runs the proxy forever or until a 'KILL' command is received or *************** *** 1573,1579 **** pop3Server.sendall("kill\r\n") pop3Server.recv(100) - # =================================================================== # __main__ driver. # =================================================================== --- 1597,1602 ---- *************** *** 1601,1607 **** elif opt == '-p': state.databaseFilename = arg elif opt == '-l': ! state.proxyPorts = [int(arg)] elif opt == '-u': state.uiPort = int(arg) elif opt == '-z': --- 1624,1630 ---- elif opt == '-p': state.databaseFilename = arg elif opt == '-l': ! 
state.proxyPorts = [_addressAndOrPort(arg)] elif opt == '-u': state.uiPort = int(arg) elif opt == '-z': From barry at python.org Tue Jan 14 09:10:18 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 09:10:50 2003 Subject: [Spambayes] what else is needed for a first (source) release? References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: <15908.6730.504794.353666@gargle.gargle.HOWL> >>>>> "AB" == Anthony Baxter writes: AB> Ok, can people nominate things that they think would be good AB> before a first release? I'd like to try and get one out before AB> the spam conference (it's as good a date as any :) Although it might not be ready until my train pulls into South Station, I'm working on a Mailman handler module for integration with Spambayes. The actual hook is pretty easy (using the hammie.py interface) -- it's all the niddling little stuff like u/i, moderation, training, configuration, etc. that's a bit rough around the edges. Probably won't be ready for the 1.0 release, but it might make a good patch for a follow on. (I'm trying to decide what to actually do with it -- check it into a branch of Mailman, release it as a patch, etc...). Having a spambayes package I can unpack in Mailman's pythonlib dir is perfect. -Barry From francois.granger at free.fr Tue Jan 14 16:05:37 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 14 10:09:19 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote: > Anthony Baxter writes: > >> Last time I looked, Eudora ate everything to the right of an @ sign >> as the server name. And the field was limited to something like 14 >> characters. > > Oh, man, that sucks! > > Okay then, I guess the only thing for it is to have a map in python from > usernames to username/host combinations.
> > {'user1': ('pop3.bigmailhost.net', 'neale'), > 'user2': ('pop3.mediumhost.net', 'npickett')} I would have an issue with this since I try to have the same login name on various servers... francois.granger@free.fr francois.granger@laposte.net A scheme where I would use a login of francois.granger:free and francois.granger:laposte would make it. Then pop3proxy just needs to split on ":" and match the remaining "free" to "pop.free.fr". From tim at fourstonesExpressions.com Tue Jan 14 09:12:26 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 14 10:13:02 2003 Subject: [Spambayes] Using Spambayes w/ Eudora Message-ID: 1/14/2003 9:05:37 AM, François Granger wrote: >on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote: > >> Anthony Baxter writes: >> >>> Last time I looked, Eudora ate everything to the right of an @ sign >>> as the server name. And the field was limited to something like 14 >>> characters. >> >> Oh, man, that sucks! >> >> Okay then, I guess the only thing for it is to have a map in python from >> usernames to username/host combinations. >> >> {'user1': ('pop3.bigmailhost.net', 'neale'), >> 'user2': ('pop3.mediumhost.net', 'npickett')} > >I would have an issue with this since I try to have the same login name on >various servers... > francois.granger@free.fr > francois.granger@laposte.net > >A scheme where I would use a login of francois.granger:free and >francois.granger:laposte would make it. Then pop3proxy just needs to split on >":" and match the remaining "free" to "pop.free.fr". This would only work for mail servers that conform to the 'standard' naming convention. I have one mail server that is 'incoming.verizon.net'. We would need to do :...
this all gets ugly, and I'm wondering how successfully the 'average user' could set all this up... - TimS c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Tue Jan 14 09:21:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 10:21:13 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: References: Message-ID: <15908.10981.796169.470346@montanaro.dyndns.org> >>> Is there any reason that in principle pop3proxy can't multiplex the >>> content it receives from several different servers on a single >>> output port? >> >> The problem is knowing, when it gets a connection from the mail >> client, which server the mail client wishes to talk to. Tim> Correct. This is the problem. Nothing in the pop3 conversation Tim> gives any indication as to what server is on the other end of the Tim> line... I'm still unclear what the problem is. I use fetchmail to grab mail from two POP servers at the moment. Everything funnels into procmail which runs hammie then distributes each message to one of several different accounts. I don't care what the source of the message is. What's the big deal? A source of mail is a source of mail.
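The login-based proposals earlier in the thread (Neale's `username@hostname`, François's `francois.granger:free` with a short-name table) answer this question by carrying the server name inside the login that the client sends in its USER command. A minimal sketch with hypothetical names; it is not the pop3proxy.py or pspam/pop.py code. The separator is a parameter because Eudora reportedly treats `@` specially, and the alias table stands in for François's "free" to "pop.free.fr" mapping.

```python
# Sketch only: hypothetical helpers, not pop3proxy.py or pspam/pop.py.

def split_login(user_arg, sep="@", default_server=None, aliases=None):
    """Split a POP3 USER argument into (username, server).

    Splits on the *last* separator, so a login that is itself an email
    address, e.g. 'me@free.fr@pop.free.fr', keeps its own '@'.
    """
    user, found, server = user_arg.rpartition(sep)
    if not found or not user or not server:
        # No usable server in the login: fall back to a configured default.
        if default_server is None:
            raise ValueError("no server in %r and no default set" % user_arg)
        return user_arg, default_server
    # Optional short names such as 'free' -> 'pop.free.fr'.
    server = (aliases or {}).get(server, server)
    return user, server


def handle_user_command(line, default_server=None):
    """Parse a raw 'USER ...' line; return the rewritten line and server."""
    verb, _, arg = line.strip().partition(" ")
    if verb.upper() != "USER":
        raise ValueError("expected a USER command, got %r" % line)
    user, server = split_login(arg, default_server=default_server)
    # The rewritten command is what would be forwarded to the real server.
    return "USER %s\r\n" % user, server


if __name__ == "__main__":
    print(handle_user_command("USER neale@pop3.bigmailhost.net\r\n"))
    print(split_login("francois.granger:free", sep=":",
                      aliases={"free": "pop.free.fr"}))
```

As Tim points out, short-name tables only help for servers that follow a predictable naming pattern; spelling out the full host name in the login avoids that, at the cost of a longer login field.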
Skip From skip at pobox.com Tue Jan 14 09:25:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 10:25:45 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: References: <15907.38722.282150.490057@montanaro.dyndns.org> Message-ID: <15908.11255.27102.87951@montanaro.dyndns.org> Anthony> The problem is knowing, when it gets a connection from the mail Anthony> client, which server the mail client wishes to talk to. >> All of them? Tim> Not at all... the mail client will query specific servers on Tim> specific schedules or upon request by the user. Oh, okay, I get it. I actually have two fetchmail schedules one that runs every five minutes and one that runs every 30 minutes. I forget that Windows users have to do all that fiddling from within whatever mail client they run. (They actually use Eudora on Windows here at Northwestern as the "supported" email software. I haven't touched it, preferring instead to just bring my trusty Powerbook to work with me and continue using my fetchmail/hammie/XEmacs/VM combination.) Skip From jeremy at alum.mit.edu Tue Jan 14 10:15:11 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue Jan 14 10:28:10 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: References: <200301140255.h0E2ttG26338@localhost.localdomain> Message-ID: <15908.10623.343421.523340@slothrop.zope.com> >>>>> "NP" == Neale Pickett writes: NP> Wait, why can't you just log in with a username of NP> username@hostname NP> ? The last time I suggested this for pop3proxy, someone mentioned that several clients issue commands before the login such that the proxy wouldn't be able to guess what server it was for. But it sounds like we now have a report of a client that is difficult to configure without adding the username to the servername. I think it should be an option to do either. 
It's certainly more attractive for configuration: The user never needs to configure anything in the proxy beyond the port; all the per-server configuration can be done using the client's configuration system. NP> Or username:hostname or username@@hostname or whatever. The NP> point is, you'd send the name of the POP server you're trying to NP> contact as part of the username. We used to do this to vhost NP> pop accounts back at the big dot-bomb ISP where I worked. NP> AFAIK, it worked great. There's code in pspam/pop.py that does this. It's not difficult. Jeremy From francois.granger at free.fr Tue Jan 14 16:43:46 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 14 10:43:56 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: Message-ID: on 14/01/03 16:12, Tim Stone - Four Stones Expressions at tim@fourstonesExpressions.com wrote: > 1/14/2003 9:05:37 AM, François Granger wrote: > >> on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote: >> >>> Anthony Baxter writes: >>> >>>> Last time I looked, Eudora ate everything to the right of an @ sign >>>> as the server name. And the field was limited to something like 14 >>>> characters. >>> >>> Oh, man, that sucks! >>> >>> Okay then, I guess the only thing for it is to have a map in python from >>> usernames to username/host combinations. >>> >>> {'user1': ('pop3.bigmailhost.net', 'neale'), >>> 'user2': ('pop3.mediumhost.net', 'npickett')} >> >> I would have an issue with this since I try to have the same login name on >> various servers... >> francois.granger@free.fr >> francois.granger@laposte.net >> >> A scheme where I would use a login of francois.granger:free and >> francois.granger:laposte would make it. Then pop3proxy just needs to split on >> ":" and match the remaining "free" to "pop.free.fr". > > This would only work for mail servers that conform to the 'standard' naming > convention.
I have one mail server that is 'incoming.verizon.net'. We would > need to do :... this all gets ugly, and I'm > wondering how successfully the 'average user' could set all this up... - TimS If we have options in the ini file like: # format : mod:login@local:port remote_login@remote.server:remote_port account1 : francois.granger:laposte@127.0.0.1:110 francois.granger@pop.lapost.net:111 For each account, we have all needed parameters except the password. From neale at woozle.org Tue Jan 14 08:42:10 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 14 11:42:21 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: (François Granger's message of "Tue, 14 Jan 2003 16:43:46 +0100") References: Message-ID: François Granger writes: > If we have options in the ini file like: > > # format : mod:login@local:port remote_login@remote.server:remote_port > account1 : francois.granger:laposte@127.0.0.1:110 > francois.granger@pop.lapost.net:111 > > For each account, we have all needed parameters except the password. Right. That's what I meant :) Neale From neale at woozle.org Tue Jan 14 08:48:46 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 14 11:48:50 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain> (Anthony Baxter's message of "Tue, 14 Jan 2003 18:41:03 +1100") References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: Anthony Baxter writes: > Ok, can people nominate things that they think would be good before a > first release? I'd like to try and get one out before the spam > conference (it's as good a date as any :) Word. Are you going to the spam conference too, Anthony?
I've been using hammiefilter and mboxtrain for over a month now with no complaints, so I think that little corner of the code is ready for a release. We may want to merge HAMMIE.txt or some subset of it into a top-level README. Setting up procmail-based filtering, at this point, is a piece of cake. Neale From richie at entrian.com Tue Jan 14 16:50:08 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 11:50:33 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: <200301140300.h0E30aJ26394@localhost.localdomain> References: <200301140300.h0E30aJ26394@localhost.localdomain> Message-ID: [Anthony] > Well, I think I'm about done with it, so I'll be merging back into > the trunk shortly. Great! Well done on putting in the effort on this. [Richie] > Should we also have a 'resources' directory, or similar? [Anthony] > There's a few problems with that - the first, as you pointed out, is > finding the damn files. The second is getting distutils to do the > right thing with them. What we ended up doing with roundup was to > bundle all of the resources up into a separate python module, and > get it with 'import'. By coincidence (or is it?) Mike Fletcher has just announced his ResourcePackage package, which does exactly this. I'll look into using it - it could be just what we need. -- Richie Hindle richie@entrian.com From richie at entrian.com Tue Jan 14 17:01:07 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 12:01:28 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <15908.6730.504794.353666@gargle.gargle.HOWL> References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.6730.504794.353666@gargle.gargle.HOWL> Message-ID: Hi Barry, > Although it might not be ready until my train pulls into South > Station, I'm working on a Mailman handler module for integration with > Spambayes. 
The actual hook is pretty easy (using the hammie.py > interface) -- it's all the niddling little stuff like u/i, > moderation, training, configuration, etc. that's a bit rough around > the edges. It would be nice if you could leverage the existing web interface - I'm working towards making it less monolithic right now, which might help. -- Richie Hindle richie@entrian.com From skip at pobox.com Tue Jan 14 11:04:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 12:05:56 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain> References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: <15908.17160.397749.855624@montanaro.dyndns.org> Anthony> Ok, can people nominate things that they think would be good Anthony> before a first release? I have three changes available. The first two will get checked in when the off-campus link wakes up:

    * A -U option in hammiebulk.py which allows you to "untrain" a message.
    * Display the number of unsures as well as hams and spams (again, part of hammiebulk.py).
    * The ability for pop3proxy to fork off an external program before other setup (for tunnelling POP3 through ssh and such). This is still untested and probably Unix-only. I think I have some time today to try and test it.

What's pop3graph.py? I tried executing "pop3graph.py --help" and got this gibberish:

    /Users/skip/local/bin/pop3graph.py: Analyse the pop3proxy's caches and produce a graph of how accurate classifier has been over time. Only really meaningful if you started with an empty database.: command not found
    from: can't read /var/mail/__future__.
    /Users/skip/local/bin/pop3graph.py: import: command not found
    from: can't read /var/mail/spambayes.
    from: can't read /var/mail/spambayes.FileCorpus.
    from: can't read /var/mail/spambayes.Options.
    /Users/skip/local/bin/pop3graph.py: line 12: syntax error near unexpected token `main()'
    /Users/skip/local/bin/pop3graph.py: line 12: `def main():'

It would appear it gets installed in your executable bin directory but doesn't have a #! line. I'll add one, but I wonder if this is indicative of deeper problems. Should it be installed? Skip From barry at python.org Tue Jan 14 12:04:19 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 12:06:10 2003 Subject: [Spambayes] what else is needed for a first (source) release? References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: <15908.17171.346379.639313@gargle.gargle.HOWL> >>>>> "NP" == Neale Pickett writes: NP> Word. Are you going to the spam conference too, Anthony? Hey, who else is going? I'll be there and even giving a talk. -Barry From richie at entrian.com Tue Jan 14 17:05:41 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 12:07:22 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain> References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> [Anthony] > Ok, can people nominate things that they think would be good before a > first release? I'd like to try and get one out before the spam > conference (it's as good a date as any :) When is that date? The Linux Journal articles on Spambayes are due out at the beginning of February, so we should aim for whichever date is sooner. I have two lists of things that I think need doing:

1. Things that my Linux Journal article implies will be ready by the time the article is published:

   o Integration with Mutt (and other clients) via a single-shot script (like a single-message version of Hammie, or new switches to Hammie itself)
     See http://www.linuxjournal.com/article.php?sid=6439
     Does anyone have something like this already, or any requirements? And I don't like to ask, but would anyone like to write this? 8-) I was intending to do it myself (and still will if no-one else fancies the job) but I'm going to be rushed off my feet these next two weeks...
   o Web-based configuration, and new doco on setting up the POP3 proxy using it. This is nearly there - I'm combining the pop3proxy.py web UI and Tim Stone's OptionConfig.py into one unified web interface.
   o Security for the web interface - done but not yet checked in. All this can do at the moment is limit web connections to localhost, but at least it means you're not opening up your Spambayes system to all and sundry.

2. Things that have cropped up recently and need sorting out:

   o Silly memory usage by the POP3 proxy. Done and awaiting checkin.
   o The Eudora problem. This is nasty - I think we'll end up with a compromise here (as proposed by Jeremy) because I don't see a clean solution out there. (See http://mail.python.org/pipermail/spambayes/2002-November/002054.html for an explanation of why combining the hostname and the username is not a perfect solution.)
   o Integration of papaDoc's documentation into the website.

-- Richie Hindle richie@entrian.com From richie at entrian.com Tue Jan 14 17:18:36 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 12:19:01 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <15908.17160.397749.855624@montanaro.dyndns.org> References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.17160.397749.855624@montanaro.dyndns.org> Message-ID: <4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com> [Skip] > I have three changes available. The first two will get checked in when the > off-campus link wakes up: > > * A -U option in hammiebulk.py which allows you to "untrain" a message. Ooo!
That sounds like a step towards "Integration with Mutt (and other clients) via a single-shot script (like a single-message version of Hammie, or new switches to Hammie itself)" as appears on my to-do list! > What's pop3graph.py? [...] Should it be installed? It's a silly toy that shouldn't be in the main scripts area (it creates an ASCII graph of how accurate the POP3 proxy is over time). With a #! line it should work, but it's not important enough to live with the main scripts - it should go in 'utilities'. I'll move it when I next check in. -- Richie Hindle richie@entrian.com From neale at woozle.org Tue Jan 14 09:34:49 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 14 12:35:06 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com> (Richie Hindle's message of "Tue, 14 Jan 2003 17:18:36 +0000") References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.17160.397749.855624@montanaro.dyndns.org> <4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com> Message-ID: Richie Hindle writes: > [Skip] >> I have three changes available. The first two will get checked in when the >> off-campus link wakes up: >> >> * A -U option in hammiebulk.py which allows you to "untrain" a message. > > Ooo! That sounds like a step towards "Integration with Mutt (and other > clients) via a single-shot script (like a single-message version of Hammie, > or new switches to Hammie itself)" as appears on my to-do list! You should consider adding this to hammiefilter, along with the ability to train while scoring. My idea was to make hammiefilter the single-message equivalent of hammiebulk. Speaking of which, I think hammiebulk and mboxtrain can be merged if hammiebulk gets a new "save state" flag. IIRC I wrote mboxtrain with the intention of it and hammiefilter replacing hammiebulk after a while. Is anyone using the -u option to hammiebulk anymore? 
Without -u (and -f, which is duplicated by hammiefilter) hammiebulk would be the same thing as mboxtrain. Neale From neale at woozle.org Tue Jan 14 09:37:59 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 14 12:42:56 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> (Richie Hindle's message of "Tue, 14 Jan 2003 17:05:41 +0000") References: <200301140741.h0E7f3R19057@localhost.localdomain> <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> Message-ID: Richie Hindle writes: > o Integration with Mutt (and other clients) via a single-shot script > (like a single-message version of Hammie, or new switches to Hammie > itself) See http://www.linuxjournal.com/article.php?sid=6439 Does > anyone have something like this already, or any requirements? And I > don't like to ask, but would anyone like to write this? 8-) I was > intending to do it myself (and still will if no-one else fancies the > job) but I'm going to be rushed off my feet these next two weeks... Well, my wife is currently using mutt with hammiefilter in her procmailrc and mboxtrain being run from a cron job. Is this an acceptable "with mutt" configuration, or were you thinking of something that mutt could run itself, like the outlook plugin? Either way, I'm volunteering to help with or do this one :) Neale From papaDoc at videotron.ca Tue Jan 14 12:35:51 2003 From: papaDoc at videotron.ca (papaDoc) Date: Tue Jan 14 12:45:36 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: References: <200301140741.h0E7f3R19057@localhost.localdomain> <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> <3E24458A.2040408@videotron.ca> Message-ID: <3E244A77.4050701@videotron.ca> Hi, I was talking with Richie about the documentation for pop3proxy. Since there will be many changes to pop3proxy, I will wait before resubmitting my updated documentation.
This will let me integrate the documentation for the new UI. If the new UI is submitted by the end of the week, I will be able to update the documentation by next Tuesday night. >Hi Remi, > > > >>> o Web-based configuration, and new doco on setting up the POP3 proxy >>> using it. This is nearly there - I'm combining the pop3proxy.py web >>> UI and Tim Stone's OptionConfig.py into one unified web interface. >>> >>> >>> >>Should I wait until this is done to resubmit my doc ? >> >> > >If you're willing to update it to cover the new web interface, that would >be fantastic! If you are, you should mention it on the mailing list before >someone integrates your previous version into the web site. I'm hoping to >check in the new web interface before the end of the week. > > > From skip at pobox.com Tue Jan 14 13:08:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 14:08:26 2003 Subject: [Spambayes] loosen up address_headers option? Message-ID: <15908.24613.426948.717998@montanaro.dyndns.org> The tokenizer's address_headers option only examines "from". The code has this comment:

    # Dang -- I can't use Sender:.  If I do,
    #     'sender:email name:python-list-admin'
    # becomes the most powerful indicator in the whole database.
    #
    # From:         # this helps both rates
    # Reply-To:     # my error rates are too low now to tell about this
    #               # one (small wins & losses across runs, overall
    #               # not significant), so leaving it out
    # To:, Cc:      # These can help, if your ham and spam are sourced
    #               # from the same location.  If not, they'll be horrible.

which dates from a time early in the spambayes development history. (Can't tell exactly when since the recent directory reorganization. Could the loss of cvs comments have been avoided?) Much water has passed under the tokenizing bridge since then. I'm skeptical that the above token all by itself would relegate any spam to the hambox.
In my personal experience, adding to and cc headers to the list would pick up some strong spam clues. While there are any number of @mojam.com email aliases which eventually reach me, most are essentially unused, having been harvested from obscure places in the Mojam websites and are rarely used by real people with Mojam business to transact. As spambayes moves out of the experimental stage, perhaps it's worth looking at adding to and cc (and maybe reply-to and sender) to the default list of analyzed headers. Skip From jm at jmason.org Tue Jan 14 19:29:25 2003 From: jm at jmason.org (Justin Mason) Date: Tue Jan 14 14:29:05 2003 Subject: [Spambayes] loosen up address_headers option? In-Reply-To: Message from Skip Montanaro <15908.24613.426948.717998@montanaro.dyndns.org> Message-ID: <20030114192930.2417116F17@jmason.org> Skip Montanaro said: > As spambayes moves out of the experimental stage, perhaps it's worth looking > at adding to and cc (and maybe reply-to and sender) to the default list of > analyzed headers. FWIW, I certainly found they had useful clues in SpamAssassin testing, Reply-To, To, and Cc at least. Sender, however, was just noise. A look at the SpamAssassin-devel archives may dig up the test results in question... --j. From richie at entrian.com Tue Jan 14 19:29:04 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 14:29:26 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: References: <200301140741.h0E7f3R19057@localhost.localdomain> <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> Message-ID: [Richie] > ...Integration with Mutt... [Neale] > Well, my wife is currently using mutt with hammiefilter in her procmailrc > and mboxtrain being run from a cron job. Is this an acceptable "with > mutt" configuration, or were you thinking of something that mutt could > run itself, like the outlook plugin? Your setup is great, but there is also the plugin route. 
Many people might prefer it because it's more user-driven and doesn't depend on cron jobs and so on. This way also means you don't need to keep spam around in your mailbox. Nick Moffitt's article at http://www.linuxjournal.com/article.php?sid=6439 shows how to integrate Bogofilter with Mutt such that Save automatically trains as ham, commands like Reply and so on do the same (on the grounds that you never save or reply to spam), and there's a new Delete As Spam command. Nick's system does auto-training as well, so it's a little different from what we'd expect with the current Spambayes, but the idea is the same. It's something that Don Marti, the Linux Journal editor, was keen on (I get the impression it's because he's a Mutt user and wants to use Spambayes as a plugin 8-) > Either way, I'm volunteering to help with or do this one :) Wonderful - many thanks! Hopefully Nick's article explains the idea in full. -- Richie Hindle richie@entrian.com From tim.one at comcast.net Tue Jan 14 14:34:32 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 14 14:35:40 2003 Subject: [Spambayes] loosen up address_headers option? In-Reply-To: <15908.24613.426948.717998@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > The tokenizer's address_headers option only examines "from". The code has > this comment: > > ... > > As spambayes moves out of the experimental stage, perhaps it's > worth looking at adding to and cc (and maybe reply-to and sender) to the > default list of analyzed headers. They remain killer-strong clues for bad reasons when training on mixed-source corpora, so caution is still in order. In the Outlook client, life is so constrained (meaning mixed-source corpora are darned hard to get at there) that the Outlook client's default has been:

    [Tokenizer]
    address_headers: from to cc sender reply-to

for a long time. This works fine in practice, except when python.org has to turn off SpamAssassin and lots of spam leaks thru.
Then it piles up lots of "this came from python.org, so it's probably not spam" tokens, which increases the incidence of FN and (especially) spam rating Unsure. From barry at python.org Tue Jan 14 14:31:30 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 14:37:28 2003 Subject: [Spambayes] what else is needed for a first (source) release? References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.6730.504794.353666@gargle.gargle.HOWL> Message-ID: <15908.26002.8870.504247@gargle.gargle.HOWL> >>>>> "RH" == Richie Hindle writes: RH> It would be nice if you could leverage the existing web RH> interface - I'm working towards making it less monolithic RH> right now, which might help. For configuring, yes, we'll use the Privacy -> Spam Filters page. The trick is the admindb page, which already sucks. But I don't intend to clean it up for the prototype. -Barry From richie at entrian.com Tue Jan 14 19:46:25 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 14:46:49 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <15908.26002.8870.504247@gargle.gargle.HOWL> References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.6730.504794.353666@gargle.gargle.HOWL> <15908.26002.8870.504247@gargle.gargle.HOWL> Message-ID: <15q82vkjd9t07ddgc2ms9gr889v1dqvi6e@4ax.com> [Richie] > It would be nice if you could leverage the existing web > interface - I'm working towards making it less monolithic > right now, which might help. [Barry] > For configuring, yes, we'll use the Privacy -> Spam Filters page. The > trick is the admindb page, which already sucks. But I don't intend to > clean it up for the prototype. Are we talking at cross purposes? I meant leverage the existing *Spambayes* web interface that's (currently) a part of pop3proxy.py. But it sounds like you're way ahead of me anyway. 
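Tim's caveat and Skip's proposal are both about which headers feed address tokens to the classifier. The following is a minimal sketch, not the spambayes tokenizer: the `address_tokens` function and its `to:addr:...`-style token spellings are assumptions made for illustration. It shows how a configurable list such as `from to cc sender reply-to` turns each address header into separate name, local-part and domain tokens, so a domain that shows up in much of your ham (python.org in Tim's example) becomes one strong token.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical option value, mirroring the Outlook client default Tim quotes.
ADDRESS_HEADERS = ["from", "to", "cc", "sender", "reply-to"]

def address_tokens(msg, headers=ADDRESS_HEADERS):
    """Return a list of illustrative tokens for every name and address in `headers`."""
    tokens = []
    for header in headers:
        # get_all returns every occurrence of the header (case-insensitive).
        for name, addr in getaddresses(msg.get_all(header, [])):
            if name:
                tokens.append("%s:name:%s" % (header, name.lower()))
            if addr:
                local, _, domain = addr.lower().partition("@")
                tokens.append("%s:addr:%s" % (header, local))
                if domain:
                    # A separate domain token captures site-wide clues.
                    tokens.append("%s:domain:%s" % (header, domain))
    return tokens
```

Passing `["from"]` as `headers` gives the conservative behaviour Skip describes; the longer list trades extra clues for the mixed-source risk Tim warns about.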
-- Richie Hindle richie@entrian.com From barry at python.org Tue Jan 14 14:49:59 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 14:50:30 2003 Subject: [Spambayes] what else is needed for a first (source) release? References: <200301140741.h0E7f3R19057@localhost.localdomain> <15908.6730.504794.353666@gargle.gargle.HOWL> <15908.26002.8870.504247@gargle.gargle.HOWL> <15q82vkjd9t07ddgc2ms9gr889v1dqvi6e@4ax.com> Message-ID: <15908.27111.16922.418177@gargle.gargle.HOWL> >>>>> "RH" == Richie Hindle writes: RH> [Richie] >> It would be nice if you could leverage the existing web >> interface - I'm working towards making it less monolithic right >> now, which might help. RH> [Barry] >> For configuring, yes, we'll use the Privacy -> Spam Filters >> page. The trick is the admindb page, which already sucks. But >> I don't intend to clean it up for the prototype. RH> Are we talking at cross purposes? I meant leverage the RH> existing *Spambayes* web interface that's (currently) a part RH> of pop3proxy.py. But it sounds like you're way ahead of me RH> anyway. Oops, we're talking about different things. I'm talking about the hooks in Mailman to enable spambayes scoring, and what to do with messages based on those scores. Configuring spambayes itself is another kettle of fish, one that I'm not planning on addressing for my prototype. -Barry From whisper at oz.net Tue Jan 14 12:38:30 2003 From: whisper at oz.net (David LeBlanc) Date: Tue Jan 14 15:38:28 2003 Subject: [Spambayes] Lot of files removed from CVS? 
Message-ID: I just updated my local copy of CVS (from about a week ago or so) and got this (normal update messages removed):

    cvs server: Corpus.py is no longer in the repository
    cvs server: CostCounter.py is no longer in the repository
    cvs server: FileCorpus.py is no longer in the repository
    cvs server: HistToGNU.py is no longer in the repository
    cvs server: Histogram.py is no longer in the repository
    cvs server: Options.py is no longer in the repository
    cvs server: TestDriver.py is no longer in the repository
    cvs server: Tester.py is no longer in the repository
    cvs server: cdb.py is no longer in the repository
    cvs server: chi2.py is no longer in the repository
    cvs server: classifier.py is no longer in the repository
    cvs server: cmp.py is no longer in the repository
    cvs server: dbmstorage.py is no longer in the repository
    cvs server: fpfn.py is no longer in the repository
    cvs server: hammiebulk.py is no longer in the repository
    cvs server: heapq.py is no longer in the repository
    cvs server: loosecksum.py is no longer in the repository
    cvs server: mboxcount.py is no longer in the repository
    cvs server: mboxtest.py is no longer in the repository
    P mboxtrain.py
    cvs server: mboxutils.py is no longer in the repository
    cvs server: msgs.py is no longer in the repository
    cvs server: optimize.py is no longer in the repository
    cvs server: rates.py is no longer in the repository
    cvs server: rebal.py is no longer in the repository
    cvs server: sets.py is no longer in the repository
    cvs server: simplexloop.py is no longer in the repository
    cvs server: split.py is no longer in the repository
    cvs server: splitn.py is no longer in the repository
    cvs server: splitndirs.py is no longer in the repository
    cvs server: storage.py is no longer in the repository
    cvs server: table.py is no longer in the repository
    cvs server: timcv.py is no longer in the repository
    cvs server: timtest.py is no longer in the repository
    cvs server: tokenizer.py is no longer in the repository
    cvs server: weaktest.py is no longer in the repository
    *****CVS exited normally with code 0*****

Is this correct? David LeBlanc Seattle, WA USA From skip at pobox.com Tue Jan 14 14:42:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 15:42:46 2003 Subject: [Spambayes] Lot of files removed from CVS? In-Reply-To: References: Message-ID: <15908.30273.472786.864309@montanaro.dyndns.org> David> I just updated my local copy of CVS (from about a week ago or so) David> and got this (normal update messages removed): ... David> Is this correct? Yup. Try cvs -dP . instead. Anthony moved stuff all around a day or two ago. Skip From whisper at oz.net Tue Jan 14 12:52:47 2003 From: whisper at oz.net (David LeBlanc) Date: Tue Jan 14 15:52:30 2003 Subject: [Spambayes] Lot of files removed from CVS? In-Reply-To: <15908.30273.472786.864309@montanaro.dyndns.org> Message-ID: I'm using wincvs. By selecting "create missing directories that exist in the repository", I get the -d flag (and a bunch of "U" messages of some (all?) of the files listed in my last post during the refresh) - what's the P mean? I also tried using wincvs' command line option to run the command suggested and got back a usage message. Sorry, I realize this is OT a bit... David LeBlanc Seattle, WA USA > -----Original Message----- > From: Skip Montanaro [mailto:skip@pobox.com] > Sent: Tuesday, January 14, 2003 12:43 > To: David LeBlanc > Cc: spambayes@python.org > Subject: Re: [Spambayes] Lot of files removed from CVS? > > > > David> I just updated my local copy of CVS (from about a week > ago or so) > David> and got this (normal update messages removed): > ... > David> Is this correct? > > Yup. Try > > cvs -dP . > > instead. Anthony moved stuff all around a day or two ago. > > Skip From skip at pobox.com Tue Jan 14 15:21:46 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 16:21:51 2003 Subject: [Spambayes] Lot of files removed from CVS?
In-Reply-To: References: <15908.30273.472786.864309@montanaro.dyndns.org> Message-ID: <15908.32618.148694.815679@montanaro.dyndns.org> what's the P mean? "P"rune empty directories. Skip From carel.fellinger at chello.nl Tue Jan 14 22:21:35 2003 From: carel.fellinger at chello.nl (Carel Fellinger) Date: Tue Jan 14 16:33:53 2003 Subject: [Spambayes] re-org - making a package &c. In-Reply-To: References: <20030113111344.GA12027@mail.felnet> <20030113192419.GA17717@mail.felnet> Message-ID: <20030114212135.GA1769@mail.felnet> On Mon, Jan 13, 2003 at 11:42:56AM -0800, Neale Pickett wrote: ... > In any case, what you propose would work as a tuning tool, to be run > whenever you want to tune your config. I would look at the existing > test programs and try to figure out a way to combine them. I believe fine idea, but.. > You still interested in doing this, Carel? to be honest but blunt: no, not at all. Maybe in a few weeks I'm in a better position to spend some time on it, but my hopes are low:( Heck, I haven't even come round to installing spambayes and enjoying its excellence! living-a-lurking-live-isn't-always-by-choice-ly y'rs - carel From skip at pobox.com Tue Jan 14 15:55:48 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 16:55:51 2003 Subject: [Spambayes] pop3proxy - a couple issues Message-ID: <15908.34660.141569.623885@montanaro.dyndns.org> I am just trying out pop3proxy for the first time with ssh (having pop3proxy start an ssh session that forwards the pop connection) and ran into a couple problems. I don't think they are related to my use of an ssh tunnel, and I'm not about to test pop3proxy without ssh and have my password go over the net in the clear (though I will mess about with manually starting ssh external to pop3proxy shortly).
Messages get sucked over the pipe and properly classified; however, I have two problems:

  * No X-Hammie-Debug headers are added to the processed messages even though I have

        [Hammie]
        hammie_debug_header: True

    in my options file and have BAYESCUSTOMIZE set to

        BAYESCUSTOMIZE=$HOME/hammie.opt

  * When I click the "review" button in my web browser I get a max recursion depth exception from pop3proxy and a blank page in my browser. Here are the start and end of the asyncore traceback:

        error: uncaptured python exception, closing channel <__main__.UserInterface connected at 0x534940> (exceptions.RuntimeError:maximum recursion depth exceeded
        [/Users/skip/local/lib/python2.3/asyncore.py|read|69]
        [/Users/skip/local/lib/python2.3/asyncore.py|handle_read_event|385]
        [/Users/skip/local/lib/python2.3/asynchat.py|handle_read|136]
        [/Users/skip/local/bin/pop3proxy.py|found_terminator|808]
        [/Users/skip/local/bin/pop3proxy.py|onRequest|834]
        [/Users/skip/local/bin/pop3proxy.py|onReview|1146]
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getitem__|208]
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
        ...
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
        [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]

The proxy is started like so:

    pop3proxy.py -p ~/hammie.db -d -l 11111 \
        -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \
        "while true ; do sleep 60 ; done"' \
        localhost 11110

(remove the backslashes before trying this at home).
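For the first problem, a quick way to rule out the options file itself is to parse it outside pop3proxy. The sketch below uses the standard `configparser` module, not spambayes' own options loader, so its parsing rules may differ in details; `read_customize` and `debug_header_enabled` are hypothetical helpers, and treating `BAYESCUSTOMIZE` as naming a single file is an assumption based on Skip's setting.

```python
import configparser
import os

def read_customize(env_var="BAYESCUSTOMIZE"):
    """Parse the options file named by an environment variable.

    Returns (parser, files_read); an empty files_read list means the
    file was not found at the path the variable names.
    """
    path = os.environ.get(env_var, "")
    parser = configparser.ConfigParser()
    # expanduser/expandvars cover an unexpanded "~/..." or "$HOME/..." value.
    files_read = parser.read(os.path.expandvars(os.path.expanduser(path))) if path else []
    return parser, files_read

def debug_header_enabled(parser):
    # getboolean accepts True/False, yes/no, on/off, 1/0 (case-insensitive);
    # fallback covers a missing [Hammie] section or option.
    return parser.getboolean("Hammie", "hammie_debug_header", fallback=False)
```

If `files_read` is empty, or `debug_header_enabled` returns False, the problem is in the file or the variable rather than in the proxy.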
All the -e flag does is get the associated command started up before doing anything else:

    state.buildServerStrings()
    pid = 0
    if state.initCommand:
        pid = spawnInitCommand(state.initCommand)
    try:
        main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)
    finally:
        if pid:
            killInitCommand(pid)

spawnInitCommand and killInitCommand are straightforward:

    # these may need some changing for non-Unixoid platforms
    def spawnInitCommand(cmd):
        """run cmd (a string) in the background"""
        cmd, args = cmd.split(" ", 1)
        args = args.split()
        return os.spawnvp(os.P_NOWAIT, cmd, args)

    def killInitCommand(pid):
        os.kill(pid, signal.SIGHUP)

Is anyone else seeing these problems? I'm running on Mac OS X with a fairly recent CVS checkout of Python (Jan 7 2003, 16:09) and with spambayes updated earlier today. Thanks, Skip From francois.granger at free.fr Tue Jan 14 23:13:53 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 14 17:13:59 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: References: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: At 08:48 -0800 on 14/01/2003, in message Re: [Spambayes] what else is needed for a first (source, Neale Pickett wrote: > >We may want to merge HAMMIE.txt or some subset of it into a top-level >README. Setting up procmail-based filtering, at this point, is a piece >of cake. Even I was able to do it... ;-) If you want to cut and paste some naive words for the readme, there are some words of what I did at bottom of first box ... http://francois.granger.free.fr/radiohome/2002/12/29.html -- Recently using MacOSX....... From francois.granger at free.fr Tue Jan 14 23:30:22 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 14 17:30:27 2003 Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> References: <200301140741.h0E7f3R19057@localhost.localdomain> <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> Message-ID: At 17:05 +0000 14/01/2003, in message Re: [Spambayes] what else is needed for a first (source, Richie Hindle wrote: > o The Eudora problem. This is nasty - I think we'll end up with a > compromise here (as proposed by Jeremy) because I don't see a clean > solution out there. (See > http://mail.python.org/pipermail/spambayes/2002-November/002054.html > for an explanation of why combining the hostname and the username is > not a perfect solution). The solution given by Tony Lownds of two loopback addresses does not work on MacOS 9 either. I briefly tested it today at work. I will be testing it at home on MacOS X soon. I would say that giving a "complex login" as proposed earlier could be the only solution for Eudora on MacOS 9. (Not that I care now that I switched to X ;-)

    server1 = francois.granger:free@127.0.0.1, pop.free.fr:110
    server2 = francois.granger:lap, pop.laposte.net

In Eudora, you put a login of francois.granger:free and a server of 127.0.0.1; pop3proxy removes the :free and uses pop.free.fr as the mail server. It is not much more than what is already needed, and it doesn't have the problem above. Easy to say. I don't know how many changes to the code are needed. -- Recently using MacOSX....... From mhammond at skippinet.com.au Wed Jan 15 09:41:20 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 14 17:42:07 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain> Message-ID: <06d401c2bc1e$0f782cd0$530f8490@eden> > I figure we make the first release a source only one - which isn't a > biggie for most people, since it's in python.
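François's "complex login" idea amounts to a table lookup: the proxy splits the USER name on its last colon, uses the suffix as a key into a table of real servers, and forwards only the bare user name. This is a minimal sketch; the `SERVERS` table, the `split_login` helper and the use of port 110 for pop.laposte.net (the standard POP3 port, since his example gives none) are assumptions, not existing pop3proxy behaviour.

```python
# Hypothetical table keyed by the suffix after the colon in the USER name,
# following the example in François's message.
SERVERS = {
    "free": ("pop.free.fr", 110),
    "lap": ("pop.laposte.net", 110),  # port assumed: standard POP3
}

def split_login(login, servers=SERVERS):
    """Map 'user:key' to (user, (host, port)).

    Splits on the LAST colon so a colon inside the user name survives.
    Raises ValueError if there is no ':key' suffix and KeyError for an
    unknown key, so a proxy could answer the client with -ERR.
    """
    user, sep, key = login.rpartition(":")
    if not sep or not user:
        raise ValueError("login %r has no ':server' suffix" % login)
    return user, servers[key]
```

A proxy built this way would then send `USER francois.granger` to pop.free.fr, exactly as in his Eudora example.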
Should the release just > have what's in the current setup.py (which doesn't include > the Outlook2000 > directory), with a separate release for the O2K plugin+core? Or should > it all be in one big bundle? If we are going source-code only, then I would say just one bundle is fine. We just point out the win32all versions people need, then they run "outlook2000\addin.py", and everything works. We need some better docs, which would include our intention to move to a binary/bz2 distribution, and some good documentation on how to get started with training etc. Mark. From tony-bayes at lownds.com Tue Jan 14 14:47:33 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Tue Jan 14 17:47:28 2003 Subject: [Spambayes] pop3proxy - a couple issues Message-ID: Skip wrote: > * When I click the "review" button in my web browser I get a max > recursion depth exception from pop3proxy and a blank page in my > browser. Here are the start and end of the asyncore traceback: I ran into this too; the stack size is too small. Run one of these commands first: tcsh: ulimit stacksize 2048 sh: ulimit -s 2048 Mac OS X's default is 512, I picked 2048 at random. >The proxy is started like so: > > pop3proxy.py -p ~/hammie.db -d -l 11111 \ > -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \ > "while true ; do sleep 60 ; done"' \ > localhost 11110 ssh has an -N flag that will replace that while loop. >(remove the backslashes before trying this at home). All the -e flag does >is get the associated command started up before doing anything else: I have found, in my one day of using ssh tunnels + pop3proxy, that my ssh tunnels will go down (due to the computer going to sleep or my internet connection being flaky) more often than pop3proxy.py does (due to me closing it). So, perhaps the command is better spawned when the proxy can't connect to the server. Just a thought...
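Two refinements to the `-e` support discussed here can be sketched in Python. First, the quoted `spawnInitCommand` splits on spaces, so the quoted `bash -c "..."` argument is broken into pieces; and because `os.spawnvp` expects the program name as the first list element, `-q` ends up as argv[0] and is never seen as an option. `shlex.split` plus `subprocess.Popen` avoids both. Second, following Tony's suggestion, the tunnel command is started only when a connection attempt is refused. `spawn_init_command`, `connect_with_tunnel`, the single retry and the two-second settle delay are all assumptions, not pop3proxy code.

```python
import shlex
import socket
import subprocess
import time

def spawn_init_command(cmd):
    """Start cmd (a shell-style string) in the background; return the Popen."""
    # shlex.split keeps quoted arguments together, and Popen passes the
    # program name as argv[0], unlike the os.spawnvp call quoted above.
    return subprocess.Popen(shlex.split(cmd))

def connect_with_tunnel(host, port, init_cmd, settle=2.0):
    """Connect to (host, port), starting init_cmd once if the connection is refused.

    Returns (socket, process), where process is None if no command was started.
    """
    try:
        return socket.create_connection((host, port)), None
    except ConnectionRefusedError:
        proc = spawn_init_command(init_cmd)
        time.sleep(settle)  # assumed time for the tunnel to start listening
        return socket.create_connection((host, port)), proc
```

The counterpart of `killInitCommand` would be `proc.terminate()`, or `proc.send_signal(signal.SIGHUP)` on Unix to keep the original signal.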
-Tony From skip at pobox.com Tue Jan 14 17:08:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 14 18:08:35 2003 Subject: [Spambayes] pop3proxy - a couple issues In-Reply-To: References: Message-ID: <15908.39020.117137.398334@montanaro.dyndns.org> Tony> Skip wrote: >> * When I click the "review" button in my web browser I get a max >> recursion depth exception from pop3proxy and a blank page in my >> browser. Here are the start and end of the asyncore traceback: Tony> I ran into this too; the stack size is too small. Run one of these Tony> commands first: Tony> tcsh: ulimit stacksize 2048 Tony> sh: ulimit -s 2048 Tony> Mac OS X's default is 512, I picked 2048 at random. That's not it, or at least a stacksize of 2048 won't be sufficient. I already have my stack size set to 8192 by default:

    % ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) 6144
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 256
    pipe size          (512 bytes, -p) 1
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 100
    virtual memory        (kbytes, -v) 14336

Looking at the traceback it seems to me that a __getattr__ or __getitem__ method has a bug. >> The proxy is started like so: >> >> pop3proxy.py -p ~/hammie.db -d -l 11111 \ >> -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \ >> "while true ; do sleep 60 ; done"' \ >> localhost 11110 Tony> ssh has an -N flag that will replace that while loop. I tried it. It doesn't work as I'd like. When you use -N, ssh exits after one proxy session. That is, pop3proxy connects through the tunnel as a result of a local connection request, then once that session is complete, ssh exits. The next time the local mail user agent (fetchmail in my case at the moment) connects, pop3proxy gets a connection-refused message because ssh is gone. >> (remove the backslashes before trying this at home).
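Skip's reading of the traceback matches a common Python pitfall: a `__getattr__` that touches an attribute which does not exist yet (for example before `__init__` has set it, or on an object restored without it) calls itself again for that attribute and recurses until the interpreter's recursion limit is hit, which is why a bigger stack does not help. The classes below are hypothetical and do not reproduce Corpus.py; they only show the failure mode and one guard against it.

```python
class Lazy:
    """Buggy: forwards missing attributes to self.msg, which may not exist."""

    def __init__(self, msg=None):
        if msg is not None:
            self.msg = msg

    def __getattr__(self, name):
        # If self.msg was never set, this lookup re-enters __getattr__('msg')
        # forever, ending in RecursionError (a RuntimeError subclass).
        return getattr(self.msg, name)


class SafeLazy(Lazy):
    """Guarded: read the delegate from the instance dict, never via getattr."""

    def __getattr__(self, name):
        try:
            msg = self.__dict__["msg"]
        except KeyError:
            raise AttributeError(name) from None
        return getattr(msg, name)
```

The guard works because `self.__dict__` is found by normal attribute lookup, so a missing delegate turns into an ordinary `AttributeError` instead of unbounded recursion.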
All the -e flag >> does is get the associated command started up before doing anything >> else: Tony> I have found, in my one day of using ssh tunnels + pop3proxy ,that Tony> my ssh tunnels will go down (due to the computer going to sleep or Tony> my internet connection being flakey) more often than pop3proxy.py Tony> does (due to me closing it). So, perhaps the command is better Tony> spawned when the proxy can't connect to the server. Just a Tony> thought... Maybe, but that's going to be a fair amount more work. Skip From anthony at interlink.com.au Wed Jan 15 10:51:49 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 18:53:08 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: Message-ID: <200301142351.h0ENpnk29524@localhost.localdomain> >>> Neale Pickett wrote > Word. Are you going to the spam conference too, Anthony? I wish... nah, air fares cost too much, plus business situation is such that work can't/won't pay for it (I don't even get to go to pycon :( > I've been using hammiefilter and mboxtrain for over a month now with no > complaints, so I think that little corner of the code is ready for a > release. Hm. I've been doing my training via hammie. I think we might want to remove one or two of the myriad ways to train the system before release. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Jan 15 11:22:25 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 19:23:36 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <15908.6730.504794.353666@gargle.gargle.HOWL> Message-ID: <200301150022.h0F0MPx29798@localhost.localdomain> >>> Barry A. Warsaw wrote > > >>>>> "AB" == Anthony Baxter writes: > > AB> Ok, can people nominate things that they think would be good > AB> before a first release? 
I'd like to try and get one out before > AB> the spam conference (it's as good a date as any :) > > Although it might not be ready until my train pulls into South > Station, Ah, you wacky americans and your strange mannerisms. What does this mean in english? > I'm working on a Mailman handler module for integration with > Spambayes. The actual hook is pretty easy (using the hammie.py > interface) -- it's all the niddling little stuff like u/i, > moderation, training, configuration, etc. that's a bit rough around > the edges. This will be one database per list? > Having a spambayes package I can unpack in Mailman's pythonlib dir is > perfect. Can you make sure, then, that the API that is exposed is sufficient for your needs? -- Anthony Baxter It's never too late to have a happy childhood. From richie at entrian.com Wed Jan 15 00:38:54 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 19:39:34 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <06d401c2bc1e$0f782cd0$530f8490@eden> References: <200301140741.h0E7f3R19057@localhost.localdomain> <06d401c2bc1e$0f782cd0$530f8490@eden> Message-ID: [Mark] > We need some better docs, which would include our intention to move to a > binary/bz2 distribution, and some good documentation on how to get started > with training etc. I can help there - as long as we give a copyright attribution, we can reuse parts of my Linux Journal article in the documentation. Publishing the whole thing in advance of the magazine coming out would be kind of impolite, but we can use pieces of it. That said, I have no time to work on the documentation directly - anyone who *is* working on it, please feel free to ask for a copy of my article. 
-- Richie Hindle richie@entrian.com From richie at entrian.com Wed Jan 15 00:40:02 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 14 19:40:43 2003 Subject: [Spambayes] pop3proxy - a couple issues In-Reply-To: <15908.39020.117137.398334@montanaro.dyndns.org> References: <15908.39020.117137.398334@montanaro.dyndns.org> Message-ID: <1db92vkbiopoejvvebf6ii16rfqourqp3d@4ax.com> [Skip] > it seems to me that a __getattr__ or __getitem__ method has a bug. I'll look at this - thanks. But I'll go to bed first. -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Tue Jan 14 18:30:28 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 14 19:41:24 2003 Subject: [Spambayes] what else is needed for a first (source) release? In-Reply-To: <200301150022.h0F0MPx29798@localhost.localdomain> Message-ID: Off topic... - TimS 1/14/2003 6:22:25 PM, Anthony Baxter wrote: > >>>> Barry A. Warsaw wrote >> >> >>>>> "AB" == Anthony Baxter writes: >> >> AB> Ok, can people nominate things that they think would be good >> AB> before a first release? I'd like to try and get one out before >> AB> the spam conference (it's as good a date as any :) >> >> Although it might not be ready until my train pulls into South >> Station, > >Ah, you wacky americans and your strange mannerisms. What does this >mean in english? > There's no such thing as South Station... hehe >> I'm working on a Mailman handler module for integration with >> Spambayes. The actual hook is pretty easy (using the hammie.py >> interface) -- it's all the niddling little stuff like u/i, >> moderation, training, configuration, etc. that's a bit rough around >> the edges. Niddling? Invert above mannerism jab... > >This will be one database per list? > >> Having a spambayes package I can unpack in Mailman's pythonlib dir is >> perfect. > >Can you make sure, then, that the API that is exposed is sufficient >for your needs? 
> >-- >Anthony Baxter >It's never too late to have a happy childhood. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From richard at jowsey.com Wed Jan 15 12:20:43 2003 From: richard at jowsey.com (Richard Jowsey) Date: Tue Jan 14 20:21:16 2003 Subject: [Spambayes] FYI: Java implementation Message-ID: <3E25521B.20937.3607FFA@localhost> Hi all, I've been building a Java implementation of Paul Graham's "Bayesian" classification logic over the past couple of months, intended as a plug-in filter for the Apache JAMES mail server. However, after considerable testing, tweaking and tuning via a proxy setup (similar to POPFile), plus some recent lurking on the Spambayes list, I'm now modifying this project to incorporate the excellent notions contributed by Gary Robinson, et al, as implemented in your Python code. Early results are *very* promising!!! This death2spam stuff is definitely heading in the right direction! I haven't quite finished the chi2 comparison logic, but even using just "gary-combining", the kinds of messages ending up in my "uncertain" category make much more sense. Plus I'm now seeing far less weirdness caused by Graham's "2 * nGood + nSpam >= 5" trick, etc. Will keep the list posted as to further progress. I'd sure love to attend the upcoming spam-fest at MIT, but we moved downunder (Seattle -> Sydney) last year, and it's one helluva long way to go just for a day... Many thanks for all your fine coding, testing efforts, and thoughtful conversations! It's been very helpful, not to mention highly entertaining at times. ;-) Cheers, Richard From barry at python.org Tue Jan 14 20:41:35 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 20:42:04 2003 Subject: [Spambayes] what else is needed for a first (source) release?
References: <15908.6730.504794.353666@gargle.gargle.HOWL> <200301150022.h0F0MPx29798@localhost.localdomain> Message-ID: <15908.48207.917525.910821@gargle.gargle.HOWL> >>>>> "AB" == Anthony Baxter writes: >> :) Although it might not be ready until my train pulls into >> South Station, AB> Ah, you wacky americans and your strange mannerisms. What does AB> this mean in english? It means my floob boober babs boober bubs won't constrapulate until my sneenkle quods the flamb. Jeez, you Aussies. (Translation: I'm on a 6:30 hour train trip, with not much else to do than randomly peck at my laptop.) >> I'm working on a Mailman handler module for integration with >> Spambayes. The actual hook is pretty easy (using the hammie.py >> interface) -- it's all the niddling little stuff like >> u/i, moderation, training, configuration, etc. that's a bit >> rough around the edges. AB> This will be one database per list? Yup. >> Having a spambayes package I can unpack in Mailman's pythonlib >> dir is perfect. AB> Can you make sure, then, that the API that is exposed is AB> sufficient for your needs? I'll make sure I cvs up before I leave. Won't have much time to look before then, but I think the hammie.py module is all I need (from the Mailman side -- for now). If that's in spambayes.hammie I'm all set. -Barry From barry at python.org Tue Jan 14 20:44:03 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Jan 14 20:44:32 2003 Subject: [Spambayes] pop3proxy - a couple issues References: Message-ID: <15908.48355.950264.416317@gargle.gargle.HOWL> >>>>> "TL" == Tony Lownds writes: TL> I ran into this too; the stack size is too small. Run one of TL> these commands first: TL> tcsh: ulimit stacksize 2048 TL> sh: ulimit -s 2048 TL> Mac OS X's default is 512, I picked 2048 at random. That crops up a lot with Python, e.g. test_re IIRC, and definitely in Mailman.
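The `RuntimeError: maximum recursion depth exceeded` in Skip's traceback is raised by Python's own recursion limit (`sys.getrecursionlimit()`), which is separate from the process stack size that `ulimit` controls. (For reference, tcsh spells the stack command `limit stacksize 2048`; `ulimit -s 2048` is the sh form, both in kilobytes.) On Unix both numbers can be read from Python with the standard `sys` and `resource` modules; `stack_limits` below is a hypothetical helper for checking them, not part of spambayes.

```python
import sys

try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None


def _to_kib(value):
    """Convert an rlimit value in bytes to KiB; None means unlimited."""
    if value == resource.RLIM_INFINITY:
        return None
    return value // 1024


def stack_limits():
    """Return (soft_kib, hard_kib, recursion_limit).

    The stack values are None when unlimited or when the resource
    module is unavailable (e.g. on Windows).
    """
    soft = hard = None
    if resource is not None:
        s, h = resource.getrlimit(resource.RLIMIT_STACK)
        soft, hard = _to_kib(s), _to_kib(h)
    return soft, hard, sys.getrecursionlimit()


if __name__ == "__main__":
    print("stack soft=%s KiB, hard=%s KiB, recursion limit=%d" % stack_limits())
```

A soft stack value of 8192 would correspond to the `stack size (kbytes, -s) 8192` line in Skip's `ulimit -a` output.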
-Barry From T.A.Meyer at massey.ac.nz Wed Jan 15 15:43:15 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 14 21:56:07 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE0@its-xchg2.massey.ac.nz> Hi, I've had trouble with the Outlook plugin in that whenever it tries to build a folder list (i.e. in the various dialogs) an exception is raised and the list presented is empty. I traced it to a bad folder (Outlook can't display it either). Now, normally, one should fix the cause, not the effect, but in this case the folder is on an Exchange server and is not mine (it's a public folder). Getting the owner of the folder to fix things would be very difficult. So I altered FolderSelector.py so that if a bad folder causes this sort of problem, it's simply not presented in the list (but all the other folders are). Probably not all that important, but it does (in most ways) make it more user-friendly. Anyway, here's the new function in case you want to alter the cvs to reflect it. I've never used Python before, so this may not be the best way to do this (suggestions of better ways are welcome, obviously).
import pywintypes

def _BuildFolderTreeOutlook(session, parent):
    children = []
    for i in range(parent.Folders.Count):
        folder = parent.Folders[i+1]
        try:
            spec = FolderSpec((folder.StoreID, folder.EntryID),
                              folder.Name.encode("mbcs", "replace"))
            if folder.Folders:
                spec.children = _BuildFolderTreeOutlook(session, folder)
            children.append(spec)
        except pywintypes.com_error:
            print "Skipping folder " + folder.Name
    return children

=Tony Meyer From mhammond at skippinet.com.au Wed Jan 15 14:22:39 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 14 22:23:18 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE0@its-xchg2.massey.ac.nz> Message-ID: <078601c2bc45$5c874750$530f8490@eden> [Tony Meyer] > Now, normally, one should fix the cause, not the effect, but > in this case the folder is on an exchange server and is not > mine (it's a public folder). Getting the owner of the folder > to fix things would be very difficult. Of course, fixing the cause makes sense when possible, but if Outlook and other tools all work OK, then it is a bug in spambayes that we don't. > I've never > used Python before, so this may not be the best way to do > this (suggestions of better ways are welcome, obviously). Excellent! The more common pattern is to catch pythoncom.error, but pywintypes.com_error is an alias for the same object, so your code is just fine. The only thing is that we are wrapping the recursive call to _BuildFolderTree in the exception handler. I would generally prefer to only catch the operation in error. Is it possible for you to include the full traceback without this patch applied? Then I will get it into CVS. Thanks for digging in to find this problem! Mark.
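Mark's preference for catching only the operation in error can be illustrated like this. It is a minimal Python 3 sketch, not the version that went into CVS: `ComError`, `FakeFolder` and this trimmed-down `FolderSpec` are hypothetical stand-ins for `pywintypes.com_error` and the Outlook objects, so it runs without Outlook (and the stand-in exposes `Folders` as a plain list rather than a 1-based COM collection). Only the attribute reads that build the spec are inside the `try`; the recursive call sits outside it, so an unexpected error raised while walking a subfolder propagates instead of being silently reported as "Skipping folder".

```python
class ComError(Exception):
    """Hypothetical stand-in for pywintypes.com_error."""


class FakeFolder:
    """Hypothetical stand-in for an Outlook folder object."""

    def __init__(self, name, children=(), broken=False):
        self._name = name
        self.Folders = list(children)
        self._broken = broken

    @property
    def Name(self):
        return self._name

    @property
    def StoreID(self):
        # Simulates the "The operation failed." error from the traceback.
        if self._broken:
            raise ComError("The operation failed.")
        return "store"

    @property
    def EntryID(self):
        return "entry-" + self._name


class FolderSpec:
    def __init__(self, folder_id, name):
        self.folder_id = folder_id
        self.name = name
        self.children = []


def build_folder_tree(parent):
    children = []
    for folder in parent.Folders:
        try:
            # Guard only the attribute reads known to fail on a bad folder.
            spec = FolderSpec((folder.StoreID, folder.EntryID), folder.Name)
        except ComError:
            print("Skipping folder " + folder.Name)
            continue
        # Recursion is outside the handler: errors from deeper levels are
        # not mistaken for this folder's failure.
        if folder.Folders:
            spec.children = build_folder_tree(folder)
        children.append(spec)
    return children
```

For a tree of root containing Inbox (with a Sub folder), Broken and Sent, where reading Broken's `StoreID` raises, the result keeps Inbox (with Sub) and Sent, and one "Skipping folder Broken" line is printed.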
From mhammond at skippinet.com.au Wed Jan 15 14:29:05 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 14 22:29:52 2003 Subject: [Spambayes] Updating outlook to the new directory structure Message-ID: <078d01c2bc46$420ff5b0$530f8490@eden> FYI, after the recent source reorg, the Outlook addin seems to work fine, except for 2 things: * You must remember to blow away your .pyc files, else things may go screwey, and you won't notice the next point until later. * You need to do a full retrain of the database (as the module name stored in the pickle has changed) Apart from that, it all looks good. If we can just get rid of more .py files from the root, life will be good Mark. From anthony at interlink.com.au Wed Jan 15 14:39:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 22:40:56 2003 Subject: [Spambayes] Updating outlook to the new directory structure In-Reply-To: <078d01c2bc46$420ff5b0$530f8490@eden> Message-ID: <200301150339.h0F3dar01317@localhost.localdomain> >>> "Mark Hammond" wrote > * You must remember to blow away your .pyc files, else things may go > screwey, and you won't notice the next point until later. > * You need to do a full retrain of the database (as the module name stored > in the pickle has changed) Oo. Yuk. Good catch. > Apart from that, it all looks good. If we can just get rid of more .py > files from the root, life will be good We could port it all to perl, or ruby, or something? :) -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Wed Jan 15 14:44:45 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 14 22:46:10 2003 Subject: [Spambayes] new attempt at non-technical explanation on index.html of website Message-ID: <200301150344.h0F3ijF01416@localhost.localdomain> I just checked in some text that attempts to explain, in a mostly non-technical way, how spambayes works. 
It's the "handwaving" bit on the index.html document on the website. Suggestions for improvement accepted. From T.A.Meyer at massey.ac.nz Wed Jan 15 16:36:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 14 22:50:08 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz> [Mark Hammond] > The only thing is that we are wrapping the recursive call to > _BuildFolderTree in the exception handler. I would generally > prefer to only > catch the operation in error. Is it possible for you to > include the full > traceback without this patch applied? Then I will get it into CVS. Here you go:

Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 322, in OnInitDialog
    tree = BuildFolderTreeOutlook(self.mapi)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 128, in BuildFolderTreeOutlook
    root.children = _BuildFolderTreeOutlook(session, session)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 119, in _BuildFolderTreeOutlook
    spec = FolderSpec((folder.StoreID, folder.EntryID),
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 369, in __getattr__
    return apply(self._ApplyTypes_, args)
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 363, in _ApplyTypes_
    return self._get_good_object_(apply(self._oleobj_.InvokeTypes, (dispid, 0, wFlags, retType, argTypes) + args), user, resultCLSID)
pywintypes.com_error: (-2147352567, 'Exception occurred.', (4096, 'Microsoft Outlook', 'The operation failed.', None, 0, -2147221233), None)
win32ui: OnInitDialog() virtual handler (>) raised an exception

> Thanks for digging in to find this problem! It wouldn't have felt right to mail a "my folder list is empty" message to the list and not do something myself :) Along a similar(ish) line: I actually have another line added to _BuildFolderTreeOutlook that skips me past all the public folders (just an 'if folder.name == "Public Folders"' kind of thing), because otherwise it takes several minutes to build the list. How likely is it that people will want to train on a public folder? Could there maybe be an option in the .ini or somewhere like "Present_Public_Folders: False", for those like me that don't and have very large public folders? =Tony Meyer From mhammond at skippinet.com.au Wed Jan 15 14:52:26 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 14 22:53:14 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz> Message-ID: <079b01c2bc49$853c47f0$530f8490@eden> [Tony] > [Mark Hammond] > > The only thing is that we are wrapping the recursive call to > > _BuildFolderTree in the exception handler. I would generally > > prefer to only > > catch the operation in error. Is it possible for you to > > include the full > > traceback without this patch applied? Then I will get it into CVS. > Here you go: Thanks! Please check the version I just checked in works OK for you. Thanks, Mark.
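Tony's proposed `Present_Public_Folders: False` setting could be read with the standard `configparser` module. This is a sketch under stated assumptions: the `[General]` section, the helper names and the folder names used below are hypothetical, and the real plugin may store its configuration in a different way. Using `fallback=True` keeps today's behaviour (show every folder) when the option is absent.

```python
import configparser

# Option name taken from Tony's suggestion; the section name is hypothetical.
SAMPLE_INI = """
[General]
Present_Public_Folders: False
"""


def load_show_public(text):
    """Read the option, defaulting to True (current behaviour) if missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean("General", "Present_Public_Folders",
                             fallback=True)


def visible_top_level(folder_names, show_public):
    """Drop the 'Public Folders' root when the option is switched off."""
    if show_public:
        return list(folder_names)
    return [name for name in folder_names if name != "Public Folders"]
```

Filtering at the top level means the large public-folder subtree is never walked at all when the option is off, which is where Tony's several-minute delay came from.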
From mhammond at skippinet.com.au Wed Jan 15 14:56:51 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 14 22:57:13 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz> Message-ID: <079c01c2bc4a$237ea7a0$530f8490@eden> Sorry, I missed this bit: [Tony] > Along a similiar(ish) line: > I actually have another line added to _BuildFolderTreeOutlook > that skips me past all the public folders (just a 'if > folder.name == "Public Folders" kind of thing), because > otherwise it takes several minutes to build the list. > > How likely is it that people will want to train on a public > folder? Could there maybe be an option in the .ini or > somewhere like "Present_Public_Folders: False", for those > like me that don't and have very large public folders? Check out the comments in this source file that start with: # Oh, lord help us. There is a MAPI version of the folder builder in that source file that will work *much* faster - but until I get my hands on an Exchange server, I can't really test it. If you look further in the source file for where BuildFolderTreeMAPI() is commented out, and you can manage to test it, I would be interested to know your experiences with the code - except that you may find the exact same exception we just plugged will be raised in this MAPI code - and a similar fix will also work. Mark. From tim.one at comcast.net Tue Jan 14 23:06:30 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 14 23:07:08 2003 Subject: [Spambayes] Updating outlook to the new directory structure In-Reply-To: <078d01c2bc46$420ff5b0$530f8490@eden> Message-ID: [Mark Hammond] > FYI, after the recent source reorg, the Outlook addin seems to work fine, > except for 2 things: > > * You must remember to blow away your .pyc files, else things may go > screwey, and you won't notice the next point until later. 
> > * You need to do a full retrain of the database (as the module name > stored in the pickle has changed) Thanks for the advice! It's good advice, and it worked for me. > Apart from that, it all looks good. If we can just get rid of more .py > files from the root, life will be good This will be easy to achieve after everyone upgrades to Outlook2000 . From T.A.Meyer at massey.ac.nz Wed Jan 15 17:06:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 14 23:07:44 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE4@its-xchg2.massey.ac.nz> [Mark] > If you look further in the source file for where > BuildFolderTreeMAPI() is > commented out, and you can manage to test it, I would be > interested to know > your experiences with the code - except that you may find the > exact same > exception we just plugged will be raised in this MAPI code - > and a similar > fix will also work. When I was first trying to find the cause of the empty folder list, I looked at this, but had trouble (probably mostly because I was still trying to figure out Python). Is the switch as simple as changing "tree = BuildFolderTreeOutlook(self.mapi)" to "tree = BuildFolderTreeMAPI(self.mapi)"? I'll play around with this. Is there a thread or anything that lists the problems that you were experiencing? =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Jan 15 17:48:50 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 14 23:49:31 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE6@its-xchg2.massey.ac.nz> > If you look further in the source file for where > BuildFolderTreeMAPI() is > commented out, and you can manage to test it, I would be > interested to know > your experiences with the code - except that you may find the > exact same > exception we just plugged will be raised in this MAPI code - > and a similar > fix will also work. 
This code worked perfectly (once I plugged in the same fix) for me, and took 32729ms instead of 88548.6ms. (Without the public folders it's 1136.05ms for MAPI and 2744.1ms for Outlook). What wasn't working with Exchange? =Tony Meyer From piersh at friskit.com Tue Jan 14 21:17:41 2003 From: piersh at friskit.com (Piers Haken) Date: Wed Jan 15 00:02:08 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <9891913C5BFE87429D71E37F08210CB929753A@zeus.sfhq.friskit.com> The outlook version was added because the IDs that MAPI returns aren't compatible with the outlook IDs and you can't open a message on an exchange server with a MAPI ID. The MAPI tree-building case works fine on exchange, it's the message filtering code that breaks. BTW: Mark, you still didn't commit the CompareIDs fix I sent you a while back. The current version 'works' but '==' is not the recommended way to do the comparison... Piers. > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Tuesday, January 14, 2003 8:49 PM > To: 'Mark Hammond'; spambayes@python.org > Subject: RE: [Spambayes] Outlook plugin & bad folders > > > > If you look further in the source file for where > > BuildFolderTreeMAPI() is > > commented out, and you can manage to test it, I would be > > interested to know > > your experiences with the code - except that you may find the > > exact same > > exception we just plugged will be raised in this MAPI code - > > and a similar > > fix will also work. > > This code worked perfectly (once I plugged in the same fix) > for me, and took 32729ms instead of 88548.6ms. (Without the > public folders it's 1136.05ms for MAPI and 2744.1ms for Outlook). > > What wasn't working with Exchange? 
> > =Tony Meyer > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes > From T.A.Meyer at massey.ac.nz Wed Jan 15 20:23:52 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Jan 15 02:24:31 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE8@its-xchg2.massey.ac.nz> > The outlook version was added because the IDs that > MAPI returns aren't compatible with the outlook IDs > and you can't open a message on an exchange server > with a MAPI ID. Is this a definitive "can't", or a 'no-one has figured out how to yet' "can't"? > The MAPI tree-building case works fine on exchange, > it's the message filtering code that breaks. Ah yes, this doesn't work :) Couldn't the FolderSelection dialog only load in the folders it needs to display? i.e. at first it loads in the root folders, and then whenever OnTreeItemExpanding is called it adds in the necessary children? If this is practical/possible then if no-one else wants to do it, I could give it a go. (Although I have all of two days of Python knowledge). =Tony Meyer From piersh at friskit.com Wed Jan 15 02:18:36 2003 From: piersh at friskit.com (Piers Haken) Date: Wed Jan 15 05:02:45 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <9891913C5BFE87429D71E37F08210CB929753B@zeus.sfhq.friskit.com> > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Tuesday, January 14, 2003 11:24 PM > To: spambayes@python.org > Subject: RE: [Spambayes] Outlook plugin & bad folders > > > > The outlook version was added because the IDs that > > MAPI returns aren't compatible with the outlook IDs > > and you can't open a message on an exchange server > > with a MAPI ID. > Is this a definitive "can't", or a 'no-one has figured out > how to yet' "can't"? 
I think it's more like a "you should be able to, and the docs say so, but it just doesn't work. Ugh..." > > The MAPI tree-building case works fine on exchange, > > it's the message filtering code that breaks. > Ah yes, this doesn't work :) > > Couldn't the FolderSelection dialog only load in the folders > it needs to display? i.e. at first it loads in the root > folders, and then whenever OnTreeItemExpanding is called it > adds in the necessary children? If this is > practical/possible then if no-one else wants to do it, I > could give it a go. (Although I have all of two days of > Python knowledge). Yes, this would definitely be a much better way of doing it, especially for people who have very large folder structures (eg, corporate public folders). You might want to keep the behavior where it expands enough to show the 'currently selected' folders. Piers. From mhammond at skippinet.com.au Wed Jan 15 22:52:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Jan 15 06:52:59 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <9891913C5BFE87429D71E37F08210CB929753A@zeus.sfhq.friskit.com> Message-ID: <085201c2bc8c$88e7a140$530f8490@eden> [Piers Haken] > The outlook version was added because the IDs that MAPI > returns aren't compatible with the outlook IDs and you > can't open a message on an exchange server with a MAPI ID. > The MAPI tree-building case works fine on exchange, it's the > message filtering code that breaks. Can you remember the exact problem? Can you re-enable that code and see what it was? If necessary, we can add some extra diagnostic code to see where the EntryIDs differ, and look if there is any way we can normalize it. > BTW: Mark, you still didn't commit the CompareIDs fix I > sent you a while back. The current version 'works' but > '==' is not the recommended way to do the comparison... Yes, I haven't committed it because, as you said, it currently works . 
Fortunately and thankfully, you added a bug about it (even attaching a patch) so there is no way I can forget. As I am sure you can see from Tony's stats though, getting the MAPI version working is a much better option! I'm sure we can make the MAPI version work - the MAPI extensions were developed against an exchange server. Back-when-Outlook-was-but-a-sparkle-in-Bill's-eye ly, Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3105 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030115/8b28d7e6/winmail.bin From mhammond at skippinet.com.au Wed Jan 15 23:08:50 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Jan 15 07:09:41 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE8@its-xchg2.massey.ac.nz> Message-ID: <086401c2bc8e$de1b3c60$530f8490@eden> > Couldn't the FolderSelection dialog only load in the folders > it needs to display? i.e. at first it loads in the root > folders, and then whenever OnTreeItemExpanding is called it > adds in the necessary children? If this is > practical/possible then if no-one else wants to do it, I > could give it a go. (Although I have all of two days of > Python knowledge). :) go for it! pywin\tools\hierlist.py has an example of OnTreeItemExpanding. Only complication will be that the current code expands the tree to show the currently selected folders - this should be doable though. Still-wouldn't-mind-getting-that-MAPI-version-going ly, Mark.
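The on-demand loading Tony describes, together with Mark's caveat about pre-expanding the currently selected folders, can be sketched independently of the win32ui tree control. In this minimal sketch `LazyNode`, `expand_to` and the `loader` callback are hypothetical names: in the plugin, `expand()` would presumably be driven from an `OnTreeItemExpanding` handler, and `loader` would enumerate one folder's direct children through Outlook or MAPI. Each folder's children are fetched at most once, and only the folders along the selected path are loaded up front.

```python
class LazyNode:
    """A folder-tree node whose children are fetched on first expansion."""

    def __init__(self, folder_id, name, loader):
        self.folder_id = folder_id
        self.name = name
        self._loader = loader   # callable: folder_id -> list of (id, name)
        self._children = None   # None means "not loaded yet"

    @property
    def loaded(self):
        return self._children is not None

    def expand(self):
        """Load this folder's direct children once, then reuse them."""
        if self._children is None:
            self._children = [LazyNode(cid, cname, self._loader)
                              for cid, cname in self._loader(self.folder_id)]
        return self._children


def expand_to(root, path):
    """Expand only the folders along `path` (ids from the root's child
    downwards) so a previously selected folder is visible.

    Returns the node for the last id, or None if a folder has vanished.
    """
    node = root
    for folder_id in path:
        matches = [c for c in node.expand() if c.folder_id == folder_id]
        if not matches:
            return None
        node = matches[0]
    return node
```

Walking to a selected folder loads only its ancestors: for a path Inbox then Spam, the loader runs for the root and Inbox, while a large Public Folders subtree is never enumerated unless the user expands it.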
From skip at pobox.com Wed Jan 15 08:59:20 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 09:59:29 2003 Subject: [Spambayes] new attempt at non-technical explanation on index.html of website In-Reply-To: <200301150344.h0F3ijF01416@localhost.localdomain> References: <200301150344.h0F3ijF01416@localhost.localdomain> Message-ID: <15909.30536.143312.323641@montanaro.dyndns.org> Anthony> I just checked in some text that attempts to explain, in a Anthony> mostly non-technical way, how spambayes works. It's the Anthony> "handwaving" bit on the index.html document on the website. Looks good. Anthony> suggestions for improvement accepted. I just checked out the website module and noticed that whatever editor you use doesn't wrap lines (most

...

chunks are one big honkin' line). That makes it a bit problematic to edit text in Emacs. If I wrap the lines will it hose your editing? Skip From skip at pobox.com Wed Jan 15 11:28:48 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 12:28:57 2003 Subject: [Spambayes] spambayes fronting a mailing list? Message-ID: <15909.39504.598866.52741@montanaro.dyndns.org> I know Barry's working on spambayes integration with Mailman. Pretend I can't wait that long. ;-) Ignoring training issues (I can solve them without much problem), should I be able to just stick "hammie.py -f ..." in front of mailman in my aliases file and then just edit my "hold postings" regular expression? Am I missing something obvious? Skip From skip at pobox.com Wed Jan 15 14:43:17 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 15:43:27 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? Message-ID: <15909.51173.814202.900365@montanaro.dyndns.org> I'm sure others have considered this already, but I began wondering today how hard it would be to separate pop3proxy into two pieces, the proxy stuff and the training/web stuff. I think having a separate training interface would be good because it could then be used by other spambayes tools. For example, just today I modified some Mailman-managed mailing lists to pump incoming messages through "hammie.py -f" before passing along to Mailman:

    #!/bin/bash
    BAYESHOME=/home/skip
    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
        | /usr/local/bin/stripmime.pl \
        | /home/mailman/mail/wrapper "$@"

(Please don't flog me for using stripmime.pl. I'm sure there are better MIME strippers out there, but it works fine for my needs. ;-) For the time being I'm just using my own training database which is a superset of what goes to that particular mailing list.
The "bright idea" I had today was that it would be great to simply modify the above pipeline to

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
        | tee /tmp/cedu-list-trainer \
        | /usr/local/bin/stripmime.pl \
        | /home/mailman/mail/wrapper "$@"

and have the training stuff from pop3proxy waiting on a Unix named pipe named /tmp/cedu-list-trainer. At my leisure I could then visit the web interface and train any collected messages. The "tee" command could be replaced by a simple little tee-like program which disposed of the file in some other fashion, perhaps by using HTTP PUT to toss it at the training server. Any thoughts on this? Richie? Thx, Skip From tim at fourstonesExpressions.com Wed Jan 15 14:52:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Jan 15 15:53:18 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? In-Reply-To: <15909.51173.814202.900365@montanaro.dyndns.org> Message-ID: 1/15/2003 2:43:17 PM, Skip Montanaro wrote: > >I'm sure others have considered this already, but I began wondering today >how hard it would be to separate pop3proxy into two pieces, the proxy stuff >and the training/web stuff. I think having a separate training interface >would be good because it could then be used by other spambayes tools. > >For example, just today I modified some Mailman-managed mailing lists to >pump incoming messages through "hammie.py -f" before passing along to >Mailman:
>
>    #!/bin/bash
>    BAYESHOME=/home/skip
>    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt
>
>    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>        | /usr/local/bin/stripmime.pl \
>        | /home/mailman/mail/wrapper "$@"
>
>(Please don't flog me for using stripmime.pl. I'm sure there are better >MIME strippers out there, but it works fine for my needs. ;-) > >For the time being I'm just using my own training database which is a >superset of what goes to that particular mailing list.
> >The "bright idea" I had today was that it would be great to simply modify >the above pipeline to > > /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \ > | tee /tmp/cedu-list-trainer \ > | /usr/local/bin/stripmime.pl \ > | /home/mailman/mail/wrapper "$@" > >and have the training stuff from pop3proxy waiting on a Unix named pipe >named /tmp/cedu-list-trainer. At my leisure I could then visit the web >interface and train any collected messages. > >The "tee" command could be replaced by a simple little tee-like program >which disposed of the file in some other fashion, perhaps by using HTTP PUT >to toss it at the training server. > >Any thoughts on this? Richie? The training stuff used by the pop3proxy is already 'stripped out' into Corpus.py and FileCorpus.py. These modules probably don't do exactly what you need right now, but we've been considering rewriting them anyway, to handle more than just file system artifacts for messages. You might take a look at those modules. I have some ideas about rewriting them, Mark Hammond has levied some requirements as well... > >Thx, > >Skip > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Wed Jan 15 13:08:39 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 15 16:08:48 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15909.39504.598866.52741@montanaro.dyndns.org> (Skip Montanaro's message of "Wed, 15 Jan 2003 11:28:48 -0600") References: <15909.39504.598866.52741@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > I know Barry's working on spambayes integration with Mailman. Pretend I > can't wait that long. ;-) Ignoring training issues (I can solve them without > much problem), should I be able to just stick "hammie.py -f ..." 
in front of > mailman in my aliases file and then just edit my "hold postings" regular > expression? Am I missing something obvious? That seems like it'd work, but please use hammiefilter. Running hammie -f is deprecated (meaning that as soon as I get a round tuit, hammie will no longer be executable). Neale From neale at woozle.org Wed Jan 15 13:12:25 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 15 16:12:29 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? In-Reply-To: <15909.51173.814202.900365@montanaro.dyndns.org> (Skip Montanaro's message of "Wed, 15 Jan 2003 14:43:17 -0600") References: <15909.51173.814202.900365@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \ > | /usr/local/bin/stripmime.pl \ > | /home/mailman/mail/wrapper "$@" > > (Please don't flog me for using stripmime.pl. I'm sure there are better > MIME strippers out there, but it works fine for my needs. ;-) If you ever need an optimization, it occurs to me that hammie.py will have already pulled the message apart into MIME parts, so you should be able to start with hammiefilter.py and write a dual spamcheck/MIME-strip program. Someone else might want this too--for example, SpamAssassin munges MIME for tagged spam, presumably to protect the "click first, ask questions later" crowd :) Neale From skip at pobox.com Wed Jan 15 15:23:53 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 16:24:08 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: References: <15909.39504.598866.52741@montanaro.dyndns.org> Message-ID: <15909.53609.187510.588099@montanaro.dyndns.org> Neale> That seems like it'd work, but please use hammiefilter. Running Neale> hammie -f is deprecated (meaning that as soon as I get a round Neale> tuit, hammie will no longer be executable). 
Hmmm...:

    % type hammiefilter.py
    hammiefilter.py is /Users/skip/local/bin/hammiefilter.py
    % hammiefilter.py --help
    Traceback (most recent call last):
      File "/Users/skip/local/bin/hammiefilter.py", line 43, in ?
        from spambayes import hammie, Options, StringIO
    ImportError: cannot import name StringIO

Looks like a transcription error in the grand directory shuffling. I just checked in a fix. I suspect nobody who uses hammiefilter.py has cvs up'd recently. Skip From neale at woozle.org Wed Jan 15 13:28:12 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 15 16:28:16 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15909.53609.187510.588099@montanaro.dyndns.org> (Skip Montanaro's message of "Wed, 15 Jan 2003 15:23:53 -0600") References: <15909.39504.598866.52741@montanaro.dyndns.org> <15909.53609.187510.588099@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > Looks like a transcription error in the grand directory shuffling. I just > checked in a fix. I suspect nobody who uses hammiefilter.py has cvs up'd > recently. Yup. But it turns out we don't even need to import StringIO, so I just checked in its removal :) Thanks! Neale From skip at pobox.com Wed Jan 15 16:14:25 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 17:14:35 2003 Subject: [Spambayes] Something's still missing from hammiefilter Message-ID: <15909.56641.568386.266344@montanaro.dyndns.org> Neale encouraged me to use "hammiefilter.py" instead of "hammie.py -f", but it doesn't support enough command line args. I currently call hammie.py from procmail like so:

    HAMMIE=$HOME/local/bin/hammie.py
    ...
    :0 fw:hamlock
    | $HAMMIE -f -d -p $HOME/hammie.db

The -d (use dbm) and -p (specify pickle or database file) flags are missing. I'd really prefer these be available on the command line as well as via the options file. Is there a reason not to expose them on the command line?
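Exposing `-d` and `-p` alongside the options file mostly comes down to letting command-line flags override the configured defaults. Below is a minimal sketch with the standard `getopt` module; `DEFAULTS`, its keys and `parse_args` are hypothetical and do not reflect how hammiefilter.py actually stores its settings.

```python
import getopt

# Hypothetical defaults, standing in for values read from the options file.
DEFAULTS = {"use_dbm": False, "db_name": "~/.hammiedb"}

USAGE = "usage: hammiefilter.py [-h] [-d] [-p FILE]"


def parse_args(argv, defaults=DEFAULTS):
    """Return settings in which command-line flags override option-file values.

    -d       use a dbm database instead of a pickle
    -p FILE  pickle or database file to use
    """
    settings = dict(defaults)  # copy, so the defaults are never mutated
    opts, args = getopt.getopt(argv, "hdp:")
    for opt, arg in opts:
        if opt == "-h":
            raise SystemExit(USAGE)
        elif opt == "-d":
            settings["use_dbm"] = True
        elif opt == "-p":
            settings["db_name"] = arg
    if args:
        raise getopt.GetoptError("unexpected arguments: " + " ".join(args))
    return settings
```

With this shape, Skip's procmail recipe could pass `-d -p $HOME/hammie.db` exactly as it does to hammie.py today, while users who omit the flags keep whatever the options file says.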
Skip From skip at pobox.com Wed Jan 15 19:51:04 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 20:51:10 2003 Subject: [Spambayes] pop3proxy.UserInterface.onSave - self.shutdown? Message-ID: <15910.4104.787891.400893@montanaro.dyndns.org> Pychecker complains about the call to self.shutdown(2) on line 1441 of pop3proxy.py. It should probably be self.socket.shutdown(2), but I'll let someone else who knows the code better verify that. Skip From anthony at interlink.com.au Thu Jan 16 12:55:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Jan 15 20:56:48 2003 Subject: [Spambayes] new attempt at non-technical explanation on index.html of website In-Reply-To: <15909.30536.143312.323641@montanaro.dyndns.org> Message-ID: <200301160155.h0G1tIb02948@localhost.localdomain> >>> Skip Montanaro wrote > I just checked out the website module and noticed that whatever editor you > use doesn't wrap lines (most

...

chunks are one big honkin' line). > That makes it a bit problematic to edit text in Emacs. If I wrap > the lines will it hose your editing? Nope. I'm just lazy with vi - much of the verbiage is done in large slabs of typing, and I hate autowrapping :) Feel free to foldspindlemutilate. From skip at pobox.com Wed Jan 15 19:10:47 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 15 21:27:50 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? In-Reply-To: References: <15909.51173.814202.900365@montanaro.dyndns.org> Message-ID: <15910.1687.290158.515305@montanaro.dyndns.org> >> I'm sure others have considered this already, but I began wondering >> today how hard it would be to separate pop3proxy into two pieces, the >> proxy stuff and the training/web stuff. I think having a separate >> training interface would be good because it could then be used by >> other spambayes tools. Tim> The training stuff used by the pop3proxy is already 'stripped out' Tim> into Corpus.py and FileCorpus.py. These modules probably don't do Tim> exactly what you need right now, but we've been considering Tim> rewriting them anyway, to handle more than just file system Tim> artifacts for messages. Thanks, I'll take a look. I'm interested in separating the POP stuff from the training/web stuff. Maybe I could simply delete the POP stuff and see what's left. ;-) Skip From T.A.Meyer at massey.ac.nz Thu Jan 16 18:37:39 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 16 00:38:31 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAEC@its-xchg2.massey.ac.nz> [Tony] > > Couldn't the FolderSelection dialog only load in the folders > > it needs to display? i.e. at first it loads in the root > > folders, and then whenever OnTreeItemExpanding is called it > > adds in the necessary children? If this is > > practical/possible then if no-one else wants to do it, I > > could give it a go. 
(Although I have all of two days of > > Python knowledge). [Mark] > :) go for it! pywin\tools\hierlist.py has an example of > OnTreeItemExpanding. Only complication will be that the current code > expands the tree to show the currently selected folders - > this should be doable though. It's done. Well, it works on my system, anyway :) Including the expanding the tree to show the selected items. So what do I do with my code now? > Still-wouldn't-mind-getting-that-MAPI-version-going ly, I *think* that this should all still work (with a bit of tweaking) with the MAPI version, which would make things even faster :) Still, this is fast enough for me - the dialog takes about 0.5->1s to appear, rather than the 30s (MAPI) or 60s (Outlook) that it did before. Of course, anyone with a really large, really flat folder structure will still have to wait, but they should just be more organised :) =Tony Meyer From anthony at interlink.com.au Thu Jan 16 16:38:25 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 16 00:40:08 2003 Subject: [Spambayes] credit/blame Message-ID: <200301160538.h0G5cQS14083@localhost.localdomain> Here's what's currently on the index.html page for the 'credits/blame' section. Have I missed anyone? It's possible, as I'm feeling amazingly fuzzy-brained today. It's a chunk I wrote some time ago, so it's probably missing people... Most of the heavy lifting on this project was done by Tim Peters, with the cast of spambayes obsessive-compulsives providing ideas, heckling, and testing. Gary Robinson and Rob Hooft contributed valuable help on the maths behind it all. Mark Hammond amazed the world with the Outlook2000 plugin, and Rich Hindle, Neale Pickett, Tim Stone worked on the end-user applications. If I have missed someone, or misrepresented their work, my apologies - please drop me an email... or should we simply have an 'Acknowledgments' file in the distribution? 
From T.A.Meyer at massey.ac.nz Thu Jan 16 18:43:21 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 16 00:44:10 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAED@its-xchg2.massey.ac.nz> > > > The outlook version was added because the IDs that > > > MAPI returns aren't compatible with the outlook IDs > > > and you can't open a message on an exchange server > > > with a MAPI ID. > > Is this a definitive "can't", or a 'no-one has figured out > > how to yet' "can't"? > I think it's more like a "you should be able to, and the docs say so, but it just doesn't work. Ugh..." :) Does this mean that it's not worth bothering to try and fix it? In any case, I did the mod that changes the list to only build on demand. This is faster, although not what I would call fast. (But then, my system isn't that fast, and I'm shackled up to a large public folder through work). Would implementing the _BuildFolderTreeOutlook() function in C/C++ make a significant difference? (I guess what I'm asking is whether it's just Outlook itself that is causing the delay). =Tony Meyer From barry at python.org Thu Jan 16 00:51:57 2003 From: barry at python.org (Barry A. Warsaw) Date: Thu Jan 16 00:52:25 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15909.39504.598866.52741@montanaro.dyndns.org> Message-ID: <15910.18557.535408.669103@gargle.gargle.HOWL> >>>>> "SM" == Skip Montanaro writes: SM> I know Barry's working on spambayes integration with Mailman. SM> Pretend I can't wait that long. ;-) Ignoring training issues SM> (I can solve them without much problem), should I be able to SM> just stick "hammie.py -f ..." in front of mailman in my SM> aliases file and then just edit my "hold postings" regular SM> expression? Am I missing something obvious? This ought to work fairly well, I think, modulo the training issue. My idea was to not train the list at all, before turning on spambayes. 
So the first batch of messages will all get held as unsure, and you'd use the admindb page to accept and reject messages. Accept messages would train as ham and rejected messages would get trained as spam. The u/i for these options is undecided -- maybe you have an additional "train as..." radio button. I don't think this matters much right now. So as your list warms up, you'll be training the system. I wonder how long it'll take before spambayes gets pretty good at detecting what's appropriate and what's not for your list? -Barry From barry at python.org Thu Jan 16 00:52:52 2003 From: barry at python.org (Barry A. Warsaw) Date: Thu Jan 16 00:53:20 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? References: <15909.51173.814202.900365@montanaro.dyndns.org> Message-ID: <15910.18612.748855.750857@gargle.gargle.HOWL> >>>>> "SM" == Skip Montanaro writes: SM> (Please don't flog me for using stripmime.pl. I'm sure there SM> are better MIME strippers out there, but it works fine for my SM> needs. ;-) Of course, you know that Mailman 2.1 has this built in, right? -Barry From anthony at interlink.com.au Thu Jan 16 17:11:59 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 16 01:13:43 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL> Message-ID: <200301160612.h0G6C0x14523@localhost.localdomain> >>> Barry A. Warsaw wrote > So as your list warms up, you'll be training the system. I wonder how > long it'll take before spambayes gets pretty good at detecting what's > appropriate and what's not for your list? This seems like a plan - so long as the UI doesn't suck too hard :) Previous experiments have shown that it learns _really_ quickly, if the subject matter's really focussed... something like 20 messages gave a remarkably good result, from memory. -- Anthony Baxter It's never too late to have a happy childhood. 
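Barry's plan maps moderator actions directly onto training: an accepted held message is trained as ham, and a rejected one as spam. The sketch below shows that mapping with a toy word-count store. The class name, method names and whitespace tokenizer are hypothetical stand-ins. This is not Mailman's admindb code or the spambayes classifier.

```python
from collections import Counter


class ModerationTrainer:
    """Hypothetical moderation hook: each approve/reject decision trains the filter.

    Accepted messages are counted as ham and rejected ones as spam. This is a
    toy token-count store, not Mailman's admindb or the spambayes classifier.
    """

    def __init__(self) -> None:
        self.ham: Counter = Counter()   # token -> ham messages containing it
        self.spam: Counter = Counter()  # token -> spam messages containing it
        self.nham = 0
        self.nspam = 0

    @staticmethod
    def tokens(text: str) -> set:
        # Crude whitespace tokenizer; each token is counted once per message.
        return set(text.lower().split())

    def on_decision(self, text: str, accepted: bool) -> None:
        toks = self.tokens(text)
        if accepted:
            self.ham.update(toks)
            self.nham += 1
        else:
            self.spam.update(toks)
            self.nspam += 1


if __name__ == "__main__":
    trainer = ModerationTrainer()
    trainer.on_decision("Question about the pop3proxy web UI", accepted=True)
    trainer.on_decision("Cheap pills cheap pills", accepted=False)
    print(trainer.nham, trainer.nspam, trainer.spam["cheap"])
```

Held messages that a moderator never acts on are not trained on, so only explicit decisions shape the database.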
From ducky at webfoot.com Wed Jan 15 22:31:29 2003 From: ducky at webfoot.com (Kaitlin Duck Sherwood) Date: Thu Jan 16 01:28:51 2003 Subject: [Spambayes] Two Stage Plan In-Reply-To: References: Message-ID: (Sorry I'm late to this particular discussion on using postage...) I'd like to suggest + making the postage stamp computationally VERY expensive for the client, and + assume that users look at postage as only one factor in judging spaminess. For example, hypothetically: + For me, anybody on my whitelist gets their messages through without postage. + For Frieda, any message without postage gets through if it's got a SpamAssassin score of less than 3. + For Paul, any message that his Bayesian algorithm rates as <20% likely to be spam gets through without postage. + For Chantelle, any message without postage gets a reverse-Turing-test challenge. If postage is only one factor, then it can be useful before "everybody" adopts it. If postage is only one factor, then listbots can insist on one postage unit for messages that the listbot receives, but the listbot can then send out messages (to the teeming hordes on the list) without postage. I want postage to be computationally *very* expensive. Like five or ten minutes on a (currently) high-end desktop. I want strangers to have to spend some time -- not just money -- to be sure that I'll read their messages. Shoot, I don't even care if there is no money involved at all, "only" time. I also want the reverse algorithm -- where I check to see if their token is valid -- to be very fast. So are there any one-way algorithms that would involve my email address and some other piece of changing data, like seconds since Jan 1, 1970? Or perhaps make and use a Web service that generates and posts random time-stamped numbers? (A web service with random, time-stamped numbers could also provide for essentially constant difficulty as processors get higher-powered, e.g. the random numbers keep getting bigger.)
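Kaitlin's requirements are that a stamp be expensive to mint, cheap to check, and bound to the recipient's address and a timestamp. Hashcash-style partial hash collisions meet all three. The sender searches for a counter that gives a hash with a required number of leading zero bits, and the receiver verifies with a single hash. The sketch below uses SHA-1 from Python's hashlib. Its `bits:timestamp:address:counter` layout is an assumption for illustration, not the real hashcash stamp format. It also omits the spent-stamp database a receiver would need to reject replayed stamps.

```python
import hashlib
import time
from typing import Optional


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient: str, bits: int, now: Optional[int] = None) -> str:
    """Search for a stamp whose SHA-1 hash has at least `bits` leading zero bits.

    Expected work is about 2**bits hash attempts.
    """
    stamp_time = int(time.time()) if now is None else now
    counter = 0
    while True:
        stamp = f"{bits}:{stamp_time}:{recipient}:{counter:x}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
        counter += 1


def verify(stamp: str, recipient: str, required_bits: int,
           max_age: int = 2 * 24 * 3600, now: Optional[int] = None) -> bool:
    """Cheap check: parse the fields, then compute a single hash."""
    try:
        bits_s, ts_s, rest = stamp.split(":", 2)
        addr, _counter = rest.rsplit(":", 1)  # address may itself contain ':'
        bits, ts = int(bits_s), int(ts_s)
    except ValueError:
        return False
    current = int(time.time()) if now is None else now
    if bits < required_bits or addr != recipient:
        return False
    if not 0 <= current - ts <= max_age:  # stale or future-dated stamp
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


if __name__ == "__main__":
    stamp = mint("ducky@webfoot.com", bits=16)
    print(stamp, verify(stamp, "ducky@webfoot.com", required_bits=16))
```

Each extra required bit roughly doubles the sender's expected work, so the receiver sets the price through `required_bits`. Verification stays a single hash either way. Because the work is purely CPU-bound, a fixed bit count gets cheaper as processors get faster.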
BTW, anyone who is going to the spam conference, look for me in the colored (probably purple) beret! From mhammond at skippinet.com.au Thu Jan 16 17:30:25 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 16 01:30:45 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAEC@its-xchg2.massey.ac.nz> Message-ID: <0b0001c2bd28$c31267a0$530f8490@eden> > It's done. Well, it works on my system, anyway :) Including > the expanding the tree to show the selected items. So what > do I do with my code now? Mail it to me :) Mark. From mhammond at skippinet.com.au Thu Jan 16 17:33:51 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 16 01:34:47 2003 Subject: [Spambayes] Outlook plugin & bad folders In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAED@its-xchg2.massey.ac.nz> Message-ID: <0b0101c2bd29$3c81e610$530f8490@eden> > Would implementing the _BuildFolderTreeOutlook() function in > C/C++ make a significant difference? (I guess what I'm > asking is whether it's just Outlook itself that is causing the delay). Yes, Outlook itself is the problem. MAPI is the high-performance API, and as you can see, Python does indeed get high-performance using it. I'm still yet to know any specific details on what problem we have when using the MAPI version. Mark. From piersh at friskit.com Wed Jan 15 23:34:54 2003 From: piersh at friskit.com (Piers Haken) Date: Thu Jan 16 02:18:46 2003 Subject: [Spambayes] Outlook plugin & bad folders Message-ID: <9891913C5BFE87429D71E37F08210CB929753E@zeus.sfhq.friskit.com> The problem is that the GetFolderFromID call in outlook's object model (called from MAPIMsgStoreFolder.GetOutlookItem) does not accept MAPI folder IDs when those folders are on an exchange server. 
It probably has something to do with the fact that when you're using PSTs the object model and the underlying MAPI store are the same thing, but when you're using Exchange the store is a separate component. In theory they should be able to keep the Outlook IDs and the Exchange MAPI IDs consistent, but in practice... Piers. > -----Original Message----- > From: Mark Hammond [mailto:mhammond@skippinet.com.au] > Sent: Wednesday, January 15, 2003 10:34 PM > To: 'Meyer, Tony'; Piers Haken; spambayes@python.org > Subject: RE: [Spambayes] Outlook plugin & bad folders > > > > Would implementing the _BuildFolderTreeOutlook() function in C/C++ > > make a significant difference? (I guess what I'm asking is whether > > it's just Outlook itself that is causing the delay). > > Yes, Outlook itself is the problem. MAPI is the > high-performance API, and as you can see, Python does indeed > get high-performance using it. > > I'm still yet to know any specific details on what problem we > have when using the MAPI version. > > Mark. > > From vanhorn at whidbey.com Thu Jan 16 00:28:24 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Thu Jan 16 03:28:26 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15909.39504.598866.52741@montanaro.dyndns.org> <15910.18557.535408.669103@gargle.gargle.HOWL> Message-ID: <3E266D28.8591603A@whidbey.com> Barry, If you're going into the administrative interface anyway ... When I used to get a message from Mailman about a message being held for my approval, the first page I hit told me why, whether it was too large, non-member post, whatever. Now I have to jump to a second page to learn that for each sender (and it's normally only one message per sender). Since you are putting up the neat summary windows with all the options on that first page, could we please have the reason for the hold in there? Pretty please?
As to how long it would take to make a difference, based on what folks have said here I suspect that any list with ten messages a day would be over 99% accurate by the end of a week. Since half my Mailman moderation is probably spam these days, I'm looking forward to it. Van "Barry A. Warsaw" wrote: > >>>>> "SM" == Skip Montanaro writes: > > SM> I know Barry's working on spambayes integration with Mailman. > SM> Pretend I can't wait that long. ;-) Ignoring training issues > SM> (I can solve them without much problem), should I be able to > SM> just stick "hammie.py -f ..." in front of mailman in my > SM> aliases file and then just edit my "hold postings" regular > SM> expression? Am I missing something obvious? > > This ought to work fairly well, I think, modulo the training issue. > My idea was to not train the list at all, before turning on > spambayes. So the first batch of messages will all get held as > unsure, and you'd use the admindb page to accept and reject messages. > Accept messages would train as ham and rejected messages would get > trained as spam. > > The u/i for these options is undecided -- maybe you have an additional > "train as..." radio button. I don't think this matters much right > now. > > So as your list warms up, you'll be training the system. I wonder how > long it'll take before spambayes gets pretty good at detecting what's > appropriate and what's not for your list? > > -Barry > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From richie at entrian.com Thu Jan 16 08:54:09 2003 From: richie at entrian.com (richie@entrian.com) Date: Thu Jan 16 03:54:18 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? In-Reply-To: <15910.1687.290158.515305@montanaro.dyndns.org> Message-ID: [Skip] > I'm interested in separating the POP stuff from > the training/web stuff. Maybe I could simply delete the POP stuff and see > what's left. ;-) I think you'd be surprised at how well that would work. 8-) I've already done half of this job - I need to give it some more testing, but it's pretty much there. The POP3 proxy and the web UI are already fairly independent - they don't communicate directly, but instead refer to a common set of FileCorpuses. The new version will enable me to pull them apart completely, into three separate files - a core server component, the POP proxy and the web interface. (Though I'll commit under the current all-in-pop3proxy.py arrangement first to make it easier to track changes through CVS.) Soon you'll be able to run a "Spambayes Server" that provides either the POP3 proxy or the web interface or both, with no dependencies. The work I'll be committing this week is a step towards that. You should be able to add a listen-for-incoming-messages-by-HTTP-or-whatever component very easily - it will plug into the core server and poke messages into the FileCorpuses in the same way that the POP3 proxy does now. -- Richie Hindle richie@entrian.com From rob at hooft.net Thu Jan 16 13:15:52 2003 From: rob at hooft.net (Rob W. W. Hooft) Date: Thu Jan 16 07:15:57 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> Message-ID: <3E26A278.3080302@hooft.net> Anthony Baxter wrote: >>>>Barry A. 
Warsaw wrote >>> >>So as your list warms up, you'll be training the system. I wonder how >>long it'll take before spambayes gets pretty good at detecting what's >>appropriate and what's not for your list? > > > This seems like a plan - so long as the UI doesn't suck too hard :) > > Previous experiments have shown that it learns _really_ quickly, > if the subject matter's really focussed... something like 20 messages > gave a remarkably good result, from memory. Doesn't it take time before the first spam arrives on a brand new mailinglist? Spambayes' results are going to be real lousy if it is trained on 200 ham and 0 spam messages.... Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From skip at pobox.com Thu Jan 16 06:31:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 16 07:31:47 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL> References: <15909.39504.598866.52741@montanaro.dyndns.org> <15910.18557.535408.669103@gargle.gargle.HOWL> Message-ID: <15910.42543.629381.696105@montanaro.dyndns.org> BAW> This ought to work fairly well, I think, modulo the training issue. BAW> My idea was to not train the list at all, before turning on BAW> spambayes. So the first batch of messages will all get held as BAW> unsure, and you'd use the admindb page to accept and reject BAW> messages. Accept messages would train as ham and rejected messages BAW> would get trained as spam. In my case I sidestepped training altogether because the list's content is a subset of the stuff I'm interested in anyway. Most of the "spam" messages encountered by the list at this point are really of the virus/worm variety, and since it's set up for members only posting, little, if any garbage actually gets through to the list, even without using spambayes. BAW> The u/i for these options is undecided -- maybe you have an BAW> additional "train as..." radio button. 
I don't think this matters BAW> much right now. One reason I'm interested in separating pop3proxy into two functions (POP retrieval/classifying and training/web UI) is that the training/web component should be useful for other spambayes users. In my current environment, training is clunky enough that I only train on unsures and mistakes. While that works okay because my starting corpus was so large (around 20,000 messages), the indications from people who've experimented with that sort of training are that the quality of classification does degrade over time. Last night I ripped out the POP stuff from pop3proxy, renamed the result proxytrainer and added one extra method, onUpload. Then I wrote a simple proxytee.py script which passes stdin to stdout and uploads the message it received to http://localhost:8880/upload as a file upload (in theory, allowing upload of large mbox files). The mbox upload doesn't seem to be quite working yet and there's still that pesky infinite loop in onReview, but I have hope it will eventually work pretty well. At that point, anyone should be able to use it as a training interface. All they will need is a tee-type hook they can insert into their mail transport somewhere. A bit further down the road, I will probably dump the asyncore stuff in favor of something based on SimpleHTTPServer just to reduce the number of lines of code. Without the POP stuff going on there's no great need for the channel multiplexing. Even without threading, the amount of work the server would have to do per click on the user interface is minimal. BAW> So as your list warms up, you'll be training the system. I wonder BAW> how long it'll take before spambayes gets pretty good at detecting BAW> what's appropriate and what's not for your list? Like I indicated, I gave it a head start.
;-) Skip From skip at pobox.com Thu Jan 16 06:36:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 16 07:36:44 2003 Subject: [Spambayes] separating training stuff from pop3proxy - how hard? In-Reply-To: <15910.18612.748855.750857@gargle.gargle.HOWL> References: <15909.51173.814202.900365@montanaro.dyndns.org> <15910.18612.748855.750857@gargle.gargle.HOWL> Message-ID: <15910.42841.803998.192192@montanaro.dyndns.org> SM> (Please don't flog me for using stripmime.pl. I'm sure there are SM> better MIME strippers out there, but it works fine for my needs. ;-) BAW> Of course, you know that Mailman 2.1 has this built in, right? No, actually, I didn't. I haven't upgraded yet. Thanks for such a gentle flog... Skip From skip at pobox.com Thu Jan 16 06:51:47 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 16 07:51:51 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <3E26A278.3080302@hooft.net> References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> Message-ID: <15910.43747.285523.378123@montanaro.dyndns.org> Rob> Doesn't it take time before the first spam arrives on a brand new Rob> mailinglist? Spambayes' results are going to be real lousy if it is Rob> trained on 200 ham and 0 spam messages.... A couple of things come to mind: 1. Don't enable spambayes until you start having trouble 2. With a proxytrainer/proxytee setup as I described in a previous message you can seed it with a handful of spam you have laying about. Just set your options to ignore stuff like sender and to while training on those messages. 3. Send your mailing list address directly to the spammers. They'll find it soon enough anyway. ;-) It's-not-like-spam-is-hard-to-find-ly, y'rs, Skip From tim at fourstonesExpressions.com Thu Jan 16 07:13:09 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 16 08:13:55 2003 Subject: [Spambayes] spambayes fronting a mailing list? 
In-Reply-To: <200301160612.h0G6C0x14523@localhost.localdomain> Message-ID: 1/16/2003 12:11:59 AM, Anthony Baxter wrote: > >>>> Barry A. Warsaw wrote >> So as your list warms up, you'll be training the system. I wonder how >> long it'll take before spambayes gets pretty good at detecting what's >> appropriate and what's not for your list? > >This seems like a plan - so long as the UI doesn't suck too hard :) > >Previous experiments have shown that it learns _really_ quickly, >if the subject matter's really focussed... something like 20 messages >gave a remarkably good result, from memory. This is exactly what I did, and it started producing results immediately. After I had trained on only a few (maybe 5) or so spam, it began classifying nearly all spam correctly. It didn't classify ham correctly very often at that point, but I was ok with that. Unsures and ham are much the same to me... - TimS > >-- >Anthony Baxter >It's never too late to have a happy childhood. > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From esj at harvee.billerica.ma.us Thu Jan 16 08:02:59 2003 From: esj at harvee.billerica.ma.us (Eric S. Johansson) Date: Thu Jan 16 08:14:01 2003 Subject: [Spambayes] Two Stage Plan In-Reply-To: References: Message-ID: <3E26AD83.9070500@harvee.billerica.ma.us> Kaitlin Duck Sherwood wrote: > (Sorry I'm late to this particular discussion on using postage...) > > I'd like to suggest > + making the postage stamp computationally VERY expensive for the > client, and > + assume that users look at postage as only one factor in judging > spaminess. actually, there is a project for this type of system called camram. I have a working proof of concept including handling postage due notices etc. 
I'm expanding it to include a Bayesian-style filter as a discriminator in case a message fails the stamp or white list tests. If you'd like a white paper on this, either wait a few days for it to be published (theoretically) in the proceedings of the upcoming antispam conference, or ask me nicely and I'll send you the document (unfortunately in Microsoft Word format). > If postage is only one factor, then it can be useful before "everybody" > adopts it. If postage is only one factor, then listbots can insist on > one postage unit for messages that the listbot receives, but the listbot > can then send out messages (to the teeming hordes on the list) > without postage Not exactly true. If you use postage due notices with the ability to generate postage stamps via a Java applet, you can get some benefit without 100 percent adoption. We operate on the principle that "strangers cost, friends fly free", which means that I only expect stamps from people I don't know. A mailing list is someone I know, and therefore I don't expect any stamps from them. A mailing list could ask for stamps from everyone, but it would make more sense to use the postage due mechanism only for nonsubscribers. Then you can use the same technique I use in camram, which is that anyone you can't deliver a postage due notice to is sending spam, and therefore the message can be safely discarded. Yes, I know it's not strictly true, but if you pay attention to why message delivery fails, it's effectively true. White lists, on the other hand, are a whole other topic, and I believe they should be based not on name but on public key. > I want postage to be computationally *very* expensive. Like five or ten > minutes on a (currently) high-end desktop. I want strangers to have to > spend some time -- not just money -- to be sure that I'll read their > messages. Shoot, I don't even care if there is no money involved at > all, "only" time.
> > I also want the reverse algorithm -- where I check to see if their token > is valid -- to be very fast. Google for Adam Back, hashcash. Also look for "proof of work" puzzles. Unfortunately, proof of work puzzles suffer from Moore's Law inflation. I've been given a lead that says a proof of work puzzle that exercises the memory bus will be less susceptible to Moore's Law inflation, and I'm talking with a cryptographer about a memory-intensive POW puzzle. By the way, if you do the math, a three-second computation would slow down a high-powered spammer 140 times or, put another way, they would need 140 machines generating stamps constantly in order to keep up the same data rate through a T1. Computation length is very tricky because you don't want it to be so long that you discourage low-end machine users, while at the same time not giving high-end machine users a significant advantage. However, this problem can be reduced by pushing the stamp calculation and mail delivery into the background. > So are there any one-way algorithms that would involve my email address > and some other piece of changing data, like seconds since Jan 1, 1970? > Or perhaps make and use a Web service that generates and posts random > time-stamped numbers? (A web service with random, time-stamped numbers > could also provide for essentially constant difficulty as processors get > higher-powered, e.g. the random numbers keep getting bigger.) Centralized services fail from a reliability perspective. They can also fail if a service can be corrupted or abused. > BTW, anyone who is going to the spam conference, look for me in the > colored (probably purple) beret! As will I (be there, I mean; sans beret, maybe a bright yellow terry cloth hat at times). ---eric From rob at hooft.net Thu Jan 16 14:34:35 2003 From: rob at hooft.net (Rob W. W. Hooft) Date: Thu Jan 16 08:35:58 2003 Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> <15910.43747.285523.378123@montanaro.dyndns.org> Message-ID: <3E26B4EB.5020100@hooft.net> Skip Montanaro wrote: > Rob> Doesn't it take time before the first spam arrives on a brand new > Rob> mailinglist? Spambayes' results are going to be real lousy if it is > Rob> trained on 200 ham and 0 spam messages.... > > A couple of things come to mind: > > 1. Don't enable spambayes until you start having trouble It is going to be too late.... > 2. With a proxytrainer/proxytee setup as I described in a previous > message you can seed it with a handful of spam you have laying about. > Just set your options to ignore stuff like sender and to while > training on those messages. This sounds reasonable, but this can also be implemented as a "preloaded database" that comes with spambayes. This is something many people have already asked for. > 3. Send your mailing list address directly to the spammers. They'll > find it soon enough anyway. ;-) www.spamsubmit.com? "Now submit your address to 100s of spam engines without all the hassle! Ever tried to get all spam messages first? Manually typed your address in at many spammer sites? Now, at spamsubmit.com we submit your address to 100s of spam engines without any efforts from you! Introductory price for this service is $99 for this month only!" Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From barry at python.org Thu Jan 16 09:35:53 2003 From: barry at python.org (Barry A. Warsaw) Date: Thu Jan 16 09:37:47 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15910.18557.535408.669103@gargle.gargle.HOWL> <200301160612.h0G6C0x14523@localhost.localdomain> Message-ID: <15910.49993.523948.576657@gargle.gargle.HOWL> >>>>> "AB" == Anthony Baxter writes: >> Barry A. Warsaw wrote >> So as your list warms up, you'll be training the system. 
I >> wonder how long it'll take before spambayes gets pretty good at >> detecting what's appropriate and what's not for your list? AB> This seems like a plan - so long as the UI doesn't suck too AB> hard :) The u/i already sucks so I doubt it could suck any worse. :) AB> Previous experiments have shown that it learns _really_ AB> quickly, if the subject matter's really focussed... something AB> like 20 messages gave a remarkably good result, from memory. That's what I'm counting on! -Barry From barry at python.org Thu Jan 16 09:43:49 2003 From: barry at python.org (Barry A. Warsaw) Date: Thu Jan 16 09:44:23 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15909.39504.598866.52741@montanaro.dyndns.org> <15910.18557.535408.669103@gargle.gargle.HOWL> <3E266D28.8591603A@whidbey.com> Message-ID: <15910.50469.559467.710145@gargle.gargle.HOWL> >>>>> "GAVH" == G Armour Van Horn writes: GAVH> If you're going into the administrative interface anyway ... GAVH> When I used to get a message from Mailman about a message GAVH> being held for my approval, the first page I hit told me GAVH> why, whether it was too large, non-member post, GAVH> whatever. Now I have to jump to a second page to learn that GAVH> for each sender (and it's normally only one message per GAVH> sender). Since you are putting up the neat summary windows GAVH> with all the options on that first page, could we please GAVH> have the reason for the hold in there? Pretty please? This is better discussed on mailman-developers, or better yet, file a bug report. :) But it seems like a reasonable suggestion! GAVH> As to how long it would take to make a difference, based on GAVH> what folks have said here I suspect that any list with ten GAVH> messages a day would be over 99% accurate by the end of a GAVH> week. Since half my Mailman moderation is probably spam GAVH> these days, I'm looking forward to it. That doesn't sound too onerous as a training regimen for lists.
-Barry From barry at python.org Thu Jan 16 09:57:00 2003 From: barry at python.org (Barry A. Warsaw) Date: Thu Jan 16 09:57:42 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> Message-ID: <15910.51260.847140.60292@gargle.gargle.HOWL> >>>>> "RWWH" == Rob W W Hooft writes: RWWH> Doesn't it take time before the first spam arrives on a RWWH> brand new mailinglist? Spambayes' results are going to be RWWH> real lousy if it is trained on 200 ham and 0 spam RWWH> messages.... Why? Because those spams will be marked as "unsure"? Under my (current) approach, once the messages start getting marked as ham, even if they're held for approval for other reasons, they wouldn't go into the ham training when approved. Presumably, spams that later come in would be marked as spam and when rejected would go into the spam training. But that's all just conjecture. I've no idea whether that will really work in practice. I've got a back up plan if not, but it's more complicated and requires more work from the list admin, so I'd like to experiment with the simpler approach first. -Barry From rob at hooft.net Thu Jan 16 17:14:02 2003 From: rob at hooft.net (Rob W. W. Hooft) Date: Thu Jan 16 11:14:07 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> <15910.51260.847140.60292@gargle.gargle.HOWL> Message-ID: <3E26DA4A.40404@hooft.net> Barry A. Warsaw wrote: >>>>>>"RWWH" == Rob W W Hooft writes: >>>>> > > RWWH> Doesn't it take time before the first spam arrives on a > RWWH> brand new mailinglist? Spambayes' results are going to be > RWWH> real lousy if it is trained on 200 ham and 0 spam > RWWH> messages.... > > Why? Because those spams will be marked as "unsure"? Isn't everything going to be marked as unsure as long as there is no spam at all? That would not be very useful! 
AFAICS, nothing can be marked "ham" until there is spam in the database. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From tim.one at comcast.net Thu Jan 16 11:35:39 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 16 11:36:11 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL> Message-ID: [Barry A. Warsaw] > ... > My idea was to not train the list at all, before turning on > spambayes. So the first batch of messages will all get held as > unsure, and you'd use the admindb page to accept and reject messages. > Accept messages would train as ham and rejected messages would get > trained as spam. Better to start by training on a few spam, and a few copies of the list introduction msg (a decent intro msg necessarily contains many words and lexicalisms characteristic of the list's topic). If you have only ham in the database, the false negative rate will zoom (every word in the database will be hammish). If you have only spam in the database, the false positive rate will zoom (every word in the database will be spammish). > ... > I wonder how long it'll take before spambayes gets pretty good at > detecting what's appropriate and what's not for your list? Depends more on list throughput than on time, i.e. it depends more on total # of msgs trained on. By the time you've got 1 of each kind, it should do better than chance. By the time you've got 20 of each kind, it should be a major help. By the time you've got 500 of each, it should be excellent. By the time you've got 15,000 of each, both error rates in c.l.py tests were statistically indistinguishable from 0. I keep hearing that spammers have gotten cleverer since then, but I haven't seen evidence of it in my own email. The spam that sneaks through seems much more likely to be due to spammer incompetence (like spam where they forget to put *anything* in the msg body). 
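Tim's point about one-sided training follows from how per-token probabilities are computed. The sketch below uses Gary Robinson's smoothing, f(w) = (s*x + n*p(w)) / (s + n), where p(w) compares how often a token appears in spam versus ham. The strength s = 0.45 and prior x = 0.5 are assumed illustrative values. The empty-class guard is this sketch's own choice and is not necessarily what spambayes does. With no spam trained, p(w) is 0 for every seen token, so each token drifts toward 0 (hammish) as its count grows. With no ham trained, each token drifts toward 1.

```python
def token_spamprob(spamcount: int, hamcount: int, nspam: int, nham: int,
                   s: float = 0.45, x: float = 0.5) -> float:
    """Smoothed spam probability for one token (Robinson-style sketch).

    spamcount/hamcount: number of trained spam/ham messages containing the token.
    nspam/nham: total number of trained spam/ham messages.
    s, x: assumed prior strength and prior probability.
    """
    # An untrained class contributes a ratio of 0 (this sketch's choice).
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0.0:
        return x  # unseen token: fall back to the prior
    p = spamratio / (spamratio + hamratio)
    n = spamcount + hamcount
    # Pull p toward x when the token has been seen only a few times.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    print(round(token_spamprob(0, 10, 0, 200), 3))  # ham-only database
    print(round(token_spamprob(10, 0, 20, 0), 3))   # spam-only database
```

For example, a token seen in 10 of 200 ham messages with no spam trained scores about 0.022. The same counts on the spam side with no ham trained score about 0.978. This is why a few seed spam plus a few copies of the list's introduction message give a much better starting point than either class alone.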
From noreply at sourceforge.net Thu Jan 16 08:34:18 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 16 11:38:25 2003 Subject: [Spambayes] [ spambayes-Bugs-669149 ] NameError in ExpiryCorpus.removeExpiredMessages Message-ID: Bugs item #669149, was opened at 2003-01-16 10:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Tim Stone (timstone4) Summary: NameError in ExpiryCorpus.removeExpiredMessages Initial Comment: In verbose mode, removeExpiredMessages prints out a line which references the nonexistent variable, key. I have no idea what it should be, otherwise I'd fix it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702 From noreply at sourceforge.net Thu Jan 16 08:39:59 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 16 11:38:43 2003 Subject: [Spambayes] [ spambayes-Bugs-651365 ] getattr recursion in Corpus.py Message-ID: Bugs item #651365, was opened at 2002-12-10 04:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Wolfgang Strobl (strobl) >Assigned to: Tim Stone (timstone4) Summary: getattr recursion in Corpus.py Initial Comment: After feeding a bunch of new messages into pop3proxy, classifying them and when trying to save the result, I got a recursion loop (followed by recursion depth exceeded) in \cvshome\spambayes\Corpus.py|__getattr__|269] After looking into setSubstance, I noticed that setSubstance (called by load) only sets the attributes payload and hdrtext when the pattern matches. I temporarily added an else clause to bmatch, i.e. 
if bmatch:
    self.payload = bmatch.group(2)
    self.hdrtxt = sub[:bmatch.start(2)]
    print ".",
else:
    self.payload = "nix\r\n"
    self.hdrtxt = "nix\r\n"
    print "?", len(sub),

and indeed, when trying to save, I noticed that after about 800 good messages, ~100 have an empty message; see the output below. I don't really know what I'm doing here, but this fix at least allows me to continue.

-------------------------
C:\archiv\cvshome\spambayes>python -u pop3proxy.py -l 8110 mail.gmd.de
Loading database... Done.
Listener on port 8110 is proxying mail:110
User interface url is http://localhost:8880
[progress output condensed: a long run of "." marks, one per message whose body pattern matched, followed by a long run of "? 0" marks, one per empty message, then a few more "." marks]
-----------------------
Initial traceback:
error: uncaptured python exception, closing channel <__main__.UserInterface connected at 0x2213470> (exceptions.RuntimeError:maximum recursion depth exceeded [C:\Python22\lib\asyncore.py|poll|95] [C:\Python22\lib\asyncore.py|handle_read_event|392] [C:\Python22\lib\asynchat.py|handle_read|112] [C:\archiv\cvshome\spambayes\pop3proxy.py|found_terminator|804] [C:\archiv\cvshome\spambayes\pop3proxy.py|onRequest|830] [C:\archiv\cvshome\spambayes\pop3proxy.py|onReview|1093] [C:\archiv\cvs\spambayes\Corpus.py|takeMessage|188] [C:\archiv\cvs\spambayes\FileCorpus.py|addMessage|140] [C:\archiv\cvs\spambayes\FileCorpus.py|store|231] [C:\archiv\cvs\spambayes\Corpus.py|getSubstance|318] [C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269] [C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269] [C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269] [C:\archiv\cvs\spambayes\Corpus.py|__getat
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro) Date: 2003-01-16 10:39 Message: Logged In: YES user_id=44345 Assigning to Tim Stone. I think this is the same problem I reported on the list the other day. I think the offending code is in Corpus.__getitem__. The test of amsg - "if not amsg" should be "if amsg is None" I think. I suspect a fix further up the line as the OP indicated would probably do the trick. If you don't do something to set self.hdrtxt I believe it is None and you infloop trying to resolve a non-existent __nonzero__ method.
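The recursion in this report comes from a lazily loading `__getattr__` combined with a `setSubstance` that leaves attributes unset when its pattern fails to match. The sketch below is a hypothetical, simplified stand-in: `LazyMessage`, `set_substance` and the header/body regex are assumptions, not the real `Corpus.py` code. It shows two defensive fixes: always assign both attributes, and have `__getattr__` raise `AttributeError` instead of recursing once loading has already happened. It targets Python 3, where truth tests such as `if not amsg` do not go through `__getattr__`, so only the attribute-loading half of the loop is modeled.

```python
import re


class LazyMessage:
    """Hypothetical message whose headers/body are parsed on first access."""

    # Assumed layout: header block, blank line, body.
    _body_re = re.compile(r"(\r?\n\r?\n)(.*)", re.DOTALL)

    def __init__(self, raw: str) -> None:
        self._raw = raw
        self._loaded = False

    def load(self) -> None:
        self._loaded = True
        self.set_substance(self._raw)

    def set_substance(self, sub: str) -> None:
        match = self._body_re.search(sub)
        if match:
            self.payload = match.group(2)
            self.hdrtxt = sub[:match.start(2)]
        else:
            # Fix 1: define both attributes even when the pattern fails
            # (e.g. an empty message), so later lookups find them.
            self.payload = ""
            self.hdrtxt = sub

    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup fails.
        # Fix 2: never load twice and never load for private names;
        # report a genuinely missing attribute instead of recursing.
        if name.startswith("_") or self.__dict__.get("_loaded", False):
            raise AttributeError(name)
        self.load()
        return getattr(self, name)


empty = LazyMessage("")
print(repr(empty.payload), repr(empty.hdrtxt))
```

Either fix alone breaks the loop; together they also make a truly missing attribute fail fast with an ordinary `AttributeError`.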
Something like that. ;-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 From skip at pobox.com Thu Jan 16 11:45:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 16 12:45:57 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in Message-ID: <15910.61389.133887.569308@montanaro.dyndns.org> I just checked in proxytrainer.py and proxytee.py. The former is essentially pop3proxy.py with the POP stuff removed. I know this results in a large amount of code duplication, but a) it was the fastest way for me to get a GUI training interface without using POP, and b) maybe I can convince the pop3proxy advocates to slim it down by ripping out the user interface stuff. ;-) Proxytee.py is like the Unix tee program (copy stdin to stdout and an external file), except the "external file" is to upload the message or mailbox as a file to proxytrainer.py. I'm still experimenting with things, but should have proxytee.py embedded into my procmailrc file by the end of the day once I refresh my memory on flags and such. Come to think of it, hammiefilter.py could pretty easily be extended to do the file upload. The core functionality is implemented in two functions Wade Leftwich posted to the Python Cookbook. hmmm... Skip From tim at fourstonesExpressions.com Thu Jan 16 11:52:15 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 16 12:52:53 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: Message-ID: 1/16/2003 10:35:39 AM, Tim Peters wrote: >[Barry A. Warsaw] >> ... >> My idea was to not train the list at all, before turning on >> spambayes. So the first batch of messages will all get held as >> unsure, and you'd use the admindb page to accept and reject messages. >> Accept messages would train as ham and rejected messages would get >> trained as spam. 
I think I'm hearing something on this thread that doesn't make much sense to me. If we always train as spam stuff that's been classified as spam, always train as ham stuff that's been classified as ham, then we're kinda reinforcing the obvious, and increasing the spaminess of words in that spam... isn't it more realistic (and ultimately actually better) to train on a random sample rather than always? - TimS > >Better to start by training on a few spam, and a few copies of the list >introduction msg (a decent intro msg necessarily contains many words and >lexicalisms characteristic of the list's topic). > >If you have only ham in the database, the false negative rate will zoom >(every word in the database will be hammish). > >If you have only spam in the database, the false positive rate will zoom >(every word in the database will be spammish). > >> ... >> I wonder how long it'll take before spambayes gets pretty good at >> detecting what's appropriate and what's not for your list? > >Depends more on list throughput than on time, i.e. it depends more on total ># of msgs trained on. By the time you've got 1 of each kind, it should do >better than chance. By the time you've got 20 of each kind, it should be a >major help. By the time you've got 500 of each, it should be excellent. By >the time you've got 15,000 of each, both error rates in c.l.py tests were >statistically indistinguishable from 0. > >I keep hearing that spammers have gotten cleverer since then, but I haven't >seen evidence of it in my own email. The spam that sneaks through seems >much more likely to be due to spammer incompetence (like spam where they >forget to put *anything* in the msg body). 
> > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Thu Jan 16 11:57:05 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 16 12:57:42 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org> Message-ID: 1/16/2003 11:45:49 AM, Skip Montanaro wrote: >I just checked in proxytrainer.py and proxytee.py. The former is >essentially pop3proxy.py with the POP stuff removed. I know this results in >a large amount of code duplication, but a) it was the fastest way for me to >get a GUI training interface without using POP, and b) maybe I can convince >the pop3proxy advocates to slim it down by ripping out the user interface >stuff. ;-) Proxytee.py is like the Unix tee program (copy stdin to stdout >and an external file), except the "external file" is to upload the message >or mailbox as a file to proxytrainer.py. > >I'm still experimenting with things, but should have proxytee.py embedded >into my procmailrc file by the end of the day once I refresh my memory on >flags and such. > >Come to think of it, hammiefilter.py could pretty easily be extended to do >the file upload. The core functionality is implemented in two functions >Wade Leftwich posted to the Python Cookbook. hmmm... I think we're really onto something here, something that's bothered me for a while now. There is a core engine in all of this stuff that really should be packaged as such. Classifier, tokenizer, the corpus stuff, and the training stuff, is basically it. Corpus isn't up to the task yet, but with some rework it could be made usable by hammiefilter, pop3proxy, outlook, proxytee, or any other client type we can think up...
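The client-independent core Tim describes can be sketched as a small facade that every front end (proxy, filter, mail-client plug-in) would call. Everything here is hypothetical: `Engine`, the `Classifier` protocol and the toy `CountingClassifier` are illustrative names, not existing Spambayes modules, and the toy scoring rule is deliberately simplistic.

```python
from collections import Counter
from typing import Callable, Iterable, Protocol


class Classifier(Protocol):
    """What the core needs from any classifier implementation."""

    def learn(self, tokens: Iterable[str], is_spam: bool) -> None: ...

    def spamprob(self, tokens: Iterable[str]) -> float: ...


class CountingClassifier:
    """Toy classifier: averages a crude per-token spam ratio (illustration only)."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()

    def learn(self, tokens: Iterable[str], is_spam: bool) -> None:
        # Count each distinct token once per message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def spamprob(self, tokens: Iterable[str]) -> float:
        probs = []
        for tok in set(tokens):
            s, h = self.spam[tok], self.ham[tok]
            if s + h:
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


class Engine:
    """Front-end-agnostic core: tokenizer + classifier + training entry point."""

    def __init__(self, tokenize: Callable[[str], Iterable[str]],
                 classifier: Classifier) -> None:
        self.tokenize = tokenize
        self.classifier = classifier

    def train(self, text: str, is_spam: bool) -> None:
        self.classifier.learn(self.tokenize(text), is_spam)

    def score(self, text: str) -> float:
        return self.classifier.spamprob(self.tokenize(text))


# A POP3 proxy, a procmail filter or a plug-in would only talk to Engine.
engine = Engine(str.split, CountingClassifier())
engine.train("cheap pills now", is_spam=True)
engine.train("spambayes cvs checkin", is_spam=False)
print(engine.score("cheap pills"), engine.score("cvs checkin"))
```

The design point is that front ends depend only on `train` and `score`, so swapping in a real tokenizer or classifier would not touch them.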
Richie and I have had a couple of offlist jabs at this, and I know Richie is in the process of ripping pop3proxy apart into smaller components... - TimS > >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Thu Jan 16 13:55:48 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 16 13:56:57 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: Message-ID: [Tim Stone - Four Stones Expressions] > I think I'm hearing something on this thread that doesn't make > much sense to me. If we always train as spam stuff that's been > classified as spam, always train as ham stuff that's been > classified as ham, then we're kinda reinforcing the obvious, and > increasing the spaminess of words in that spam... isn't it > more realistic (and ultimately actually better) to train on a > random sample rather than always? - TimS Testing results failed to find any way of training that didn't work well, ranging from purely mistake-based training, to letting a classifier self-train on its own decisions. My real-life experience on my own email is that pure mistake-based training is unsatisfactory in practice because it keeps the Unsure rate higher longer than need be (also showed in formal tests), and especially because the *kinds* of spam that remained Unsure were maddeningly "obvious" spam (something I don't know how to test formally). OTOH, in real life now I started with a few hundred random msgs, and since then have done *almost* purely mistake-based training. This may not be optimal (and I believe it is not), but leaves so little manual classification for me to do that I don't care. 
When error rates get below 1%, the difference between, say, 0.5% and 0.2% is more than a factor of two, but isn't actually noticeable unless you've got many thousands of msgs to dig thru. This *is* the case for the mailing list run via comp.lang.python's news<->mail gateway, and more-careful training there may more than repay the cost. But most Mailman lists have much lower volume, and "excellent" results with little training effort may be more attractive to list admins than "superb" results requiring substantially more training effort. The important thing now is just that Barry get off his ass and start . From skip at pobox.com Thu Jan 16 13:04:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 16 14:04:33 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: References: Message-ID: <15911.565.674990.932342@montanaro.dyndns.org> Tim> But most Mailman lists have much lower volume, and "excellent" Tim> results with little training effort may be more attractive to list Tim> admins than "superb" results requiring substantially more training Tim> effort. Which suggests that if Barry hasn't already considered it (and I'll bet he has, given that bass players are about three steps up on the evolutionary scale from, say, drummers or viola players :-), he should give Mailman admins a variety of ways to train: everything, mistakes only, random, unsures only, etc. Skip From tim at fourstonesExpressions.com Thu Jan 16 17:44:02 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 16 18:44:44 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <3E272711.9050001@hooft.net> Message-ID: <04VQKE84QNC731SOVUS5ONMLPNLIMI.3e2743c2@myst> This is a great discussion, one I think we should include on the main site. This is obviously an advantage that we (this algorithm) have... you can hardly go wrong!
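Skip's suggestion of letting list admins pick a training regime (everything, mistakes only, random, unsures only) could be expressed as a single decision function consulted after the admin has judged each held message. This is a hypothetical sketch: the strategy names, the 0.20/0.90 cutoffs and the 10% sampling rate are assumptions for illustration, not Mailman or Spambayes settings.

```python
import random

HAM_CUTOFF = 0.20   # assumed: scores below this are classified ham
SPAM_CUTOFF = 0.90  # assumed: scores at or above this are classified spam


def classify(score: float) -> str:
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def should_train(strategy: str, score: float, is_spam: bool,
                 rng: random.Random, sample_rate: float = 0.1) -> bool:
    """Decide whether an admin-judged message is fed to training."""
    verdict = classify(score)
    truth = "spam" if is_spam else "ham"
    if strategy == "everything":
        return True
    if strategy == "mistakes":
        # Here "mistakes" means definite errors only (false positives and
        # false negatives); an admin might reasonably count unsures too.
        return verdict not in (truth, "unsure")
    if strategy == "unsures":
        return verdict == "unsure"
    if strategy == "random":
        return rng.random() < sample_rate
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Keeping the policy in one function makes it easy to expose as a per-list admin option and to compare regimes offline on the same message stream.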
In my installation, I have 156 spam and 223 ham, and it almost never makes a classification mistake. Unsures are almost always ham, spam is DOA. It hasn't improved (there is precious little *room* for improvement) since a few days after I started this database. In fact, I'm a bit reluctant to reup, it's working so well Now my mail is a bit unique in that I get mostly machine driven event notification mails, which are VERY similar... There's probably 5 different email content patterns/sources that comprise 90% of my mail (e.g. "Order Received", "Mail List Opt-In" "Spambayes", etc.) But even the unique stuff is nailed as ham almost all the time. Perhaps we can document a few training patterns: mistake driven, classification driven, random sample driven, , and allow users to select which type of training pattern they want to do. The user interface, then, might only present messages that are pertinent for that type of training regimen. For example, the pop3proxy right now presents every message it receives in buckets by classification. If I'm doing classification driven training, I wouldn't need to look at every spam that comes in... Oh I don't know, I'm rambling now... - TimS 1/16/2003 3:41:37 PM, Rob Hooft wrote: >Tim Stone - Four Stones Expressions wrote: >> >> I think I'm hearing something on this thread that doesn't make much sense to >> me. If we always train as spam stuff that's been classified as spam, always >> train as ham stuff that's been classified as ham, then we're kinda reinforcing >> the obvious, and increasing the spaminess of words in that spam... isn't it >> more realistic (and ultimately actually better) to train on a random sample >> rather than always? - TimS > Tim1 said: >Testing results failed to find any way of training that didn't work well, >ranging from purely mistake-based training, to letting a classifier >self-train on its own decisions. 
My real-life experience on my own email is >that pure mistake-based training is unsatisfactory in practice because it >keeps the Unsure rate higher longer than need be (also showed in formal >tests), and especially because the *kinds* of spam that remained Unsure were >maddeningly "obvious" spam (something I don't know how to test formally). > >OTOH, in real life now I started with a few hundred random msgs, and since >then have done *almost* purely mistake-based training. This may not be >optimal (and I believe it is not), but leaves so little manual >classification for me to do that I don't care. When error rates get below >1%, the difference between, say, 0.5% and 0.2% is more than a factor of two, >but isn't actually noticeable unless you've got many thousands of msgs to >dig thru. This *is* the case for the mailing list run via >comp.lang.python's news<->mail gateway, and more-careful training there may >more than repay the cost. But most Mailman lists have much lower volume, >and "excellent" results with little training effort may be more attractive >to list admins than "superb" results requiring substantially more training >effort. > Rob said: >Nope, the mathematics say this isn't true. Say by the word "Sex" you >recognize a new message as being spam. This message may be the first >that contains the word "oral", so training on this makes it a spammy >word. The word "sex" becomes more spammy. And the word "ink-cartridge" >that does not appear in this message becomes a little less spammy. > >In other words: training on a new spam doesn't only make the tokens in >it more spammy, but also makes the spammy tokens that do not occur in >there less spammy. > >Then there are words that occur both in ham and in spam messages. There >it is important to get the right "balance". If you train only on >"non-obvious" cases, this will almost certainly result in an imbalance. 
> >All of this determines, like Tim1 explained, only the difference between >excellent and superb separation of classes. > >Rob > >-- >Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Fri Jan 17 10:56:21 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 16 18:57:09 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15910.42543.629381.696105@montanaro.dyndns.org> Message-ID: <002001c2bdba$df01f790$530f8490@eden> [Skip] > and since it's set up for members only posting, little, if any garbage > actually gets through to the list, even without using spambayes. Unfortunately, none of this stuff gets through as the poor list administrator has explicitly rejected it. So particularly for closed lists, spambayes could be a huge bonus - auto-reject any non-members posts with a particular score, and most of my admin duties will vanish! Mark. From mhammond at skippinet.com.au Fri Jan 17 11:03:44 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 16 19:04:44 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: Message-ID: <002201c2bdbb$e7642830$530f8490@eden> [Tim1] > tests), and especially because the *kinds* of spam that > remained Unsure were > maddeningly "obvious" spam (something I don't know how to > test formally). This is touching my test-of-training-strategies comments recently. If we have a decent framework in place, then "obvious" spam would be anything that is spam given complete data. ie, assume we have 3000 ham and 3000 spam. My training strategy would be to perform a complete train over the entire database, and collect "correct" scores for each item. We then can test out various training strategies, watching not only the fp/fn/unsure rates, but also deviance from the "correct" score. 
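Mark's proposal can be sketched as a small scoring harness: train once on everything to obtain reference scores, then compare each candidate strategy's scores against them alongside the usual error counts. This is an assumption-laden sketch: `evaluate` is a hypothetical helper, the reference scores are simply taken as inputs, the cutoffs are assumed values, and mean absolute deviation is only one possible measure of "deviance".

```python
from statistics import mean


def evaluate(scores, reference, labels, ham_cutoff=0.20, spam_cutoff=0.90):
    """Compare one training strategy's scores with reference scores.

    scores:    spam scores produced under the strategy being tested
    reference: scores from training on the complete database
    labels:    True for spam, False for ham
    """
    fp = fn = unsure = 0
    for score, is_spam in zip(scores, labels):
        if ham_cutoff <= score < spam_cutoff:
            unsure += 1
        elif score >= spam_cutoff and not is_spam:
            fp += 1
        elif score < ham_cutoff and is_spam:
            fn += 1
    deviance = mean(abs(s - r) for s, r in zip(scores, reference))
    return {"fp": fp, "fn": fn, "unsure": unsure, "deviance": deviance}


result = evaluate(scores=[0.95, 0.10, 0.50, 0.92],
                  reference=[0.99, 0.01, 0.97, 0.02],
                  labels=[True, False, True, False])
print(result)
```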
> OTOH, in real life now I started with a few hundred random > msgs, and since > then have done *almost* purely mistake-based training. This > may not be > optimal (and I believe it is not), but leaves so little manual > classification for me to do that I don't care. Do you believe we can reasonably formalize some tests for these strategies? > The important thing now is just that Barry get off his ass > and start . Yeah, 'cos when he is finished there are some nice training strategies I would like him to work on Mark. From mhammond at skippinet.com.au Fri Jan 17 11:05:14 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 16 19:07:27 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <002001c2bdba$df01f790$530f8490@eden> Message-ID: <002301c2bdbc$1ddae020$530f8490@eden> [I wrote] > So particularly for closed lists, spambayes could be a huge bonus - > auto-reject any non-members posts with a particular score, > and most of my > admin duties will vanish! Obviously too early for me. It will not help closed lists. What it *will* do is allow me to open up a few lists - ones that I only closed due to the spam coming through. Mark. From tim.one at comcast.net Thu Jan 16 22:06:16 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 16 22:06:51 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <002201c2bdbb$e7642830$530f8490@eden> Message-ID: [Mark Hammond] > ... > If we have a decent framework in place, then "obvious" spam would be > anything that is spam given complete data. That's not how I meant it. "Obvious" is a human judgment, and is (AFAICT) subjective. Purely mistake-based training, starting from an empty database, left substantial "obvious spam" in the Unsure category even after 2 weeks, which is well over 1200 spam at the rate I get spam.
So little spam got trained on during that time (there weren't many mistakes after the first two days) that spam-detection remained mostly hapax-driven, and the few instances of trained farm-porn spam didn't do enough to nail gay-porn spam too, etc. "Obvious spam" means that you personally are surprised to see it rate Unsure, at least surprised enough to click the "Spam Clues" button to try to figure out why it wasn't nailed. > ie, assume we have 3000 ham and 3000 spam. My training strategy > would be to perform a complete train over the entire database, and > collect "correct" scores for each item. I'm not sure what correct means here. How do you decide? You're surely not going to look at those 6,000 msgs by hand and assign a two-digit number to each, right? > We then can test out various training strategies, watching not only > the fp/fn/unsure rates, but also deviance from the "correct" score. > ... > Do you believe we can reasonably formalize some tests for these > strategies? If you can define what it is you're trying to measure, sure . All along in testing we used a three-term cost function (assigning different "dollar" penalties to FP, FN and unsure), and the measure of goodness was how small the total penalty got. It's easy (albeit tedious) to set up experiments to measure the effect of any definable training strategy on that. If you define a different penalty function, likewise. From frank.horowitz at csiro.au Fri Jan 17 11:16:51 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Thu Jan 16 22:26:24 2003 Subject: [Spambayes] Sourceforge :pserver cvs access broken... Message-ID: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> ... and has been for a few days: http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#cv While this doesn't affect those with developer cvs access (via SSH), it kind of makes it hard for us "lurkers" to get our spambayes fixes (err, I mean "patches" of course ;-).
Does anyone (by any small miracle) have a mirror of the cvs tree that they'd be willing to put online while SF gets its act together? Cheers, Frank Horowitz From frank.horowitz at csiro.au Fri Jan 17 11:33:19 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Thu Jan 16 22:42:00 2003 Subject: [Spambayes] Re: Sourceforge :pserver cvs access broken... (Good URL) Message-ID: <1042774398.22402.11.camel@bonzo.ned.dem.csiro.au> Sorry. Pilot error (or something; mutter). That URL again: http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#cvs Frank From tim.one at comcast.net Thu Jan 16 22:46:27 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 16 22:47:02 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <3E25521B.20937.3607FFA@localhost> Message-ID: [Richard Jowsey] > I've been building a Java implementation of Paul Graham's > "Bayesian" classification logic over the past couple months, > intended as a plug-in filter for the Apache JAMES mail server. Upgrade to Python and you would have finished a couple months ago . > However, after considerable testing, tweaking and tuning via a > proxy setup (similar to POPFile), plus some recent lurking on > the Spambayes list, I'm now modifying this project to > incorporate the excellent notions contributed by Gary Robinson, > et al, as implemented in your Python code. > > Early results are *very* promising!!! This death2spam stuff is > definitely heading in the right direction! I haven't quite > finished the chi2 comparison logic, but even using just "gary-combining", the kinds of messages ending up in my "uncertain" > category make much more sense. chi-combining will give you more of the same. The combining methods are related, in such a way that they're monotonic with each other. chi is more extreme, and you'll find that it pushes most spam very close to 1.0, most ham very close to 0.0, and highly ambiguous msgs very close to 0.5.
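To make the chi-combining behavior concrete, here is a minimal sketch of chi-squared combining as Spambayes is understood to do it: Fisher's method is applied separately to the spam evidence and the ham evidence, and the two results are folded onto a 0..1 scale. The function names and the clamping of probabilities away from 0 and 1 are assumptions of this sketch, and it sums logarithms rather than multiplying probabilities directly, to avoid underflow.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0, "the closed form below needs even degrees of freedom"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs, eps=1e-9):
    """Combine per-word spam probabilities into one message score in [0, 1]."""
    if not probs:
        return 0.5
    probs = [min(max(p, eps), 1.0 - eps) for p in probs]  # avoid log(0)
    n = len(probs)
    # Near 1 when many words are strongly spammy (product of 1-p is tiny).
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Near 1 when many words are strongly hammy (product of p is tiny).
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


print(chi_combine([0.99] * 20))                  # strongly spammy words
print(chi_combine([0.01] * 20))                  # strongly hammy words
print(chi_combine([0.01] * 10 + [0.99] * 10))    # strong evidence both ways
```

The last call shows the "highly ambiguous" case: when both kinds of evidence are strong, `s` and `h` are both near 1 and the score lands near 0.5 rather than being pulled toward either extreme.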
This gives it some nice properties for automated decision making (the cutoff points for gary-combining were too touchy, across test sets, and across time). But if you like a mode where you simply sort msgs by score, you can stop with gary-combining and be happy. > Plus I'm now seeing far less weirdness caused by Graham's > "2 * nGood + nSpam >= 5" trick, etc. Will keep the list posted as to > further progress. The biases indeed had strange effects! It was quite a struggle to eliminate all of them, in part because near the end of that struggle, some biases acted to counteract others, so removing any one of them in isolation made things worse. Gary Robinson pushed us out of the pit by proposing to eliminate all the remaining biases in one shot. I'm glad we were wise enough to listen to him > > I'd sure love to attend the upcoming spam-fest at MIT, but we > moved downunder (Seattle -> Sydney) last year, and it's one > helluva long way to go just for a day... Meet up with Mark Hammond instead. He wrote the wondrous Outlook 2000 client for this project, and also sleeps upside down. Just don't try to talk to him about Java. Our Anthony Baxter, who deserves more thanks at least for his thankless work in maintaining the web site, is also on the wrong side of the globe. > Many thanks for all your fine coding, testing efforts, and > thoughtful conversations! It's been very helpful, not to mention > highly entertaining at times. ;-) Less spam means more time for fun. Too bad I was kicked off the project . From anthony at interlink.com.au Fri Jan 17 15:09:56 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 16 23:11:24 2003 Subject: [Spambayes] Sourceforge :pserver cvs access broken...
In-Reply-To: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> Message-ID: <200301170409.h0H49uq25399@localhost.localdomain> >>> Frank Horowitz wrote > Does anyone (by any small miracle) have a mirror of the cvs tree that > they'd be willing to put online while SF gets it's act together? I'm planning a first pre-release tarball later today. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Fri Jan 17 15:11:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 16 23:13:01 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org> Message-ID: <200301170411.h0H4Baw25446@localhost.localdomain> >>> Skip Montanaro wrote > I just checked in proxytrainer.py and proxytee.py. Cool. I won't put them in the first "release package" today - let's see if they work first :) -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Fri Jan 17 15:14:31 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 16 23:16:04 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <04VQKE84QNC731SOVUS5ONMLPNLIMI.3e2743c2@myst> Message-ID: <200301170414.h0H4EV625489@localhost.localdomain> >>> Tim Stone - Four Stones Expressions wrote > This is a great discussion, one I think we should include on the main site. I'm working on a bit on the background page on the "training" section. It's not there yet. And yes, I know that "background", "documentation", and "developer" need to be sorted out. At the moment I'm just trying to get the words down in readable english - then we can work out what goes where... Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From T.A.Meyer at massey.ac.nz Fri Jan 17 17:16:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 16 23:17:44 2003 Subject: [Spambayes] Outlook plugin & Bad Folders Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAF7@its-xchg2.massey.ac.nz> OK, I've solved the MAPI/Exchange problem, I think. What was happening was that we were storing a short-term id, not a long-term one. This is why the id worked when building the list, but not later on. See this for more information: It now works for me - builds the tree, and I can use the results (i.e. filter on a selected folder). Unfortunately, my copy of FolderSelector.py is stuffed because I've played around with it so much, and I can't check out a new copy because of the sourceforge cvs problem. This is my new _BuildFoldersMAPI function, which is all that needs to be changed (plus changing FilterDialog.py, TrainingDialog.py and FolderSelector.py to use MAPI and not Outlook tree builds).

# Module-level imports this function relies on (pywin32);
# FolderSpec is defined in FolderSelector.py.
import pythoncom
from win32com.mapi import mapi
from win32com.mapi.mapitags import PR_ENTRYID, PR_STORE_ENTRYID, PR_DISPLAY_NAME_A

def _BuildFoldersMAPI(msgstore, folder):
    # Get the hierarchy table for it.
    table = folder.GetHierarchyTable(0)
    children = []
    rows = mapi.HrQueryAllRows(table, (PR_ENTRYID, PR_STORE_ENTRYID, PR_DISPLAY_NAME_A), None, None, 0)
    for (eid_tag, eid), (storeeid_tag, store_eid), (name_tag, name) in rows:
        folder_id = mapi.HexFromBin(store_eid), mapi.HexFromBin(eid)
        spec = FolderSpec(folder_id, name)
        try:
            child_folder = msgstore.OpenEntry(eid, None, mapi.MAPI_DEFERRED_ERRORS)
            # Re-read the entry ID from the opened folder itself.
            prop_ids = PR_ENTRYID, PR_STORE_ENTRYID
            hr, data = child_folder.GetProps(prop_ids, 0)
            folder_eid = data[0][1]
            spec.folder_id = mapi.HexFromBin(store_eid), mapi.HexFromBin(folder_eid)
        except pythoncom.error:
            # Something strange with this folder - just ignore it
            spec = None
        if spec is not None:
            spec.children = _BuildFoldersMAPI(msgstore, child_folder)
            children.append(spec)
    return children

The bad news, of course, is that this (I believe) means that MAPI works, but that means my nice build-on-demand code is broken. I guess I'll have to re-implement it using MAPI...
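The essential change in _BuildFoldersMAPI appears to be that each child's ID is read back from the opened folder rather than taken from the hierarchy-table row, and that folders which fail to open are skipped along with their subtrees. Since MAPI can't run outside Outlook/Windows, here is a MAPI-free sketch of the same recursion (Python 3). `StubFolder`, `StubStore`, `OpenError` and the short/long ID labels are hypothetical stand-ins, not pywin32 or Spambayes APIs.

```python
class FolderSpec:
    """Stand-in for FolderSelector.FolderSpec (assumed shape)."""
    def __init__(self, folder_id, name):
        self.folder_id = folder_id
        self.name = name
        self.children = []


class OpenError(Exception):
    """Stand-in for pythoncom.error raised by a folder that won't open."""


class StubFolder:
    """Hypothetical folder: its parent's hierarchy rows list it by a
    short-term ID, while the opened folder reports a long-term ID."""
    def __init__(self, name, short_id, long_id, children=(), broken=False):
        self.name = name
        self.short_id = short_id
        self.long_id = long_id
        self.children = list(children)
        self.broken = broken

    def hierarchy_rows(self):
        # Plays the role of the hierarchy table: (entry ID, display name).
        return [(child.short_id, child.name) for child in self.children]


class StubStore:
    """Hypothetical message store that opens folders by short-term ID."""
    def __init__(self, root):
        self._folders = {}
        stack = [root]
        while stack:
            folder = stack.pop()
            self._folders[folder.short_id] = folder
            stack.extend(folder.children)

    def open_entry(self, entry_id):
        folder = self._folders[entry_id]
        if folder.broken:
            raise OpenError(folder.name)
        return folder


def build_folders(store, folder):
    """Recursively build FolderSpec trees, skipping folders that fail to open."""
    children = []
    for entry_id, name in folder.hierarchy_rows():
        try:
            child = store.open_entry(entry_id)
        except OpenError:
            continue  # something strange with this folder - just ignore it
        # Record the ID reported by the opened folder, not the row's ID.
        spec = FolderSpec(child.long_id, name)
        spec.children = build_folders(store, child)
        children.append(spec)
    return children
```

For example, with a root holding an "Inbox" (which contains "Lists") and a broken "Junk" folder, `build_folders` returns only the Inbox subtree, tagged with the long-term IDs.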
=Tony Meyer From anthony at interlink.com.au Fri Jan 17 17:29:58 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Jan 17 01:31:40 2003 Subject: [Spambayes] Sourceforge :pserver cvs access broken... In-Reply-To: <1042778024.22400.16.camel@bonzo.ned.dem.csiro.au> Message-ID: <200301170629.h0H6TwK28960@localhost.localdomain> There's now a nightly snapshot available from the front page of the website. At the moment I'm building them from my laptop and pushing them out - once the pserver's working again, I'll move it to a cron job at SF. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From whisper at oz.net Thu Jan 16 23:06:24 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Jan 17 02:05:46 2003 Subject: [Spambayes] SF CVS Message-ID: What they say: (2003-01-14 14:04:19 - Project CVS Services) As of 2003-01-14, pserver-based CVS repository access and ViewCVS (web-based) CVS repository access have been taken offline as to stabilize CVS server performance for developers. These services will be re-enabled as soon as the underlying scalability issues have been analyzed and resolved (as soon as 2003-01-15, if possible). Additional updates will be posted to the Site Status page as they become available. Your patience is appreciated. David LeBlanc Seattle, WA USA From anthony at interlink.com.au Fri Jan 17 18:36:43 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Jan 17 02:38:26 2003 Subject: [Spambayes] 1.0a1 is done. Message-ID: <200301170736.h0H7ah429690@localhost.localdomain> I'm done with the packaging &c of the first pre-release. Can people have a look at this and see what's missing/busted/stupid, and let me know? Should we drop a note to something like comp.lang.python.announce? See the download page on the website for more... 
Anthony From francois.granger at free.fr Fri Jan 17 11:54:18 2003 From: francois.granger at free.fr (François Granger) Date: Fri Jan 17 05:54:24 2003 Subject: [Spambayes] Using Spambayes w/ Eudora In-Reply-To: References: Message-ID: At 22:49 -0800 13/01/2003, in message Re: [Spambayes] Using Spambayes w/ Eudora, Tony Lownds wrote: >I have a startup script that sets everything up I did copy and paste your modifications on my MacOS X station. I am currently using an "old" version of Spambayes... most recent files being dated December 29. It works perfectly on two POP servers. It does not work on a third one. I can't figure out why. I exchanged the local proxy addresses and the same server was unreachable. I guess this is my problem ;-) This server being pop.laposte.net, it may be kind of "special". But I can reach it directly with Eudora or fetchmail (home) and Entourage (work). -- Recently using MacOSX....... From t.a.meyer at massey.ac.nz Fri Jan 17 11:28:14 2003 From: t.a.meyer at massey.ac.nz (t.a.meyer@massey.ac.nz) Date: Fri Jan 17 06:28:21 2003 Subject: [Spambayes] 1.0a1 is done. Message-ID: > See the download page on the website for more... This 404's for me. =Tony Meyer From anthony at interlink.com.au Fri Jan 17 22:40:40 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Jan 17 06:42:04 2003 Subject: [Spambayes] 1.0a1 is done. In-Reply-To: Message-ID: <200301171140.h0HBefc32422@localhost.localdomain> >>> t.a.meyer@massey.ac.nz wrote > > See the download page on the website for more... > > This 404's for me. which does? I can't seem to see any 404s...? From mwh at python.net Fri Jan 17 11:48:03 2003 From: mwh at python.net (Michael Hudson) Date: Fri Jan 17 06:48:08 2003 Subject: [Spambayes] Re: 1.0a1 is done.
References: <200301171140.h0HBefc32422@localhost.localdomain> Message-ID: <2mn0m0xaj0.fsf@starship.python.net> Anthony Baxter writes: > >>> t.a.meyer@massey.ac.nz wrote > > > See the download page on the website for more... > > > > This 404's for me. > > which does? I can't seem to see any 404s...? http://spambayes.sourceforge.net/downloads.html 404s for me. Ah, there's a link from the index page to the above; it has an extra 's' at the end... Cheers, M> From anthony at interlink.com.au Fri Jan 17 22:52:55 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Jan 17 06:54:15 2003 Subject: [Spambayes] Re: 1.0a1 is done. In-Reply-To: <2mn0m0xaj0.fsf@starship.python.net> Message-ID: <200301171152.h0HBqtE32557@localhost.localdomain> >>> Michael Hudson wrote > http://spambayes.sourceforge.net/downloads.html > > 404s for me. > > Ah, there's a link from the index page to the above; it has an extra > 's' at the end... dammit. thanks. fixed. -- Anthony Baxter It's never too late to have a happy childhood. From Alexander at Leidinger.net Fri Jan 17 13:47:41 2003 From: Alexander at Leidinger.net (Alexander Leidinger) Date: Fri Jan 17 07:48:16 2003 Subject: [Spambayes] Stemming and stopword elemination Message-ID: <20030117134741.1e88011c.Alexander@Leidinger.net> Hi, has anyone already experimented with Information Retrieval techniques like stopword elimination (stopwords: the, a, an, or, and, ...) and word stemming? See http://www.tartarus.org/~martin/PorterStemmer for a description of the algorithm for English text and a Python implementation, or http://snowball.tartarus.org/ for non-English stemmers. I don't think this will change the failure rate significantly (maybe better results with little training data, maybe worse; I don't expect much change with large training data), but it should reduce the size of the needed database. Bye, Alexander. -- I believe the technical term is "Oops!"
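To make Alexander's suggestion concrete, here is a rough sketch (Python 3) of a token pipeline that drops stopwords and folds word forms before training. The stopword list contains only the examples from the post, and `crude_stem` is a deliberately simple, hypothetical suffix stripper standing in for the Porter or Snowball stemmers linked above. It is not the Porter algorithm and not the Spambayes tokenizer.

```python
import re

# Only the example stopwords from the post; a real list would be longer.
STOPWORDS = frozenset({"the", "a", "an", "or", "and"})

# Hypothetical crude suffix stripper, NOT the Porter algorithm: it only
# shows how stemming folds several surface forms into one token.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def tokens(text, drop_stopwords=True, stem=True):
    """Lowercase, split into words, then optionally filter and stem."""
    words = re.findall(r"[a-z']+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    return words
```

On the line "The offer and the offers: click, clicked, clicking", seven distinct raw tokens collapse to two ("offer" and "click"). That is the database-size reduction Alexander expects, though whether it helps or hurts classification would have to be measured.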
http://www.Leidinger.net Alexander @ Leidinger.net GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7 From papaDoc at videotron.ca Fri Jan 17 09:15:09 2003 From: papaDoc at videotron.ca (papaDoc) Date: Fri Jan 17 09:15:10 2003 Subject: [Spambayes] Re: 1.0a1 is done Message-ID: <3E280FED.5070305@videotron.ca> Hi,

I tested it and this is what I found

1- I think there is a typo in INTEGRATION.txt
183c183
< The minimum you need to do to get started is create a bayescustomize.ini
---
> The minimum you need too do to get started is create a bayescustomize.ini

2- When I try to run pop3graph.py. I get this error message
Traceback (most recent call last):
  File "D:\REMI_N~1\MAILFI~1\SPAMBA~1.0A1\UTILIT~1\POP3GR~1.PY", line 12, in ?
    from spambayes import mboxutils
ImportError: No module named spambayes

3- This is not a problem with the release but I will ask
I'm running pop3proxy since a while so I have accumulated some ham and spam that pop3proxy saved in the cache file.
How can I make the new pop3proxy aware of those file.
If I only copy the cache file or define the values
pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam-cache
pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham-cache
pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-unknown-cache
I still have in the web interface
  Total emails trained: Spam: 0 Ham: 0
instead of
  Total emails trained: Spam: 68 Ham: 93

From tim at fourstonesExpressions.com Fri Jan 17 08:18:55 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 17 09:19:32 2003 Subject: [Spambayes] Re: 1.0a1 is done Message-ID: 1/17/2003 8:15:09 AM, papaDoc wrote: >Hi, > >I tested it and this is what I found > >1- I think there is a typo in INTEGRATION.txt >183c183 >< The minimum you need to do to get started is create a bayescustomize.ini >--- > > The minimum you need too do to get started is create a bayescustomize.ini > >2- When I try to run pop3graph.py.
I get this error message >Traceback (most recent call last): > File "D:\REMI_N~1\MAILFI~1\SPAMBA~1.0A1\UTILIT~1\POP3GR~1.PY", line >12, in ? > from spambayes import mboxutils >ImportError: No module named spambayes > >3- This is not a problem with the release but I will ask >I'm running pop3proxy since a while so I have accumulated some ham and >spam that pop3proxy saved in the cache file. >How can I make the new pop3proxy aware of those file. >If I only copy the cache file or define the values >pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam- cache >pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham- cache >pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy- unknown-cache >I still have in the web interface > Total emails trained: Spam: 0 Ham: 0 >instead of > Total emails trained: Spam: 68 Ham: 93 Right now the only way to handle this is to retrain your database using hammiefilter. A bit of a pain, but it's your only option. - TimS > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From bkc at murkworks.com Fri Jan 17 09:30:06 2003 From: bkc at murkworks.com (Brad Clements) Date: Fri Jan 17 09:21:32 2003 Subject: [Spambayes] spamconference webcasts Message-ID: <3E27CAEC.14277.6FB1C7CD@localhost> For those who don't know, the spam conference is online live now .. http://www.spamconference.org follow the link to webcasts -- Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax http://www.wecanstopspam.org/ AOL-IM: BKClements From skip at pobox.com Fri Jan 17 08:43:35 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 09:43:51 2003 Subject: [Spambayes] Sourceforge :pserver cvs access broken... 
In-Reply-To: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> References: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> Message-ID: <15912.5783.216516.749029@montanaro.dyndns.org> Frank> Does anyone (by any small miracle) have a mirror of the cvs tree Frank> that they'd be willing to put online while SF gets it's act Frank> together? Not a mirror, but I just put a gzipped tar file snapshot at http://www.musi-cal.com/~skip/python.spambayes.tar.gz I'd be happy to update it periodically, though I have to do it manually, since on that machine cvs prompts me for my SF password when I 'cvs up'. Skip From skip at pobox.com Fri Jan 17 08:46:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 09:46:59 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: References: <3E25521B.20937.3607FFA@localhost> Message-ID: <15912.5977.869435.819287@montanaro.dyndns.org> Tim> Less spam means more time for fun. Too bad I was kicked off the Tim> project . That's what you get for having too much fun. Barry got jealous. (Those bass players are a jealous lot you know.) ;-) Skip From skip at pobox.com Fri Jan 17 09:00:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 10:00:49 2003 Subject: [Spambayes] spamconference webcasts In-Reply-To: <3E27CAEC.14277.6FB1C7CD@localhost> References: <3E27CAEC.14277.6FB1C7CD@localhost> Message-ID: <15912.6807.897201.900909@montanaro.dyndns.org> Brad> For those who don't know, the spam conference is online live now .. Brad> http://www.spamconference.org follow the link to webcasts Thanks! So much for getting any other work done today. 
;-) Skip From tim at fourstonesExpressions.com Fri Jan 17 09:02:34 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 17 10:03:16 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <15912.5977.869435.819287@montanaro.dyndns.org> Message-ID: <1YUQA7UQYT421W2UZVGAGBDBIGVQVT51.3e281b0a@myst> 1/17/2003 8:46:49 AM, Skip Montanaro wrote: > > Tim> Less spam means more time for fun. Too bad I was kicked off the > Tim> project . > >That's what you get for having too much fun. Barry got jealous. (Those >bass players are a jealous lot you know.) ;-) Ya, and all for what...? Using two fingers at a time to play one note at a time? - TimS > >Skip > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From anthony at interlink.com.au Sat Jan 18 03:04:22 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri Jan 17 11:07:21 2003 Subject: [Spambayes] spamconference webcasts In-Reply-To: <15912.6807.897201.900909@montanaro.dyndns.org> Message-ID: <200301171604.h0HG4NK01834@localhost.localdomain> >>> Skip Montanaro wrote > > Brad> For those who don't know, the spam conference is online live now .. > Brad> http://www.spamconference.org follow the link to webcasts > > Thanks! So much for getting any other work done today. ;-) ... or sleep :-/ From skip at pobox.com Fri Jan 17 11:35:34 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 12:37:02 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? Message-ID: <15912.16102.713265.622424@montanaro.dyndns.org> In Corpus.Message, __getattr__ is defined as

def __getattr__(self, attributeName):
    '''On-demand loading of the message text.'''

    if attributeName in ('hdrtxt', 'payload'):
        self.load()
    return getattr(self, attributeName)

This has to be an infloop, right?
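Skip's suspicion can be reproduced in a few lines (Python 3). `BrokenLazyMessage` mimics the quoted `__getattr__`. Its `load()` is a hypothetical stand-in that, like the original `setSubstance`, sets `hdrtxt` and `payload` only when a blank line separates headers from body. When that split fails, `getattr()` finds nothing, calls `__getattr__` again, and the cycle repeats until Python gives up with a recursion error.

```python
class BrokenLazyMessage:
    """Mimics the Corpus.Message.__getattr__ pattern quoted above."""

    def __init__(self, substance):
        self.substance = substance

    def load(self):
        # Hypothetical loader: like the original setSubstance, it only sets
        # the attributes when a blank line separates headers from the body.
        head, sep, body = self.substance.partition("\n\n")
        if sep:
            self.hdrtxt = head + sep
            self.payload = body

    def __getattr__(self, attributeName):
        if attributeName in ('hdrtxt', 'payload'):
            self.load()
        # If the attribute is still missing, this lookup lands right back
        # in __getattr__, so the recursion never terminates on its own.
        return getattr(self, attributeName)
```

A well-formed message such as "Subject: hi\n\nbody" loads fine. A message with no blank line, or any unrelated missing attribute, raises `RecursionError` (the 2003 traceback's "maximum recursion depth exceeded").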
Skip From richie at entrian.com Fri Jan 17 18:18:09 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 17 13:18:37 2003 Subject: [Spambayes] Re: 1.0a1 is done In-Reply-To: <3E280FED.5070305@videotron.ca> References: <3E280FED.5070305@videotron.ca> Message-ID: <7jhg2v8c91f582f1mm71smk7vrkff7btt3@4ax.com> > 2- When I try to run pop3graph.py. I get this error message I'll add this to my ever-growing list of things to do... > 3- This is not a problem with the release but I will ask > I'm running pop3proxy since a while so I have accumulated some ham and > spam that pop3proxy saved in the cache file. > How can I make the new pop3proxy aware of those file. You can upload them into the web interface, via the "Train on a given message" form (which I should probably rename, now that it supports mbox files). > If I only copy the cache file or define the values > pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam-cache > pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham-cache > pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-unknown-cache Ooo, don't do that! Those caches need to be directories - things will get very confused if you make them files. -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Jan 17 18:50:54 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 17 13:51:22 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: <15912.16102.713265.622424@montanaro.dyndns.org> References: <15912.16102.713265.622424@montanaro.dyndns.org> Message-ID: Hi Skip, > In Corpus.Message, __getattr__ is defined as > > def __getattr__(self, attributeName): > '''On-demand loading of the message text.''' > > if attributeName in ('hdrtxt', 'payload'): > self.load() > return getattr(self, attributeName) > > This has to be an infloop, right? It should probably be: return self.__dict__[attributeName] so that it raises an exception when something goes wrong.
This is probably related to https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 The suggested fix in the bug report looks needlessly destructive to me - I'd use something like (untested)

if bmatch:
    self.payload = bmatch.group(2)
    self.hdrtxt = sub[:bmatch.start(2)]
else:
    self.payload = sub
    self.hdrtxt = ""

Skip, since you're having trouble with this and I can't reproduce it, could you try the above edit? Tim S if you're listening, any better ideas? -- Richie Hindle richie@entrian.com From tim.one at comcast.net Fri Jan 17 14:12:48 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Jan 17 14:14:09 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <1YUQA7UQYT421W2UZVGAGBDBIGVQVT51.3e281b0a@myst> Message-ID: [TimP] > Less spam means more time for fun. Too bad I was kicked off the > project . [Skip Montanaro] > That's what you get for having too much fun. Barry got jealous. (Those > bass players are a jealous lot you know.) ;-) [TimS] > Ya, and all for what...? Using two fingers at a time to play one > note at a time? - TimS That's on Barry's best day. Usually he plays about one note per minute, due to heavy drool landing on a string. Sometimes he uses a finger to try to push the stream into position, though, so he's more advanced than most bass players. we-only-hire-the-best-ly y'rs - tim From tim at fourstonesExpressions.com Fri Jan 17 13:14:20 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 17 14:14:57 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: <15912.16102.713265.622424@montanaro.dyndns.org> Message-ID: I'm not sure how this ever worked! Unfortunately, I'm in the middle of changing workstations right now, and don't have cvs up and running yet, so I can't fix it...
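Richie's fallback (treat the whole substance as payload when there is no header/body separator) can be checked in isolation. The sketch below (Python 3) restates it as a free function. `split_substance` is a hypothetical name, and the regex is the one from the setSubstance version posted later in this thread. The key property is that both values are always produced, so an empty message like the ones in the bug report no longer leaves `hdrtxt` and `payload` unset.

```python
import re

# Same pattern as the setSubstance fix posted in this thread: a line break
# followed by a blank line separates the headers from the body.
BODY_RE = re.compile(r"\r?\n(\r?\n)(.*)", re.DOTALL | re.MULTILINE)


def split_substance(sub):
    """Return (hdrtxt, payload); neither is ever left unset."""
    bmatch = BODY_RE.search(sub)
    if bmatch:
        # Headers keep their trailing blank line; the body is group 2.
        return sub[:bmatch.start(2)], bmatch.group(2)
    # No header/body separator (e.g. an empty message): it's all payload.
    return "", sub
```

Both `\r\n` and bare `\n` line endings split at the first blank line, and "" comes back as `("", "")` instead of leaving the attributes missing.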
1/17/2003 11:35:34 AM, Skip Montanaro wrote: >In Corpus.Message, __getattr__ is defined as > > def __getattr__(self, attributeName): > '''On-demand loading of the message text.''' > > if attributeName in ('hdrtxt', 'payload'): > self.load() > return getattr(self, attributeName) > >This has to be an infloop, right? > >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From just at letterror.com Fri Jan 17 20:29:15 2003 From: just at letterror.com (Just van Rossum) Date: Fri Jan 17 14:29:33 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: Message-ID: Richie Hindle wrote: > > In Corpus.Message, __getattr__ is defined as > > > > def __getattr__(self, attributeName): > > '''On-demand loading of the message text.''' > > > > if attributeName in ('hdrtxt', 'payload'): > > self.load() > > return getattr(self, attributeName) > > > > This has to be an infloop, right? > > It should probably be: > > return self.__dict__[attributeName] > > so that it raises an exception when something goes wrong. [ ... ] Neither makes sense (unless I'm missing some magic context): __getattr__ is only called if the attr isn't found the normal way, which means it's for sure not in self.__dict__. Just From just at letterror.com Fri Jan 17 20:54:57 2003 From: just at letterror.com (Just van Rossum) Date: Fri Jan 17 14:55:07 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: Message-ID: Just van Rossum wrote: [never mind what I wrote... self.load() obviously loads the right attrs] That said, yeah, looking it up in self.__dict__ is better, but you must then catch KeyError and raise AttributeError instead. 
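Just's two points fit together. `__getattr__` only runs after normal lookup fails, so once `load()` has stored `hdrtxt` and `payload` in the instance dict, later accesses never reach it again. And a `self.__dict__` lookup must turn `KeyError` into `AttributeError`, or `hasattr()` and `getattr(obj, name, default)` break. A sketch of the corrected pattern (Python 3), where `load()` and the `loads` counter are hypothetical additions for illustration:

```python
class LazyMessage:
    """Sketch of the corrected on-demand loading pattern."""

    def __init__(self, substance):
        self.substance = substance
        self.loads = 0  # hypothetical counter: how often load() ran

    def load(self):
        self.loads += 1
        head, sep, body = self.substance.partition("\n\n")
        if sep:
            self.hdrtxt, self.payload = head + sep, body

    def __getattr__(self, attributeName):
        # Only reached when normal lookup fails, so once load() has stored
        # hdrtxt/payload in __dict__, later accesses never come back here.
        if attributeName in ('hdrtxt', 'payload'):
            self.load()
        try:
            return self.__dict__[attributeName]
        except KeyError:
            # hasattr() and getattr(obj, name, default) expect AttributeError.
            raise AttributeError(attributeName) from None
```

Reading `payload` and then `hdrtxt` on a well-formed message runs `load()` exactly once. On a malformed one, `hasattr(msg, "payload")` simply returns False instead of recursing.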
Just From tim at fourstonesExpressions.com Fri Jan 17 14:03:44 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 17 15:04:29 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: Message-ID: 1/17/2003 1:29:15 PM, Just van Rossum wrote: >Richie Hindle wrote: > >> > In Corpus.Message, __getattr__ is defined as >> > >> > def __getattr__(self, attributeName): >> > '''On-demand loading of the message text.''' >> > >> > if attributeName in ('hdrtxt', 'payload'): >> > self.load() >> > return getattr(self, attributeName) >> > >> > This has to be an infloop, right? >> >> It should probably be: >> >> return self.__dict__[attributeName] >> >> so that it raises an exception when something goes wrong. [ ... ] > >Neither makes sense (unless I'm missing some magic context): __getattr__ >is only called if the attr isn't found the normal way, which means it's >for sure not in self.__dict__. It's not an infloop if self.load() sets the attributes hdrtxt and payload, AND attributeName is in ('hdrtxt', 'payload'). Obviously, at least one of these conditions is not being met. self.load() ends up calling self.setSubstance, which does nothing if the message substance cannot be split into header text and payload. This is an error. setSubstance should look like:

def setSubstance(self, sub):
    '''set this message substance'''
    bodyRE = re.compile(r"\r?\n(\r?\n)(.*)", re.DOTALL+re.MULTILINE)
    bmatch = bodyRE.search(sub)
    if bmatch:
        self.payload = bmatch.group(2)
        self.hdrtxt = sub[:bmatch.start(2)]
    else:
        self.payload = sub  # we don't have valid headers, only payload
        self.hdrtxt = ''

and __getattr__ should look like:

def __getattr__(self, attributeName):
    '''On-demand loading of the message text.'''
    if attributeName in ('hdrtxt', 'payload'):
        self.load()  # will recurse if load does not set hdrtxt or payload
        return getattr(self, attributeName)
    else:
        # we should never get here. if we do, some attribute is missing
        # and we don't know what to do about it
        raise AttributeError, attributeName

Unfortunately, I don't have cvs access to fix this at the moment. > >Just > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Fri Jan 17 14:06:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 15:07:27 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: References: <15912.16102.713265.622424@montanaro.dyndns.org> Message-ID: <15912.25180.638042.95791@montanaro.dyndns.org> Richie> It should probably be: Richie> return self.__dict__[attributeName] Richie> so that it raises an exception when something goes wrong. This Richie> is probably related to Richie> https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 Yes, now that I know what's going on, I understand why I was getting infinite loops. The __getattr__ method is really only meant to initialize payload and hdrtxt. Any other attributes should raise AttributeError. I corrected the code in Corpus.py and closed out the bug report. Skip From skip at pobox.com Fri Jan 17 14:10:34 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 15:10:45 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: References: Message-ID: <15912.25402.44471.77970@montanaro.dyndns.org> Just> Neither makes sense (unless I'm missing some magic context): Just> __getattr__ is only called if the attr isn't found the normal way, Just> which means it's for sure not in self.__dict__.
Well, I think Richie meant it should be:

def __getattr__(self, attributeName):
    '''On-demand loading of the message text.'''
    if attributeName in ('hdrtxt', 'payload'):
        self.load()
    try:
        return self.__dict__[attributeName]
    except KeyError:
        raise AttributeError, attributeName

That is, __getattr__ is called when hdrtxt or payload are accessed but not yet initialized. All other accesses (or if self.load() fails somehow) should raise AttributeError. See Corpus.py 1.3. Skip From noreply at sourceforge.net Fri Jan 17 12:09:18 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 17 15:16:20 2003 Subject: [Spambayes] [ spambayes-Bugs-651365 ] getattr recursion in Corpus.py Message-ID: Bugs item #651365, was opened at 2002-12-10 04:42 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Wolfgang Strobl (strobl) Assigned to: Tim Stone (timstone4) Summary: getattr recursion in Corpus.py Initial Comment: After feeding a bunch of new messages into pop3proxy, classifying them and when trying to save the result, I got a recursion loop (followed by recursion depth exceeded) in \cvshome\spambayes\Corpus.py|__getattr__|269] After looking into setSubstance, I noticed that setSubstance (called by load) only sets the attributes payload and hdrtxt when the pattern matches. I temporarily added an else clause to bmatch, i.e.

if bmatch:
    self.payload = bmatch.group(2)
    self.hdrtxt = sub[:bmatch.start(2)]
    print ".",
else:
    self.payload = "nix\r\n"
    self.hdrtxt = "nix\r\n"
    print "?", len(sub),

and indeed, when trying to save, I notice that after about 800 good messages, ~100 have an empty message, see the output below. I don't really know what I'm doing here, but this fix at least allows me to continue. ------------------------- C:\archiv\cvshome\spambayes>python -u pop3proxy.py -l 8110 mail.gmd.de Loading database... Done.
Listener on port 8110 is proxying mail:110 User interface url is http://localhost:8880 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 . . . . . . . . . . . . . . . 
----------------------- Initial traceback:
error: uncaptured python exception, closing channel <__main__.UserInterface connected at 0x2213470> (exceptions.RuntimeError:maximum recursion depth exceeded
[C:\Python22\lib\asyncore.py|poll|95]
[C:\Python22\lib\asyncore.py|handle_read_event|392]
[C:\Python22\lib\asynchat.py|handle_read|112]
[C:\archiv\cvshome\spambayes\pop3proxy.py|found_terminator|804]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onRequest|830]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onReview|1093]
[C:\archiv\cvs\spambayes\Corpus.py|takeMessage|188]
[C:\archiv\cvs\spambayes\FileCorpus.py|addMessage|140]
[C:\archiv\cvs\spambayes\FileCorpus.py|store|231]
[C:\archiv\cvs\spambayes\Corpus.py|getSubstance|318]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getat
---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2003-01-17 14:09 Message: Logged In: YES user_id=44345 Fixed by restricting __getattr__ (make it raise AttributeError at appropriate times) and handle the case here where the message text isn't formatted as expected. See Corpus.py 1.3. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-01-16 10:39 Message: Logged In: YES user_id=44345 Assigning to Tim Stone. I think this is the same problem I reported on the list the other day. I think the offending code is in Corpus.__getitem__. The test of amsg - "if not amsg" should be "if amsg is None" I think. I suspect a fix further up the line as the OP indicated would probably do the trick. If you don't do something to set self.hdrtxt I believe it is None and you infloop trying to resolve a non-existent __nonzero__ method. Something like that.
;-) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702 From richie at entrian.com Fri Jan 17 20:21:36 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 17 15:22:08 2003 Subject: [Spambayes] New POP3 proxy and web interface Message-ID: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com> Hello all, For those not subscribed to the checkins-list: You can now run pop3proxy.py with no POP3 servers, and just get the web interface. I'll split it into different source files at some point so that the naming is more sensible. This should let Skip use it instead of his proxytrainer.py. Tim Stone's web-based configurator is now a part of the main web interface. The fact that you can run the thing without any POP3 proxies set up, and that the config page is now a part of it, means that you don't need to touch bayescustomize.ini, even when starting from scratch. Run pop3proxy.py, hit the Configuration link, enter your POP3 details, and you're away. There's a new architecture for pop3proxy and the web interface. The HTML is now all in resources/ui.html, with the pieces being pulled out and stitched together at runtime. All the socket/async code has been pulled out into a library module, so there's only application code left in pop3proxy.py (it's still a combination of web UI and POP3 proxy, which I'll address RSN). I've added a new directory 'resources' for the HTML and GIFs. These are packaged using Mike Fletcher's excellent ResourcePackage tool, but you don't need to know about that, or have ResourcePackage installed, unless you want to change the resources. I've added a new option html_ui_allow_remote_connections, which can be set to False to provide some measure of privacy (I'm loath to say 'security' for fear of bugs 8-) I've also added some pretty icons to the web interface, because I couldn't help myself.
-- Richie Hindle richie@entrian.com From richie at entrian.com Fri Jan 17 20:21:39 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 17 15:22:12 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org> References: <15910.61389.133887.569308@montanaro.dyndns.org> Message-ID: Hi Skip, > I just checked in proxytrainer.py and proxytee.py. The former is > essentially pop3proxy.py with the POP stuff removed. I've just checked in a version of pop3proxy.py that can run with no POP3 servers configured, so it just provides the web interface. This should let you use it instead of your (hopefully interim!) proxytrainer.py - just move your onUpload method into it. You should really make your message-naming code use the same system as everything else - the names are unix timestamps of when each messages was received, and are used to paginate the training pages into one day per page (by day received rather than potentially-broken Date header). If you want me to do that then let me know, but I have an ever-growing to-do list... > A bit further down the road, I will probably dump the asyncore stuff in > favor of something based on SimpleHTTPServer just to reduce the number of > lines of code. Without the POP stuff going on there's no great need for the > channel multiplexing. If I can persuade you to use pop3proxy (or its successor, a generic Spambayes server that can optionally host either or both of the web UI and the POP3 proxy), you won't need to pull out the async stuff. And all the async-related code has now been refactored into a separate module, so pop3proxy.py is a good deal smaller than it was. It'll be smaller still when the core server, POP3 proxy, and web UI parts are all separated. I'm trying to unify the servers we have (eg.
my latest edits make Tim Stone's OptionConfig.py a part of pop3proxy.py - again, ignore the bad naming, I'm going to fix that - I'm doing it in stages to make CVS remain useful). I'd rather other people didn't fork off new servers at the same time as I'm trying to unify them! -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Jan 17 20:21:42 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 17 15:22:15 2003 Subject: [Spambayes] pop3proxy.UserInterface.onSave - self.shutdown? In-Reply-To: <15910.4104.787891.400893@montanaro.dyndns.org> References: <15910.4104.787891.400893@montanaro.dyndns.org> Message-ID: > Pychecker complains about the call to self.shutdown(2) on line 1441 of > pop3proxy.py. It should probably be self.socket.shutdown(2), but I'll let > someone else who knows the code better verify that. asyncore is using __getattr__ to proxy unknown method calls to the underlying socket. I've changed it anyway, to keep PyChecker happy. -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Fri Jan 17 14:39:41 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 17 15:40:21 2003 Subject: [Spambayes] spamconference webcasts In-Reply-To: <15912.12331.478178.788727@montanaro.dyndns.org> Message-ID: <41FCE97ZV75Y3V961YOKNHZ1X32EA.3e286a0d@myst> This µ$0phhhhht dude is kinda fulla somethin... - TimS 1/17/2003 10:32:43 AM, Skip Montanaro wrote: > > Anthony> I just wish they would use fonts that are readable through the > Anthony> webcast... > >In all fairness, I wonder if they knew it would be webcast? (One would >think Paul Graham ought to have known.) 
> >S > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Fri Jan 17 15:51:54 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Jan 17 15:53:27 2003 Subject: [Spambayes] Stemming and stopword elemination In-Reply-To: <20030117134741.1e88011c.Alexander@Leidinger.net> Message-ID: [Alexander Leidinger] > has someone already experimented with Information Retrieval techniques > like stopword elemination (stopwords: the, a, an, or, and, ...) and word > stemming? Yes and no. Stopword elimination doesn't make sense here. A typical IR application requires space proportional to the number of times a word appears, but this app doesn't: one word == one database entry, no matter how many times the word appears. Identifying stopwords would complicate and slow the code, and introduce language dependence, for a trivial database savings. Some Classic Bayesian classifiers remove stopwords for another reason (related to one discussed below), but that reason doesn't make sense in this code either: when scoring, the classifier automatically ignores words with a spamprob close to 0.5, so stopwords that truly *are* common across all kinds of texts have no effect on scoring. Stemming is a different issue. We not only don't stem, we don't even strip punctuation. So, e.g., "free" and "free," and "free:" and "(free" and "free--" and "free?" and "free!" and "free!!!" (etc) are all considered distinct by our tokenizer. That definitely grows the database size, but tests run both early and late in the project showed that leaving punctuation in works better than taking it out. In the literature on Classic Bayesian classifiers, better results are reported when using stemming. But they do something else very different too: a "mutual information" calculation (or moral equivalent) is done on all the training data, to identify the N words with (in effect) the greatest discriminatory power. 
N is typically less than 1000, and all words not in that set are completely ignored. In that context, it's very easy to believe that stemming is valuable, else minor word variations would compete with entirely different words for the privilege of not being ignored. OTOH, we ignore nothing except for tokens with spamprobs close to 0.5. > ... > I don't think this will change the failure rate significantly (maybe > better results with few training data, maybe worser; I don't expect > much change with large training data), but it should reduce the size of > the needed database. I expect that stopword elimination would make no difference, unless the stopword list contained words that are actually hammish or spammish in real life (in which case stopword elimination would hurt); the database size difference would be too small to notice. I expect that stemming would hurt period, although it would reduce database size. From skip at pobox.com Fri Jan 17 14:53:35 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 15:53:46 2003 Subject: [Spambayes] New POP3 proxy and web interface In-Reply-To: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com> References: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com> Message-ID: <15912.27983.607928.417089@montanaro.dyndns.org> Richie> You can now run pop3proxy.py with no POP3 servers, and just get Richie> the web interface. I'll split it into different source files at Richie> some point so that the naming is more sensible. This should let Richie> Skip use it instead of his proxytrainer.py. This is great! I just checked in a number of changes to proxytrainer.py. Looks like it's time to backport them to pop3proxy.py. 
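Tim's two points above (punctuation stays attached to tokens, and tokens whose spamprob is close to 0.5 are ignored when scoring) can be illustrated with a small sketch. This is not the Spambayes tokenizer or classifier: the whitespace-only `tokenize`, the hypothetical `strong_clues` helper, and the 0.1 minimum distance from 0.5 are all assumptions chosen for illustration.

```python
def tokenize(text):
    """Split on whitespace only, so "free", "free!" and "(free" stay distinct."""
    return text.split()


def strong_clues(tokens, spamprob, min_strength=0.1):
    """Return (token, prob) pairs far enough from 0.5 to count as evidence.

    Unknown tokens default to 0.5, so they are ignored, and so is any
    stopword-like token that is equally common in ham and spam.
    min_strength=0.1 is an assumed threshold, not a documented default.
    """
    clues = []
    for tok in set(tokens):  # each distinct token counts once
        p = spamprob.get(tok, 0.5)
        if abs(p - 0.5) >= min_strength:
            clues.append((tok, p))
    return sorted(clues)


probs = {"the": 0.5, "free!": 0.95, "free": 0.70, "offer": 0.55}
print(strong_clues(tokenize("the free! offer"), probs))  # [('free!', 0.95)]
```

Here "the" contributes nothing without any stopword list, and "offer" is dropped because it sits too close to 0.5; no special-casing of common words is needed.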
Skip From skip at pobox.com Fri Jan 17 14:58:30 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 15:58:39 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in In-Reply-To: References: <15910.61389.133887.569308@montanaro.dyndns.org> Message-ID: <15912.28278.137619.916136@montanaro.dyndns.org> Richie> You should really make your message-naming code use the same Richie> system as everything else - the names are unix timestamps of Wasn't aware I did anything differently than you. Did you notice something? Richie> when each messages was received, and are used to paginate the Richie> training pages into one day per page (by day received rather Richie> than potentially-broken Date header). If you want me to do that Richie> then let me know, but I have an ever-growing to-do list... I think as important (or more important) than day-by-day display is chunk-by-chunk display. I get far too much mail to want to review it all at once anyway. If I can't take the time to train everything, I don't want to be depressed about it. ;-) >> A bit further down the road, I will probably dump the asyncore stuff >> in favor of something based on SimpleHTTPServer just to reduce the >> number of lines of code. Without the POP stuff going on there's no >> great need for the channel multiplexing. Richie> If I can persuade you to use pop3proxy (or its successor, a Richie> generic Spambayes server that can optionally host either or both Richie> of the web UI and the POP3 proxy), you won't need to pull out Richie> the async stuff. That's fine. My only worry is that the async code will never be as well exercised as SimpleHTTPServer. Skip From just at letterror.com Fri Jan 17 22:01:30 2003 From: just at letterror.com (Just van Rossum) Date: Fri Jan 17 16:02:00 2003 Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it? 
In-Reply-To: <15912.25402.44471.77970@montanaro.dyndns.org> Message-ID: Skip Montanaro wrote:
> Well, I think Richie meant it should be:
>
>     def __getattr__(self, attributeName):
>         '''On-demand loading of the message text.'''
>
>         if attributeName in ('hdrtxt', 'payload'):
>             self.load()
>         try:
>             return self.__dict__[attributeName]
>         except KeyError:
>             raise AttributeError, attributeName
>
> That is, __getattr__ is called when hdrtxt or payload are accessed
> but not yet initialized. All other accesses (or if self.load() fails
> somehow) should raise AttributeError. See Corpus.py 1.3.

Yeah, what I wrote was nonsense. But while we're nitpicking, the _real_ intent of the code is probably this:

    def __getattr__(self, attributeName):
        '''On-demand loading of the message text.'''

        if attributeName in ('hdrtxt', 'payload'):
            self.load()
            return self.__dict__[attributeName]
        raise AttributeError, attributeName

This is assuming self.load() _always_ sets those two attrs. Back to lurk mode... Just From skip at pobox.com Fri Jan 17 15:19:48 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 16:20:10 2003 Subject: [Spambayes] Stemming and stopword elemination In-Reply-To: References: <20030117134741.1e88011c.Alexander@Leidinger.net> Message-ID: <15912.29556.507000.951582@montanaro.dyndns.org> >> has someone already experimented with Information Retrieval >> techniques like stopword elemination (stopwords: the, a, an, or, and, >> ...) and word stemming? Tim> Yes and no. Tim> Stemming is a different issue. We not only don't stem, we don't Tim> even strip punctuation. Well, mostly. In the usual linguistic sense spambayes doesn't stem, however the tokenizer does collapse some things. Long strings are compressed to something like "skip b 40" where 'b' is the first letter and '40' is the length of the string (or the number of characters elided).
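A rough sketch of the long-string collapsing Skip describes. The 12-character cutoff and rounding the length down to a multiple of 10 are assumptions; the message only gives the "skip b 40" shape, and `compress_long_word` is a hypothetical name, not the tokenizer's.

```python
def compress_long_word(word, max_len=12):
    """Return short words unchanged; collapse long ones to a coarse token.

    Only the first character and the length (rounded down to a multiple
    of 10, an assumed granularity) survive, so many different long
    strings share a handful of database entries.
    """
    n = len(word)
    if n <= max_len:
        return word
    return "skip %s %d" % (word[0], n // 10 * 10)


print(compress_long_word("free"))            # free
print(compress_long_word("b" + "x" * 42))    # skip b 40
```

The point of the coarse bucket is the same as Skip's later `pfxlen` capping: it keeps one-off long strings from each becoming their own rarely-seen token.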
In the email prefix stuff I checked in and the suffix stuff I am still pondering, I generate tokens like pfxlen:%d up to some small threshold value. Above that, I just generate "pfxlen:big" or "sfxlen:big". Otherwise, I'd have a number of tokens in my database with keys of "pfxlen:N" (where N is a "biggish" number) and a value of (1,0) (spammy hapaxes - seen once in spam and never in ham). Skip From skip at pobox.com Fri Jan 17 16:30:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 17 17:30:18 2003 Subject: [Spambayes] OptionConfig.py - split into two pieces? Message-ID: <15912.33776.619638.320031@montanaro.dyndns.org> In pop3proxy.py I see from OptionConfig import OptionsConfigurator but OptionConfig.py is at the top level (not in the spambayes package) and isn't installed. It looks like both a module and a script. Perhaps it should be split in two pieces, a script and an importable module. Skip From frank.horowitz at csiro.au Sat Jan 18 11:06:06 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Fri Jan 17 22:07:00 2003 Subject: [Spambayes] Sourceforge :pserver cvs access broken... (FIXED) In-Reply-To: <15912.5783.216516.749029@montanaro.dyndns.org> References: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> <15912.5783.216516.749029@montanaro.dyndns.org> Message-ID: <1042859167.1792.1.camel@amdo> On Fri, 2003-01-17 at 22:43, Skip Montanaro wrote: > > Frank> Does anyone (by any small miracle) have a mirror of the cvs tree > Frank> that they'd be willing to put online while SF gets it's act > Frank> together? > > Not a mirror, but I just put a gzipped tar file snapshot at > > http://www.musi-cal.com/~skip/python.spambayes.tar.gz > > I'd be happy to update it periodically, though I have to do it manually, > since on that machine cvs prompts me for my SF password when I 'cvs up'. > Thanks to all of the kind souls out there who jumped into the SF breach! SF is now serving cvs via both :pserver and the web again.
Frank From francois.granger at free.fr Sat Jan 18 15:22:49 2003 From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger) Date: Sat Jan 18 09:22:55 2003 Subject: [Spambayes] Fresh download Message-ID: I downloaded the nightly build this morning. I copied my current bayescustomize.ini into the new directory. The first try gave this message: ================================================= [fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py config file has unknown option 'spam_cutoff' in section 'TestDriver' config file has unknown option 'ham_cutoff' in section 'TestDriver' Traceback (most recent call last): File "OptionConfig.py", line 32, in ? from spambayes.Options import options File "spambayes/Options.py", line 542, in ? options.mergefiles(['bayescustomize.ini']) File "spambayes/Options.py", line 496, in mergefiles self._update() File "spambayes/Options.py", line 523, in _update raise ValueError("errors while parsing .ini file") ValueError: errors while parsing .ini file ================================================= So I remove these two option..... Next try gave this: ================================================= [fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py Serving HTTP on 0.0.0.0 port 8000 ...
localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 - ---------------------------------------- Exception happened during processing of request from ('127.0.0.1', 49809) Traceback (most recent call last): File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 221, in handle_request self.process_request(request, client_address) File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 240, in process_request self.finish_request(request, client_address) File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 253, in finish_request self.RequestHandlerClass(request, client_address, self) File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 514, in __init__ self.handle() File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", line 266, in handle method() File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py", line 41, in do_GET f = self.send_head() File "SmarterHTTPServer.py", line 100, in send_head retstr = getattr(self, methname)(pdict) File "OptionConfig.py", line 84, in homepage parm_ini_map[httpparm][PIMapOpt])) File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", line 279, in get raise NoOptionError(option, section) NoOptionError: No option `spam_cutoff' in section: TestDriver -- Recently using MacOSX....... From francois.granger at free.fr Sat Jan 18 15:33:38 2003 From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger) Date: Sat Jan 18 09:33:44 2003 Subject: [Spambayes] Success and failure Message-ID: I tried pop3proxy on one account, and it worked like a charm, but it failed after getting the mail, with a segmentation fault (see the end). I have some questions: On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there any other possibility to launch it ? [fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python pop3proxy.py Loading database...
Done. Traceback (most recent call last): File "pop3proxy.py", line 1650, in ? run() File "pop3proxy.py", line 1644, in run main(state.servers, state.proxyPorts, state.uiPort, state.launchUI) File "pop3proxy.py", line 1349, in main BayesProxyListener(server, serverPort, proxyPort) File "pop3proxy.py", line 399, in __init__ Listener.__init__(self, proxyPort, BayesProxy, proxyArgs) File "pop3proxy.py", line 178, in __init__ self.bind(('', port)) File "/usr/lib/python2.2/asyncore.py", line 306, in bind return self.socket.bind (addr) socket.error: (13, 'Permission denied') [fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py Password: Loading database... Done. Listener on port 110 is proxying pop.nerim.net:110 User interface url is http://localhost:8880 Segmentation fault -- Recently using MacOSX....... From francois.granger at free.fr Sat Jan 18 16:00:08 2003 From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger) Date: Sat Jan 18 10:00:13 2003 Subject: [Spambayes] Follow up Message-ID: There is kind of a problem. It may be specific to MacOS X, but I think that pop3proxy should filter on filenames since it may grab incorrect files. MacOS X creates in each directory a file named ".DS_Store" for its own use. Since it is a hidden file, there is no issue with most software. But pop3proxy loads it as if it was a normal message file. ======================================== [fbg:/Volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py Loading database... Loading state from /Volumes/OS99/spambayesf/hammie.db database /Volumes/OS99/spambayesf/hammie.db is an existing database, with 99 spam and 29 ham Done. placing .DS_Store in corpus cache <- this is a serious problem ;-) BayesProxyListener listening on port 110. Listener on port 110 is proxying pop.nerim.net:110 UserInterfaceListener listening on port 8880.
User interface url is http://localhost:8880 ======================================== After clicking on the "Review message" link on the main page of pop3proxy, the terminal displays the following lines. ======================================== adding 1042901588 to corpus storing 1042901588 adding message 1042901588 to corpus placing 1042901588 in corpus cache adding 1042901588-2 to corpus storing 1042901588-2 adding message 1042901588-2 to corpus placing 1042901588-2 in corpus cache error: uncaptured python exception, closing channel <__main__.UserInterface connected at 0x276450> (exceptions.ValueError:invalid literal for long(): .DS_Store [/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asyncore.py|poll|94] [/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asyncore.py|handle_read_event|389] [/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asynchat.py|handle_read|130] [pop3proxy.py|found_terminator|811] [pop3proxy.py|onRequest|837] [pop3proxy.py|onReview|1143] [pop3proxy.py|buildReviewKeys|1020] [pop3proxy.py|keyToTimestamp|976]) ======================================== -- Recently using MacOSX....... From skip at pobox.com Sat Jan 18 09:02:54 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 18 10:02:59 2003 Subject: [Spambayes] Fresh download In-Reply-To: References: Message-ID: <15913.27806.863804.858968@montanaro.dyndns.org> François> config file has unknown option 'spam_cutoff' in section 'TestDriver' François> config file has unknown option 'ham_cutoff' in section 'TestDriver' These two now go in the new [Categorization] section.
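The crash in François's `.DS_Store` log comes from `keyToTimestamp` calling `long()` on every filename in the corpus directory. One way to filter is sketched below, under the assumption (inferred only from the log lines above) that message keys are Unix timestamps with an optional `-N` suffix. `key_to_timestamp` and `group_by_day` are hypothetical helpers, not pop3proxy functions, and grouping by UTC day is a simplification.

```python
import re
import time

# Assumed key format: "1042901588" or "1042901588-2" (a Unix timestamp
# plus an optional de-duplication suffix). Anything else, such as the
# Finder's ".DS_Store", is not treated as a message.
KEY_RE = re.compile(r"(\d+)(?:-\d+)?")


def key_to_timestamp(key):
    """Return the receive time encoded in a message key, or None."""
    m = KEY_RE.fullmatch(key)
    return int(m.group(1)) if m else None


def group_by_day(filenames):
    """Group valid message keys by the (UTC) day they were received."""
    days = {}
    for name in filenames:
        ts = key_to_timestamp(name)
        if ts is None:
            continue  # skip hidden or unrelated files instead of crashing
        day = time.strftime("%Y-%m-%d", time.gmtime(ts))
        days.setdefault(day, []).append(name)
    return days


print(group_by_day([".DS_Store", "1042901588", "1042901588-2"]))
# {'2003-01-18': ['1042901588', '1042901588-2']}
```

Validating the name up front, rather than catching the conversion error, also keeps stray files out of the corpus cache in the first place.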
Skip From francois.granger at free.fr Sat Jan 18 16:21:27 2003 From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger) Date: Sat Jan 18 10:21:34 2003 Subject: [Spambayes] Fresh download In-Reply-To: <15913.27806.863804.858968@montanaro.dyndns.org> References: <15913.27806.863804.858968@montanaro.dyndns.org> Message-ID: At 09:02 -0600 18/01/2003, in message Re: [Spambayes] Fresh download, Skip Montanaro wrote: > François> config file has unknown option 'spam_cutoff' in >section 'TestDriver' > François> config file has unknown option 'ham_cutoff' in section >'TestDriver' > >These two now go in the new [Categorization] section. Thanks, but if I remove them from my ini files, they should get a default value. This is done in Option.py. But it did not worked for me as stated in the second part of my previous message: >So I remove these two option..... >Next try gave this: > >================================================= >[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py >Serving HTTP on 0.0.0.0 port 8000 ...
>localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 - >---------------------------------------- >Exception happened during processing of request from ('127.0.0.1', 49809) >Traceback (most recent call last): > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >line 221, in handle_request > self.process_request(request, client_address) > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >line 240, in process_request > self.finish_request(request, client_address) > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >line 253, in finish_request > self.RequestHandlerClass(request, client_address, self) > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >line 514, in __init__ > self.handle() > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", >line 266, in handle > method() > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py", >line 41, in do_GET > f = self.send_head() > File "SmarterHTTPServer.py", line 100, in send_head > retstr = getattr(self, methname)(pdict) > File "OptionConfig.py", line 84, in homepage > parm_ini_map[httpparm][PIMapOpt])) > File >"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", >line 279, in get > raise NoOptionError(option, section) >NoOptionError: No option `spam_cutoff' in section: TestDriver > > -- Recently using MacOSX....... From tim at fourstonesExpressions.com Sat Jan 18 09:23:39 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sat Jan 18 10:24:19 2003 Subject: [Spambayes] Fresh download In-Reply-To: Message-ID: Looks like we might could use a migration script? We certainly ought to keep this in mind for future releases... 
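A migration script like the one Tim suggests could move the two cutoffs from `[TestDriver]` to `[Categorization]`. The sketch below uses Python 3's `configparser` rather than the 2003-era `ConfigParser` module; the `migrate` function and the `MOVED` table are hypothetical, and only the two options named in this thread are handled.

```python
import configparser
import io

# Options that moved out of [TestDriver], per this thread.
MOVED = {
    "spam_cutoff": ("TestDriver", "Categorization"),
    "ham_cutoff": ("TestDriver", "Categorization"),
}


def migrate(ini_text):
    """Return a ConfigParser with moved options relocated to their new section."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(ini_text)
    for option, (old, new) in MOVED.items():
        if cp.has_option(old, option):
            value = cp.get(old, option)
            cp.remove_option(old, option)
            if not cp.has_section(new):
                cp.add_section(new)
            if not cp.has_option(new, option):
                cp.set(new, option, value)  # don't clobber an explicit new setting
    return cp


old_ini = "[TestDriver]\nspam_cutoff = 0.90\nham_cutoff = 0.20\n"
out = io.StringIO()
migrate(old_ini).write(out)
print(out.getvalue())
```

Note that `ConfigParser.write` drops comments from the original file, so a real migration tool might prefer to back up the old `bayescustomize.ini` first.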
On another note, I finally was able to get cvs workin on this new machine, so I'm back in business :) - TimS 1/18/2003 9:21:27 AM, François Granger wrote: >At 09:02 -0600 18/01/2003, in message Re: [Spambayes] Fresh download, >Skip Montanaro wrote: >> François> config file has unknown option 'spam_cutoff' in >>section 'TestDriver' >> François> config file has unknown option 'ham_cutoff' in section >>'TestDriver' >> >>These two now go in the new [Categorization] section. > >Thanks, but if I remove them from my ini files, they should get a >default value. This is done in Option.py. > >But it did not worked for me as stated in the second part of my >previous message: > >>So I remove these two option..... >>Next try gave this: > >================================================= >>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py >>Serving HTTP on 0.0.0.0 port 8000 ... >>localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 - >>---------------------------------------- >>Exception happened during processing of request from ('127.0.0.1', 49809) >>Traceback (most recent call last): >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >>line 221, in handle_request >> self.process_request(request, client_address) >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >>line 240, in process_request >> self.finish_request(request, client_address) >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >>line 253, in finish_request >> self.RequestHandlerClass(request, client_address, self) >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", >>line 514, in __init__ >> self.handle() >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", >>line 266, in handle >> method() >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py", >>line 41, in do_GET >> f =
self.send_head() >> File "SmarterHTTPServer.py", line 100, in send_head >> retstr = getattr(self, methname)(pdict) >> File "OptionConfig.py", line 84, in homepage >> parm_ini_map[httpparm][PIMapOpt])) >> File >>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", >>line 279, in get >> raise NoOptionError(option, section) >>NoOptionError: No option `spam_cutoff' in section: TestDriver >> >> > >-- >Recently using MacOSX....... > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Sat Jan 18 09:59:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 18 10:59:58 2003 Subject: [Spambayes] Fresh download In-Reply-To: References: <15913.27806.863804.858968@montanaro.dyndns.org> Message-ID: <15913.31224.984560.906915@montanaro.dyndns.org> >> These two now go in the new [Categorization] section. François> Thanks, but if I remove them from my ini files, they should François> get a default value. This is done in Option.py. >> NoOptionError: No option `spam_cutoff' in section: TestDriver They are somehow still winding up in the [TestDriver] section. Make sure you aren't importing an old version of Options.py and don't have another .ini file which is getting loaded. Skip From mwh at python.net Sat Jan 18 17:05:42 2003 From: mwh at python.net (Michael Hudson) Date: Sat Jan 18 12:05:50 2003 Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can it? References: <15912.16102.713265.622424@montanaro.dyndns.org> <15912.25180.638042.95791@montanaro.dyndns.org> Message-ID: <2mznpy8k2h.fsf@starship.python.net> Skip Montanaro writes: > Yes, now that I know what's going on, I understand why I was getting > infinite loops. The __getattr__ method is really only meant to initialize > payload and hdrtxt. In which case why not use a property?
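Michael's suggestion can be compared with the corrected `__getattr__` in a small sketch. `LazyMessage` and `PropertyMessage` are hypothetical stand-ins for `Corpus.Message`, written for Python 3 (where every class is new-style and supports properties), and `load()` only fakes reading the message from disk.

```python
class LazyMessage:
    """Loads hdrtxt/payload on first access via __getattr__."""

    def __init__(self, key):
        self.key = key
        self.loads = 0  # how many times load() ran

    def load(self):
        # Stand-in for reading the message file from disk.
        self.loads += 1
        self.hdrtxt = "Subject: hello from %s" % self.key
        self.payload = "body"

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in ("hdrtxt", "payload"):
            self.load()
            return self.__dict__[name]
        # Every other missing name must raise AttributeError; otherwise
        # probes such as hasattr() could loop forever or wrongly succeed.
        raise AttributeError(name)


class PropertyMessage:
    """The same lazy loading, written with properties instead."""

    def __init__(self, key):
        self.key = key
        self._hdrtxt = None
        self._payload = None

    def load(self):
        self._hdrtxt = "Subject: hello from %s" % self.key
        self._payload = "body"

    @property
    def hdrtxt(self):
        if self._hdrtxt is None:
            self.load()
        return self._hdrtxt

    @property
    def payload(self):
        if self._payload is None:
            self.load()
        return self._payload


m = LazyMessage("1042901588")
print(m.loads, m.payload, m.loads)   # 0 body 1
print(hasattr(m, "no_such_attr"))    # False
```

After the first load, the `__getattr__` version stores real instance attributes, so later accesses never reach `__getattr__` again. The property version pays a small function-call cost on every access but cannot accidentally intercept unrelated names.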
Cheers, M. -- ROOSTA: Ever since you arrived on this planet last night you've been going round telling people that you're Zaphod Beeblebrox, but that they're not to tell anyone else. -- The Hitch-Hikers Guide to the Galaxy, Episode 7 From skip at pobox.com Sat Jan 18 11:15:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 18 12:15:09 2003 Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: <2mznpy8k2h.fsf@starship.python.net> References: <15912.16102.713265.622424@montanaro.dyndns.org> <15912.25180.638042.95791@montanaro.dyndns.org> <2mznpy8k2h.fsf@starship.python.net> Message-ID: <15913.35733.325660.407255@montanaro.dyndns.org> >> The __getattr__ method is really only meant to initialize payload and >> hdrtxt. Michael> In which case why not use a property? Why? __getattr__ works fine, once it's properly written. Skip From mwh at python.net Sat Jan 18 17:22:57 2003 From: mwh at python.net (Michael Hudson) Date: Sat Jan 18 12:23:01 2003 Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can it? References: <15912.16102.713265.622424@montanaro.dyndns.org> <15912.25180.638042.95791@montanaro.dyndns.org> <2mznpy8k2h.fsf@starship.python.net> <15913.35733.325660.407255@montanaro.dyndns.org> Message-ID: <2mwul28j9q.fsf@starship.python.net> Skip Montanaro writes: > >> The __getattr__ method is really only meant to initialize payload and > >> hdrtxt. > > Michael> In which case why not use a property? > > Why? __getattr__ works fine, once it's properly written. I was thinking of its performance-mangling properties. Dunno if that's an issue here, it was only an off-the-cuff remark. Cheers, M. -- ARTHUR: Why should he want to know where his towel is? FORD: Everybody should know where his towel is. ARTHUR: I think your head's come undone. 
-- The Hitch-Hikers Guide to the Galaxy, Episode 7 From skip at pobox.com Sat Jan 18 11:56:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 18 12:56:59 2003 Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can it? In-Reply-To: <2mwul28j9q.fsf@starship.python.net> References: <15912.16102.713265.622424@montanaro.dyndns.org> <15912.25180.638042.95791@montanaro.dyndns.org> <2mznpy8k2h.fsf@starship.python.net> <15913.35733.325660.407255@montanaro.dyndns.org> <2mwul28j9q.fsf@starship.python.net> Message-ID: <15913.38241.389238.638886@montanaro.dyndns.org> >> >> The __getattr__ method is really only meant to initialize payload >> >> and hdrtxt. >> Michael> In which case why not use a property? >> >> Why? __getattr__ works fine, once it's properly written. Michael> I was thinking of its performance-mangling properties. Dunno Michael> if that's an issue here, it was only an off-the-cuff remark. I was thinking that the simplest solution which gives correct behavior would be best. Using properties would have required me to convert Corpus.Message to a new-style class, and while that probably wouldn't have broken anything, it wasn't a direct response to the bug. Sure, __getattr__ can hurt performance, but in this case I think it's reasonable. It computes the necessary attribute values and updates them so further accesses won't call __getattr__. Skip From tony-bayes at lownds.com Sat Jan 18 12:08:03 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Sat Jan 18 15:08:22 2003 Subject: [Spambayes] Success and failure In-Reply-To: References: Message-ID: >On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there >any other possibility to launch it ? Superuser privileges are always needed to bind to any port below 1024 on Unix. Are you binding to port 110? You can work around this by a) binding to a port above 1023 and b) configuring Eudora to connect to that port instead of port 110.
You'll need the "Esoteric Settings" Eudora plugin installed to make that change. >[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py >Password: >Loading database... Done. >Listener on port 110 is proxying pop.nerim.net:110 >User interface url is http://localhost:8880 >Segmentation fault Did it segfault after you asked it to train messages? Raising the stack size allocated to new processes before starting pop3proxy will fix this. Mac OS X has a rather small stack size by default. Try running "limit stacksize 2048" before starting pop3proxy.py. BTW, I've updated the patch for binding to a specific address and posted it to Sourceforge: #670417 -Tony From noreply at sourceforge.net Sat Jan 18 08:35:13 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Jan 18 17:00:16 2003 Subject: [Spambayes] [ spambayes-Bugs-669149 ] NameError in ExpiryCorpus.removeExpiredMessages Message-ID: Bugs item #669149, was opened at 2003-01-16 10:34 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702 Category: None Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Tim Stone (timstone4) Summary: NameError in ExpiryCorpus.removeExpiredMessages Initial Comment: In verbose mode, removeExpiredMessages prints out a line which references the nonexistent variable, key. I have no idea what it should be, otherwise I'd fix it.
---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-01-18 10:35 Message: Logged In: YES user_id=645698 Corrected print statement to reference msg.key() ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702 From noreply at sourceforge.net Sat Jan 18 12:06:23 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Jan 18 17:00:25 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to bind to specific addresses Message-ID: Patches item #670417, was opened at 2003-01-18 20:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Nobody/Anonymous (nobody) Summary: Allow the pop3 proxies to bind to specific addresses Initial Comment: This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting. This is useful for two reasons: 1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security. 2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts. The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.
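The patch description says a port setting may be either a bare int or `address:int`, that each is turned into an `(address, port)` pair for binding, and that the pair is turned back into a string for printing. The round trip can be sketched in a few lines of Python. This is an illustration only, not the patch's actual code: `parse_bind_spec` and `format_bind_spec` are hypothetical names. The sketch assumes that a bare port means "all interfaces" (spelled `''` in Python's socket module) and does not attempt to handle IPv6 literals.

```python
def parse_bind_spec(spec):
    """Turn "port" or "address:port" into an (address, port) pair.

    Assumption (not from the patch): a bare port means "all interfaces",
    which Python's socket module spells as the empty string ''.
    IPv6 literals are not handled.
    """
    # rpartition splits on the LAST colon: "127.0.0.2:110" gives
    # ("127.0.0.2", ":", "110"), and a bare "110" gives ("", "", "110").
    address, _, port_text = spec.strip().rpartition(":")
    port = int(port_text)  # raises ValueError for a non-numeric port
    if not 0 < port < 65536:
        raise ValueError("port out of range in %r" % spec)
    return address, port


def format_bind_spec(pair):
    """Turn an (address, port) pair back into a printable string."""
    address, port = pair
    return "%s:%d" % (address, port) if address else str(port)
```

Each parsed pair has the shape that `socket.bind()` expects, so the same value can be used to open the listener and to print it. This is why binding the proxies for several accounts to distinct loopback addresses on the same port lets Eudora use its default port for each account.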
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 From francois.granger at free.fr Sun Jan 19 01:11:12 2003 From: francois.granger at free.fr (François Granger) Date: Sat Jan 18 19:11:20 2003 Subject: [Spambayes] Success and failure In-Reply-To: References: Message-ID: At 12:08 -0800 18/01/2003, in message Re: [Spambayes] Success and failure, Tony Lownds wrote: >>On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there >>any other possibility to launch it ? > >Superuser privileges are always needed to bind to any port below >1024 on Unix. Are you binding to port 110? > >You can work around this by a) binding to a port at or above 1024 and b) >configuring Eudora to connect to that port instead of port 110. >You'll need the "Esoteric Settings" Eudora plugin installed to make >that change. I see what you mean. I don't care since I am the only user of this station. It was more for prospective users. >>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py >>Password: >>Loading database... Done. >>Listener on port 110 is proxying pop.nerim.net:110 >>User interface url is http://localhost:8880 >>Segmentation fault > >Did it segfault after you asked it to train messages? yes. > Raising the stack size allocated to new processes before starting >pop3proxy will fix this. Mac OS X has a rather small stack size by >default. > >Try running "limit stacksize 2048" before starting pop3proxy.py My script is based on yours:

#!/bin/sh
#clear
ulimit -s 2048
sudo ifconfig lo0 inet 127.0.0.2 add
sudo ifconfig lo0 inet 127.0.0.3 add
sudo ifconfig lo0 inet 127.0.0.4 add
cd /Volumes/OS99/spambayes/
sudo python pop3proxym.py

By the way, thanks to your help, I am able to connect to 4 POP servers from Eudora.
>BTW, I've updated the patch for binding to a specific address and >posted it to Sourceforge: #670417 Thanks a lot. Please, people with commit rights, validate this patch so that I can more fully test the latest and greatest version. -- Recently using MacOSX....... From barry at python.org Thu Jan 16 18:48:57 2003 From: barry at python.org (Barry A. Warsaw) Date: Sat Jan 18 19:58:00 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> <15910.43747.285523.378123@montanaro.dyndns.org> Message-ID: <15911.17641.101054.962896@gargle.gargle.HOWL> Rob> Doesn't it take time before the first spam arrives on a brand Rob> new mailinglist? Spambayes' results are going to be real Rob> lousy if it is trained on 200 ham and 0 spam messages.... It might be, but how will that lousiness manifest? As false negatives? If so, the -spam reporting address for the list should eventually warm up the spam side, right? Depending on how much your legitimate list traffic looks like spam already, it might warm up pretty quickly. -Barry From barry at python.org Thu Jan 16 18:51:20 2003 From: barry at python.org (Barry A. Warsaw) Date: Sat Jan 18 19:58:09 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> <15910.43747.285523.378123@montanaro.dyndns.org> <3E26B4EB.5020100@hooft.net> Message-ID: <15911.17784.585905.637568@gargle.gargle.HOWL> >>>>> "RWWH" == Rob W W Hooft writes: RWWH> This sounds reasonable, but this can also be implemented as RWWH> a "preloaded database" that comes with spambayes. This is RWWH> something many people have already asked for. I thought about this, and from Mailman's perspective it wouldn't be hard to pre-train the list on some known spam when spambayes is enabled. If there is actual list traffic at that point, then perhaps we can assume it's all ham and train on a balanced number of messages. 
There may need to be hooks to reset, retrain or untrain the system. I think those are all tractable but not something I've addressed in my prototype. -Barry From barry at python.org Thu Jan 16 18:52:52 2003 From: barry at python.org (Barry A. Warsaw) Date: Sat Jan 18 19:58:15 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <200301160612.h0G6C0x14523@localhost.localdomain> <3E26A278.3080302@hooft.net> <15910.51260.847140.60292@gargle.gargle.HOWL> <3E26DA4A.40404@hooft.net> Message-ID: <15911.17876.845208.125779@gargle.gargle.HOWL> >>>>> "RWWH" == Rob W W Hooft writes: RWWH> Isn't everything going to be marked as unsure as long as RWWH> there is no spam at all? It didn't seem to. But I only barely played with it. -Barry From barry at python.org Thu Jan 16 18:57:02 2003 From: barry at python.org (Barry A. Warsaw) Date: Sat Jan 18 19:58:20 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15910.18557.535408.669103@gargle.gargle.HOWL> Message-ID: <15911.18126.450611.521139@gargle.gargle.HOWL> >>>>> "TP" == Tim Peters writes: TP> Better to start by training on a few spam, and a few copies of TP> the list introduction msg (a decent intro msg necessarily TP> contains many words and lexicalisms characteristic of the TP> list's topic). See my previous message about initial training. We may want to have some canned spam to train on when we enable spambayes. Using the list intro message is a neat idea for when you have no posts available for the list. If, OTOH, people take Skip's advice and only turn it on when it's necessary, then maybe we can use messages we already have to train it. One source of known good messages are those the admin has explicitly approved. Maybe if we have 20 canned spam, we can save up to the last 20 approved messages. Then when the list admin enables spambayes, we train on those. -Barry From barry at python.org Thu Jan 16 18:45:58 2003 From: barry at python.org (Barry A.
Warsaw) Date: Sat Jan 18 19:58:35 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15909.39504.598866.52741@montanaro.dyndns.org> <15910.18557.535408.669103@gargle.gargle.HOWL> <15910.42543.629381.696105@montanaro.dyndns.org> Message-ID: <15911.17462.734808.296463@gargle.gargle.HOWL> [I added mailman-developers to this list because I think people will be interested in my prototype integration of Mailman and spambayes, a statistical learning classifier, which I've targeted for spam fighting on Mailman lists. -BAW]. >>>>> "SM" == Skip Montanaro writes: SM> In my case I sidestepped training altogether because the SM> list's content is a subset of the stuff I'm interested in SM> anyway. Most of the "spam" messages encountered by the list SM> at this point are really of the virus/worm variety, and since SM> it's set up for members only posting, little, if any garbage SM> actually gets through to the list, even without using SM> spambayes. I suspect python.org will be similar, since we have many other spam defenses in place. I've just been playing with my prototype, and yeah, it sure learns fast even with no a-priori training. I'm not 100% sure a train-on-the-fly approach will work, so it's worth some real-world banging. In my simplified approach, you start out holding all unsure and spam. Legit messages will hit one of those first, likely unsure if your list wasn't advertised on Usenet before real people started posting . There's one extra button on the admindb page called "Train?". Click this if you want to train a held message based on your action. If you approve the message, it gets trained as ham, and if you reject or discard it, it gets trained as spam. Within about 10 messages (first a bunch of ham, then a random and unscientific barrage of spam and ham) the classifier was doing pretty good. It was catching all the spam and letting through most of the ham. The ham recognition definitely went up as I approved more messages.
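Barry's "Train?" rule is simple: approving a message trains it as ham, rejecting or discarding it trains it as spam, and nothing is learned unless the box is ticked. It can be sketched in a few lines of Python. This is a hypothetical illustration, not Mailman or spambayes code: `ToyTrainer`, `handle_held_message` and the action names are made up, and the toy trainer only counts tokens where a real setup would call the classifier (the hammie.py API is deliberately not reproduced).

```python
from collections import Counter


class ToyTrainer:
    """Hypothetical stand-in for a real classifier: it only counts
    tokens per class so that this sketch runs on its own."""

    def __init__(self):
        self.token_counts = {"ham": Counter(), "spam": Counter()}
        self.message_counts = Counter()

    def learn(self, text, is_spam):
        label = "spam" if is_spam else "ham"
        self.token_counts[label].update(text.lower().split())
        self.message_counts[label] += 1


# The three admin actions on a held message described in the post.
APPROVE, REJECT, DISCARD = "approve", "reject", "discard"


def handle_held_message(trainer, text, action, train_box_checked):
    """Apply an admin decision to a held message.

    If the "Train?" box is checked, approving trains the message as
    ham, while rejecting or discarding trains it as spam.  Returns True
    when the message should be delivered to the list.
    """
    if action not in (APPROVE, REJECT, DISCARD):
        raise ValueError("unknown action: %r" % action)
    if train_box_checked:
        trainer.learn(text, is_spam=(action != APPROVE))
    return action == APPROVE
```

Keeping the training decision separate from the delivery decision lets an admin approve or reject a message without teaching the classifier anything. This matches the opt-in checkbox Barry describes.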
False positives get caught on the admindb screen, so you approve and train them in one action. Although I never saw any false negatives, I think the way to handle these will be to add a -spam address that people can send messages to. If the list admin sends it then it gets spam trained. If not, the list admin will have a chance to decide whether to spam train it or not. SM> One reason I'm interested in separating pop3proxy into two SM> functions ( POP retrieval/classifying and training/web UI) is SM> that the training/web component should be useful for other SM> spambayes users. Right now in my current environment, SM> training is clunky enough that I only train on unsures and SM> mistakes. While that works okay because my starting corpus SM> was so large (around 20,000 messages) the indications from SM> people who've experimented with that sort of training is that SM> the quality of classification does degrade over time. That's an important point. While I'm not sure that with my approach the quality of classification will improve over time , I think a training regimen integrated with the admindb stuff will be the most natural for a Mailman list admin. BTW, the hammie.py interface was all I needed for my prototype. One reason for going with hammie is that each mailing list needs its own database, and I can just create a Hammie, associate it with a list, and tie it easily into Mailman's load/save mechanism. -Barry From tim.one at comcast.net Sun Jan 19 00:27:10 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Jan 19 00:29:03 2003 Subject: [Spambayes] spambayes fronting a mailing list? In-Reply-To: <15911.17641.101054.962896@gargle.gargle.HOWL> Message-ID: [Rob] > Doesn't it take time before the first spam arrives on a brand > new mailinglist? Spambayes' results are going to be real > lousy if it is trained on 200 ham and 0 spam messages.... [Barry] > It might be, but how will that lousiness manifest? 
It depends a lot on whether you enable the bool experimental_ham_spam_imbalance_adjustment option. If it's true, and you have no spam, every msg will score exactly 0.5. > As false negatives? If experimental_ham_spam_imbalance_adjustment is false (still the default, since I haven't touched the code since the option was introduced), yes. Every word in the database will be associated with ham, so nothing is evidence for spam. > If so, the -spam reporting address for the list should > eventually warm up the spam side, right? Yes it will. It's best to shoot for the same # of ham and spam, if for no other reason than that then experimental_ham_spam_imbalance_adjustment has no effect either way <0.6 wink>. > Depending on how much your legitimate list traffic looks like spam > already, it might warm up pretty quickly. It won't look like spam. Even if it "looks like spam" to human eyes, the classifier will find many strong differences, some of which people will never think of. Hell, some differences people will even argue about, but it's futile -- real-life data doesn't lie about real life . From vanhorn at whidbey.com Sat Jan 18 22:37:54 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Sun Jan 19 01:37:57 2003 Subject: [Spambayes] spambayes fronting a mailing list? References: <15910.18557.535408.669103@gargle.gargle.HOWL> <15911.18126.450611.521139@gargle.gargle.HOWL> Message-ID: <3E2A47C2.B323EF93@whidbey.com> Since I have about eight lists on my Mailman server that relate to real estate, I'd certainly hate to see any mortgage or refinance spam show up in the canned spam seed. I'm sure that there are hosts for medical purposes who wouldn't want to have penis and breast enlargement spam in the seed. The list intro as ham makes sense, as might sending a message (or a series) comprising the current list membership; those e-mail addresses are certainly strong ham clues. Van "Barry A.
Warsaw" wrote: > >>>>> "TP" == Tim Peters writes: > > TP> Better to start by training on a few spam, and a few copies of > TP> the list introduction msg (a decent intro msg necessarily > TP> contains many words and lexicalisms characteristic of the > TP> list's topic). > > See my previous message about initial training. We may want to have > some canned spam to train on when we enable spambayes. Using the list > intro message is a neat idea for when you have no posts available for > the list. > > If, OTOH, people take Skip's advice and only turn it on when it's > necessary, then maybe we can use messages we already have to train > it. One source of known good messages are those the admin has > explicitly approved. Maybe if we have 20 canned spam, we can save up > to the last 20 approved messages. Then when the list admin enables > spambayes, we train on those. > > -Barry > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From noreply at sourceforge.net Sat Jan 18 22:51:27 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Jan 19 10:32:06 2003 Subject: [Spambayes] [ spambayes-Feature Requests-670573 ] IMAP proxy Message-ID: Feature Requests item #670573, was opened at 2003-01-19 01:51 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Jean-Marc Valin (jmvalin) Assigned to: Nobody/Anonymous (nobody) Summary: IMAP proxy Initial Comment: I use IMAP for my mail, so I think an IMAP proxy for spambayes would be great. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702 From francois.granger at free.fr Sun Jan 19 19:23:42 2003 From: francois.granger at free.fr (François Granger) Date: Sun Jan 19 13:23:49 2003 Subject: [Spambayes] Success and failure In-Reply-To: References: Message-ID: At 12:08 -0800 18/01/2003, in message Re: [Spambayes] Success and failure, Tony Lownds wrote: > >BTW, I've updated the patch for binding to a specific address and >posted it to Sourceforge: #670417 I had a look at SourceForge. It is there but I can't download the file. Can you send it to me directly? -- Recently using MacOSX....... From richard at jowsey.com Mon Jan 20 06:38:26 2003 From: richard at jowsey.com (Richard Jowsey) Date: Sun Jan 19 14:44:22 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: References: <3E25521B.20937.3607FFA@localhost> Message-ID: <3E2B9962.26334.308D0BD@localhost> > Upgrade to Python and you would have finished a couple months ago > . Yeah, that thought had occurred to me too...
> [chi-combining] This gives it some nice > properties for automated decision making (the cutoff points for > gary-combining were too touchy, across test sets, and across > time). But if you like a mode where you simply sort msgs by > score, you can stop with gary-combining and be happy. I have a very large training corpus, so I'm seeing well-separated distributions of good versus spam probs, with a sprinkling of "unsures" scattered through the middle. An uncertain cutoff at 3 sigma from the means should work, but this notion needs some testing. That chi2 test is definitely on the drawing boards, even if only for comparison purposes... Death To Spam! Cheers, Richard From nas at python.ca Sun Jan 19 16:13:44 2003 From: nas at python.ca (Neil Schemenauer) Date: Sun Jan 19 19:07:43 2003 Subject: [Spambayes] pushing back the cost of spam Message-ID: <20030120001344.GA6862@glacier.arctrix.com> Here's an idea. Do spam filtering at the transport level (i.e. at "SMTP time"). When a message is considered spam by the filter, return a temporary error (i.e. 4xx). Include the number of times the message delivery has been retried and the time since the first attempt as part of the evidence when filtering. RFC 2821 specifies that messages should be retried for at least 4 days. If the message is still being retried after, say, 2 days and is still flagged as spam by the filter then accept it but save it in the spam folder. I think if this system was widely implemented the spammer's job would become considerably more difficult. Spammers rely on hit-and-run tactics and I don't think they could tolerate a one or two day delay. Abused open relays would become heavily loaded due to all the queued messages and retries. When the open relay was secured hopefully the queue of spam would be cleared. Also, I believe most email viruses do not retry after a temporary error. Finally, I'm guessing that a retry after one day would end up being a strong ham clue.
Legitimate email that was initially considered spam would have a better chance of not ending up in the spam folder. Thoughts? Neil From nas at python.ca Sun Jan 19 16:35:05 2003 From: nas at python.ca (Neil Schemenauer) Date: Sun Jan 19 19:29:01 2003 Subject: [Spambayes] pushing back the cost of spam In-Reply-To: <20030120001344.GA6862@glacier.arctrix.com> References: <20030120001344.GA6862@glacier.arctrix.com> Message-ID: <20030120003505.GB6862@glacier.arctrix.com> I forgot to mention one advantage of this scheme. It could be implemented in modified form for an entire server of users (without their help). Only return a temporary error for something like 12 hours. After that, allow the mail through. That doesn't violate any standards and all legitimate mail will get through. I suspect a lot of spam would be blocked (I'll try to run some tests). Having a system that can be enabled server-wide is a big advantage. Spambayes is great for technical people who don't want to see spam. It's not really helping make spam unprofitable though, as Paul Graham has mentioned in one of his articles. We need to stop spam from reaching those few idiots that actually act upon it. I doubt those people would install a spam filter themselves. Either it has to be part of the MUA or it needs to be installed by someone else. Neil From john.abel at pa.press.net Mon Jan 20 14:02:59 2003 From: john.abel at pa.press.net (John Abel) Date: Mon Jan 20 09:05:09 2003 Subject: [Spambayes] Change Required To pspam/options.py Message-ID: <3E2C0193.3040109@pa.press.net> Hi, I've been playing around with the pspam scripts, and found, since the move-around, that it was broken. The line:

from Options import options, all_options, \
     boolean_cracker, float_cracker, int_cracker, string_cracker

needs changing to

from spambayes.Options import options, all_options, \
     boolean_cracker, float_cracker, int_cracker, string_cracker

I notice that this part of spambayes seems to be somewhat aimed at *nix distributions.
I would be willing to work on and test it on Win32. Regards John From skip at pobox.com Mon Jan 20 09:00:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 10:00:49 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? Message-ID: <15916.3865.297629.696625@montanaro.dyndns.org> Depending on how training and classifying are accomplished, it's quite possible that the two activities will be done in different processes. For example, I am currently experimenting with training using pop3proxy (well, still my offshoot proxytrainer at the moment) while classification is being done by hammiefilter run from procmail. This implies a need to lock the shelve/pickle file used to store the training info. Seems to me we need to (be able to) lock the shelve/pickle file. The only lock facility which seems cross-platform enough for this application is the set of flags used by os.open(). To lock the database you'd have to check/create a lock file related (namewise) to the actual database file. Has anyone given this any thought? Skip From noreply at sourceforge.net Mon Jan 20 03:35:25 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 20 10:01:59 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to bind to specific addresses Message-ID: Patches item #670417, was opened at 2003-01-18 20:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Lownds (tonylownds) >Assigned to: Richie Hindle (richiehindle) Summary: Allow the pop3 proxies to bind to specific addresses Initial Comment: This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting. This is useful for two reasons: 1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security. 2.
The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts. The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string. ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 11:35 Message: Logged In: YES user_id=85414 Has SourceForge eaten the patch file? It says "No Files Currently Attached". ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 From francois.granger at free.fr Mon Jan 20 14:13:05 2003 From: francois.granger at free.fr (François Granger) Date: Mon Jan 20 10:02:19 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Message-ID: I got the files from Tony by private mail last night. -- Email is a means of communication. People should think about the political implications of their choices (or non-choices) of tools and technologies. For clean email: -- -------------- next part --------------
#!/usr/bin/env python

"""A POP3 proxy that works with classifier.py, and adds a simple
X-Spambayes-Classification header (ham/spam/unsure) to each incoming
email.  You point pop3proxy at your POP3 server, and configure your
email client to collect mail from the proxy then filter on the added
header.  Usage:

    pop3proxy.py [options] [ []]
         is the name of your real POP3 server
         is the port number of your real POP3 server, which
                 defaults to 110.

        options:
            -z      : Runs a self-test and exits.
            -t      : Runs a fake POP3 server on port 8110 (for testing).
            -h      : Displays this help message.

            -p FILE : use the named database file
            -d      : the database is a DBM file rather than a pickle
            -l port : proxy listens on this port number (default 110)
            -u port : User interface listens on this port number
                      (default 8880; Browse http://localhost:8880/)
            -b      : Launch a web browser showing the user interface.

        All command line arguments and switches take their default
        values from the [pop3proxy] and [html_ui] sections of
        bayescustomize.ini.

For safety, and to help debugging, the whole POP3 conversation is
written out to _pop3proxy.log for each run, if options.verbose is True.

To make rebuilding the database easier, uploaded messages are appended
to _pop3proxyham.mbox and _pop3proxyspam.mbox.
"""

# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

__author__ = "Richie Hindle "
__credits__ = "Tim Peters, Neale Pickett, Tim Stone, all the Spambayes folk."

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0


todo = """

Web training interface:

 o Functional tests.
 o Review already-trained messages, and purge them.
 o Put in a link to view a message (plain text, html, multipart...?)
   Include a Reply link that launches the registered email client, eg.
   mailto:tim@fourstonesExpressions.com?subject=Re:%20pop3proxy&body=Hi%21%0D
 o Keyboard navigation (David Ascher).  But aren't Tab and left/right
   arrow enough?
 o [Francois Granger] Show the raw spamprob number close to the buttons
   (this would mean using the extra X-Hammie header by default).
 o Add Today and Refresh buttons on the Review page.


User interface improvements:

 o Once the pieces are on separate pages, make the paste box bigger.
 o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
   webbrowser?
 o Can it cleanly dynamically update its status display while having a
   POP3 conversation?  Hammering reload sucks.
 o Save the stats (num classified, etc.) between sessions.
 o "Reload database" button.


New features:

 o "Send me an email every [...] to remind me to train on new
   messages."
 o "Send me a status email every [...] telling how many mails have been
   classified, etc."
 o Possibly integrate Tim Stone's SMTP code - make it use async, make
   the training code update (rather than replace!) the database.
 o Allow use of the UI without the POP3 proxy.
 o Remove any existing X-Spambayes-Classification header from incoming
   emails.
 o Whitelist.
 o Online manual.
 o Links to project homepage, mailing list, etc.
 o List of words with stats (it would have to be paged!) a la SpamSieve.


Code quality:

 o Make a separate Dibbler plugin for serving images, so there's no
   duplication between pop3proxy and OptionConfig.
 o Move the UI into its own module.
 o Cope with the email client timing out and closing the connection.
 o Lose the trailing dot from cached messages.


Info:

 o Slightly-wordy index page; intro paragraph for each page.
 o In both stats and training results, report nham and nspam - warn if
   they're very different (for some value of 'very').
 o "Links" section (on homepage?) to project homepage, mailing list,
   etc.


Gimmicks:

 o Classify a web page given a URL.
 o Graphs.  Of something.  Who cares what?
 o NNTP proxy.
 o Zoe...!


Notes, for the sake of somewhere better to put them:

Don't proxy spams at all?  This would mean writing a full POP3 client
and server - it would download all your mail on a timer and serve to you
all the non-spams.  It could be 'safe' in that it leaves the messages in
the real POP3 account until you collect them from it (or in the case of
spams, until you collect contemporaneous hams).  The web interface would
then present all the spams so that you could correct any FPs and mark
them for collection.
The thing is no longer a proxy (because the first POP3 command in a
conversation is STAT or LIST, which tells you how many mails there are
- it wouldn't know the answer, and finding out could take weeks over a
modem - I've already had problems with clients timing out while the
proxy was downloading stuff from the server).

Adam's idea: add checkboxes to a Google results list for "Relevant" /
"Irrelevant", then submit that to build a search including the
highest-scoring tokens and excluding the lowest-scoring ones.
"""

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

import os, sys, re, operator, errno, getopt, string, time, bisect
import socket, asyncore, asynchat, cgi, urlparse, webbrowser
import mailbox, email.Header
import spambayes
from spambayes import storage, tokenizer, mboxutils, PyMeldLite, Dibbler
from spambayes.FileCorpus import FileCorpus, ExpiryFileCorpus
from spambayes.FileCorpus import FileMessageFactory, GzipFileMessageFactory
from email.Iterators import typed_subpart_iterator
from OptionConfig import OptionsConfigurator
from spambayes.Options import options

# HEADER_EXAMPLE is the longest possible header - the length of this one
# is added to the size of each message.
HEADER_FORMAT = '%s: %%s\r\n' % options.hammie_header_name
HEADER_EXAMPLE = '%s: xxxxxxxxxxxxxxxxxxxx\r\n' % options.hammie_header_name

IMAGES = ('helmet', 'status', 'config', 'message', 'train',
          'classify', 'query')


class ServerLineReader(Dibbler.BrighterAsyncChat):
    """An async socket that reads lines from a remote server and
    simply calls a callback with the data.
    The BayesProxy object can't connect to the real POP3 server and
    talk to it synchronously, because that would block the process."""

    def __init__(self, serverName, serverPort, lineCallback):
        Dibbler.BrighterAsyncChat.__init__(self)
        self.lineCallback = lineCallback
        self.request = ''
        self.set_terminator('\r\n')
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((serverName, serverPort))
        except socket.error, e:
            error = "Can't connect to %s:%d: %s" % (serverName, serverPort, e)
            print >>sys.stderr, error
            self.lineCallback('-ERR %s\r\n' % error)
            self.lineCallback('')   # "The socket's been closed."
            self.close()

    def collect_incoming_data(self, data):
        self.request = self.request + data

    def found_terminator(self):
        self.lineCallback(self.request + '\r\n')
        self.request = ''

    def handle_close(self):
        self.lineCallback('')
        self.close()


class POP3ProxyBase(Dibbler.BrighterAsyncChat):
    """An async dispatcher that understands POP3 and proxies to a POP3
    server, calling `self.onTransaction(request, response)` for each
    transaction. Responses are not un-byte-stuffed before reaching
    self.onTransaction() (they probably should be for a totally generic
    POP3ProxyBase class, but BayesProxy doesn't need it and it would
    mean re-stuffing them afterwards).  self.onTransaction() should
    return the response to pass back to the email client - the response
    can be the verbatim response or a processed version of it.  The
    special command 'KILL' kills it (passing a 'QUIT' command to the
    server).
    """

    def __init__(self, clientSocket, serverName, serverPort):
        Dibbler.BrighterAsyncChat.__init__(self, clientSocket)
        self.request = ''
        self.response = ''
        self.set_terminator('\r\n')
        self.command = ''           # The POP3 command being processed...
        self.args = ''              # ...and its arguments
        self.isClosing = False      # Has the server closed the socket?
        self.seenAllHeaders = False # For the current RETR or TOP
        self.startTime = 0          # (ditto)
        self.serverSocket = ServerLineReader(serverName, serverPort,
                                             self.onServerLine)

    def onTransaction(self, command, args, response):
        """Override this.  Takes the raw request and the response, and
        returns the (possibly processed) response to pass back to the
        email client.
        """
        raise NotImplementedError

    def onServerLine(self, line):
        """A line of response has been received from the POP3 server."""
        isFirstLine = not self.response
        self.response = self.response + line

        # Is this the line that terminates a set of headers?
        self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']

        # Has the server closed its end of the socket?
        if not line:
            self.isClosing = True

        # If we're not processing a command, just echo the response.
        if not self.command:
            self.push(self.response)
            self.response = ''

        # Time out after 30 seconds for message-retrieval commands if
        # all the headers are down.  The rest of the message will proxy
        # straight through.
        if self.command in ['TOP', 'RETR'] and \
           self.seenAllHeaders and time.time() > self.startTime + 30:
            self.onResponse()
            self.response = ''
        # If that's a complete response, handle it.
        elif not self.isMultiline() or line == '.\r\n' or \
             (isFirstLine and line.startswith('-ERR')):
            self.onResponse()
            self.response = ''

    def isMultiline(self):
        """Returns True if the request should get a multiline
        response (assuming the response is positive).
        """
        if self.command in ['USER', 'PASS', 'APOP', 'QUIT',
                            'STAT', 'DELE', 'NOOP', 'RSET', 'KILL']:
            return False
        elif self.command in ['RETR', 'TOP']:
            return True
        elif self.command in ['LIST', 'UIDL']:
            return len(self.args) == 0
        else:
            # Assume that an unknown command will get a single-line
            # response.  This should work for errors and for POP-AUTH,
            # and is harmless even for multiline responses - the first
            # line will be passed to onTransaction and ignored, then the
            # rest will be proxied straight through.
            return False

    ## This is an attempt to solve the problem whereby the email client
    ## times out and closes the connection but the ServerLineReader is still
    ## connected, so you get errors from the POP3 server next time because
    ## there's already an active connection.  But after introducing this,
    ## I kept getting unexplained "Bad file descriptor" errors in recv.
    ##
    ## def handle_close(self):
    ##     """If the email client closes the connection unexpectedly, eg.
    ##     because of a timeout, close the server connection."""
    ##     self.serverSocket.shutdown(2)
    ##     self.serverSocket.close()
    ##     self.close()

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self.request = self.request + data

    def found_terminator(self):
        """Asynchat override."""
        verb = self.request.strip().upper()
        if verb == 'KILL':
            self.socket.shutdown(2)
            self.close()
            raise SystemExit
        elif verb == 'CRASH':
            # For testing
            x = 0
            y = 1/x

        self.serverSocket.push(self.request + '\r\n')
        if self.request.strip() == '':
            # Someone just hit the Enter key.
            self.command = self.args = ''
        else:
            # A proper command.
            splitCommand = self.request.strip().split(None, 1)
            self.command = splitCommand[0].upper()
            self.args = splitCommand[1:]
            self.startTime = time.time()

        self.request = ''

    def onResponse(self):
        # Pass the request and the raw response to the subclass and
        # send back the cooked response.
        if self.response:
            cooked = self.onTransaction(self.command, self.args, self.response)
            self.push(cooked)

        # If onServerLine() decided that the server has closed its
        # socket, close this one when the response has been sent.
        if self.isClosing:
            self.close_when_done()

        # Reset.
        self.command = ''
        self.args = ''
        self.isClosing = False
        self.seenAllHeaders = False


class BayesProxyListener(Dibbler.Listener):
    """Listens for incoming email client connections and spins off
    BayesProxy objects to serve them.
    """

    def __init__(self, serverName, serverPort, proxyPort):
        proxyArgs = (serverName, serverPort)
        Dibbler.Listener.__init__(self, proxyPort, BayesProxy, proxyArgs)
        print 'Listener on port %s is proxying %s:%d' % \
              (_addressPortStr(proxyPort), serverName, serverPort)


class BayesProxy(POP3ProxyBase):
    """Proxies between an email client and a POP3 server, inserting
    judgement headers.  It acts on the following POP3 commands:

     o STAT:
        o Adds the size of all the judgement headers to the maildrop
          size.

     o LIST:
        o With no message number: adds the size of a judgement header
          to the message size for each message in the scan listing.
        o With a message number: adds the size of a judgement header
          to the message size.

     o RETR:
        o Adds the judgement header based on the raw headers and body
          of the message.

     o TOP:
        o Adds the judgement header based on the raw headers and as
          much of the body as the TOP command retrieves.  This can
          mean that the header might have a different value for
          different calls to TOP, or for calls to TOP vs. calls to
          RETR.  I'm assuming that the email client will either not
          make multiple calls, or will cope with the headers being
          different.
    """

    def __init__(self, clientSocket, serverName, serverPort):
        POP3ProxyBase.__init__(self, clientSocket, serverName, serverPort)
        self.handlers = {'STAT': self.onStat, 'LIST': self.onList,
                         'RETR': self.onRetr, 'TOP': self.onTop}
        state.totalSessions += 1
        state.activeSessions += 1
        self.isClosed = False

    def send(self, data):
        """Logs the data to the log file."""
        if options.verbose:
            state.logFile.write(data)
            state.logFile.flush()
        try:
            return POP3ProxyBase.send(self, data)
        except socket.error:
            # The email client has closed the connection - 40tude Dialog
            # does this immediately after issuing a QUIT command,
            # without waiting for the response.
            self.close()

    def recv(self, size):
        """Logs the data to the log file."""
        data = POP3ProxyBase.recv(self, size)
        if options.verbose:
            state.logFile.write(data)
            state.logFile.flush()
        return data

    def close(self):
        # This can be called multiple times by async.
        if not self.isClosed:
            self.isClosed = True
            state.activeSessions -= 1
            POP3ProxyBase.close(self)

    def onTransaction(self, command, args, response):
        """Takes the raw request and response, and returns the
        (possibly processed) response to pass back to the email client.
        """
        handler = self.handlers.get(command, self.onUnknown)
        return handler(command, args, response)

    def onStat(self, command, args, response):
        """Adds the size of all the judgement headers to the maildrop
        size."""
        match = re.search(r'^\+OK\s+(\d+)\s+(\d+)(.*)\r\n', response)
        if match:
            count = int(match.group(1))
            size = int(match.group(2)) + len(HEADER_EXAMPLE) * count
            return '+OK %d %d%s\r\n' % (count, size, match.group(3))
        else:
            return response

    def onList(self, command, args, response):
        """Adds the size of a judgement header to the message size(s)."""
        if response.count('\r\n') > 1:
            # Multiline: all lines but the first contain a message size.
            lines = response.split('\r\n')
            outputLines = [lines[0]]
            for line in lines[1:]:
                match = re.search(r'^(\d+)\s+(\d+)', line)
                if match:
                    number = int(match.group(1))
                    size = int(match.group(2)) + len(HEADER_EXAMPLE)
                    line = "%d %d" % (number, size)
                outputLines.append(line)
            return '\r\n'.join(outputLines)
        else:
            # Single line: "+OK msg-number size" (RFC 1939), so the
            # size is the second number, not the first.
            match = re.search(r'^\+OK\s+(\d+)\s+(\d+)(.*)\r\n', response)
            if match:
                number = int(match.group(1))
                size = int(match.group(2)) + len(HEADER_EXAMPLE)
                return "+OK %d %d%s\r\n" % (number, size, match.group(3))
            else:
                return response

    def onRetr(self, command, args, response):
        """Adds the judgement header based on the raw headers and body
        of the message."""
        # Use '\n\r?\n' to detect the end of the headers in case of
        # broken emails that don't use the proper line separators.
        if re.search(r'\n\r?\n', response):
            # Break off the first line, which will be '+OK'.
            ok, messageText = response.split('\n', 1)

            # Now find the spam disposition and add the header.
            prob = state.bayes.spamprob(tokenizer.tokenize(messageText))
            if prob < options.ham_cutoff:
                disposition = options.header_ham_string
                if command == 'RETR':
                    state.numHams += 1
            elif prob > options.spam_cutoff:
                disposition = options.header_spam_string
                if command == 'RETR':
                    state.numSpams += 1
            else:
                disposition = options.header_unsure_string
                if command == 'RETR':
                    state.numUnsure += 1

            header = '%s: %s\r\n' % (options.hammie_header_name, disposition)
            headers, body = re.split(r'\n\r?\n', messageText, 1)
            headers = headers + "\n" + header + "\r\n"
            messageText = headers + body

            # Cache the message; don't pollute the cache with test messages.
            if command == 'RETR' and not state.isTest:
                # The message name is the time it arrived, with a uniquifier
                # appended if two arrive within one clock tick of each other.
                messageName = "%10.10d" % long(time.time())
                if messageName == state.lastBaseMessageName:
                    state.lastBaseMessageName = messageName
                    messageName = "%s-%d" % (messageName, state.uniquifier)
                    state.uniquifier += 1
                else:
                    state.lastBaseMessageName = messageName
                    state.uniquifier = 2

                # Write the message into the Unknown cache.
                message = state.unknownCorpus.makeMessage(messageName)
                message.setSubstance(messageText)
                state.unknownCorpus.addMessage(message)

            # Return the +OK and the message with the header added.
            return ok + "\n" + messageText

        else:
            # Must be an error response.
            return response

    def onTop(self, command, args, response):
        """Adds the judgement header based on the raw headers and as
        much of the body as the TOP command retrieves."""
        # Easy (but see the caveat in BayesProxy.__doc__).
        return self.onRetr(command, args, response)

    def onUnknown(self, command, args, response):
        """Default handler; returns the server's response verbatim."""
        return response


class UserInterfaceServer(Dibbler.HTTPServer):
    """Implements the web server component via a Dibbler plugin."""

    def __init__(self, uiPort):
        Dibbler.HTTPServer.__init__(self, uiPort)
        print 'User interface url is http://localhost:%d/' % (uiPort)


def readUIResources():
    """Returns ui.html and a dictionary of Gifs.  Used here and by
    OptionConfig."""

    # Using `exec` is nasty, but I couldn't figure out a way of making
    # `getattr` or `__import__` work with ResourcePackage.
    from spambayes.resources import ui_html
    images = {}
    for imageName in IMAGES:
        exec "from spambayes.resources import %s_gif" % imageName
        exec "images[imageName] = %s_gif.data" % imageName
    return ui_html.data, images


class UserInterface(Dibbler.HTTPPlugin):
    """Serves the HTML user interface of the proxy."""

    def __init__(self):
        """Load up the necessary resources: ui.html and the GIF images."""
        Dibbler.HTTPPlugin.__init__(self)
        htmlSource, self._images = readUIResources()
        self.html = PyMeldLite.Meld(htmlSource, readonly=True)

    def onIncomingConnection(self, clientSocket):
        """Checks the security settings."""
        return options.html_ui_allow_remote_connections or \
               clientSocket.getpeername()[0] == clientSocket.getsockname()[0]

    def _writePreamble(self, name, showImage=True):
        """Writes the HTML for the beginning of a page - time-consuming
        methlets use this and `_writePostamble` to write the page in
        pieces, including progress messages."""

        # Take the whole palette and remove the content and the footer,
        # leaving the header and an empty body.
        html = self.html.clone()
        html.mainContent = " "
        del html.footer

        # Add in the name of the page and remove the link to Home if this
        # *is* Home.
        html.title = name
        if name == 'Home':
            del html.homelink
            html.pagename = "Home"
        else:
            html.pagename = "> " + name

        # Remove the helmet image if we're not showing it - this happens on
        # shutdown because the browser might ask for the image after we've
        # exited.
        if not showImage:
            del html.helmet

        # Strip the closing tags, so we push as far as the start of the main
        # content.  We'll push the closing tags at the end.
        self.writeOKHeaders('text/html')
        self.write(re.sub(r'</div>\s*</body>\s*</html>', '', str(html)))

    def _writePostamble(self):
        """Writes the end of time-consuming pages - see `_writePreamble`."""
        footer = self.html.footer.clone()
        footer.timestamp = time.asctime(time.localtime())
        self.write("</div>" + footer)
        self.write("</body></html>")

    def _trimHeader(self, field, limit, quote=False):
        """Trims a string, adding an ellipsis if necessary and HTML-quoting
        on request.  Also pumps it through email.Header.decode_header, which
        understands charset sections in email headers - I suspect this will
        only work for Latin character sets, but hey, it works for Francois
        Granger's name.  8-)"""

        sections = email.Header.decode_header(field)
        field = ' '.join([text for text, unused in sections])
        if len(field) > limit:
            field = field[:limit-3] + "..."
        if quote:
            field = cgi.escape(field)
        return field

    def onHome(self):
        """Serve up the homepage."""
        stateDict = state.__dict__.copy()
        stateDict.update(state.bayes.__dict__)
        statusTable = self.html.statusTable.clone()
        if not state.servers:
            statusTable.proxyDetails = "No POP3 proxies running."
        content = (self._buildBox('Status and Configuration',
                                  'status.gif', statusTable % stateDict) +
                   self._buildBox('Train on proxied messages',
                                  'train.gif', self.html.reviewText) +
                   self._buildTrainBox() +
                   self._buildClassifyBox() +
                   self._buildBox('Word query', 'query.gif',
                                  self.html.wordQuery))
        self._writePreamble("Home")
        self.write(content)
        self._writePostamble()

    def _doSave(self):
        """Saves the database."""
        self.write("Saving... ")
        self.flush()
        state.bayes.store()
        self.write("Done.\n")

    def onSave(self, how):
        """Command handler for "Save" and "Save & shutdown"."""
        isShutdown = how.lower().find('shutdown') >= 0
        self._writePreamble("Save", showImage=(not isShutdown))
        self._doSave()
        if isShutdown:
            self.write("%s" % self.html.shutdownMessage)
            self.write("</div></body></html>")
            self.flush()
            ## Is this still required?: self.shutdown(2)
            self.close()
            raise SystemExit
        self._writePostamble()

    def onTrain(self, file, text, which):
        """Train on an uploaded or pasted message."""
        self._writePreamble("Train")

        # Upload or paste?  Spam or ham?
        content = file or text
        isSpam = (which == 'Train as Spam')

        # Convert platform-specific line endings into unix-style.
        content = content.replace('\r\n', '\n').replace('\r', '\n')

        # Single message or mbox?
        if content.startswith('From '):
            # Get a list of raw messages from the mbox content.
            class SimpleMessage:
                def __init__(self, fp):
                    self.guts = fp.read()
            contentFile = StringIO.StringIO(content)
            mbox = mailbox.PortableUnixMailbox(contentFile, SimpleMessage)
            messages = map(lambda m: m.guts, mbox)
        else:
            # Just the one message.
            messages = [content]

        # Append the message(s) to a file, to make it easier to rebuild
        # the database later.  This is a temporary implementation -
        # it should keep a Corpus of trained messages.
        if isSpam:
            f = open("_pop3proxyspam.mbox", "a")
        else:
            f = open("_pop3proxyham.mbox", "a")

        # Train on the uploaded message(s).
        self.write("Training...\n")
        self.flush()
        for message in messages:
            tokens = tokenizer.tokenize(message)
            state.bayes.learn(tokens, isSpam)
            f.write("From pop3proxy@spambayes.org Sat Jan 31 00:00:00 2000\n")
            f.write(message)
            f.write("\n\n")

        # Save the database and return a link Home and another training form.
        f.close()
        self._doSave()
        self.write("OK. Return Home or train again:")
        self.write(self._buildTrainBox())
        self._writePostamble()

    def _keyToTimestamp(self, key):
        """Given a message key (as seen in a Corpus), returns the timestamp
        for that message.  This is the time that the message was received,
        not the Date header."""
        return long(key[:10])

    def _getTimeRange(self, timestamp):
        """Given a unix timestamp, returns a 3-tuple: the start timestamp
        of the given day, the end timestamp of the given day, and the
        formatted date of the given day."""
        # This probably works on Summertime-shift days; time will tell.  8-)
        this = time.localtime(timestamp)
        start = (this[0], this[1], this[2], 0, 0, 0, this[6], this[7], this[8])
        end = time.localtime(time.mktime(start) + 36*60*60)
        end = (end[0], end[1], end[2], 0, 0, 0, end[6], end[7], end[8])
        date = time.strftime("%A, %B %d, %Y", start)
        return time.mktime(start), time.mktime(end), date

    def _buildReviewKeys(self, timestamp):
        """Builds an ordered list of untrained message keys, ready for output
        in the Review list.  Returns a 5-tuple: the keys, the formatted date
        for the list (eg. "Friday, November 15, 2002"), the start of the prior
        page or zero if there isn't one, likewise the start of the given page,
        and likewise the start of the next page."""
        # Fetch all the message keys and sort them into timestamp order.
        allKeys = state.unknownCorpus.keys()
        allKeys.sort()

        # The default start timestamp is derived from the most recent message,
        # or the system time if there are no messages (not that it gets used).
        if not timestamp:
            if allKeys:
                timestamp = self._keyToTimestamp(allKeys[-1])
            else:
                timestamp = time.time()
        start, end, date = self._getTimeRange(timestamp)

        # Find the subset of the keys within this range.
        startKeyIndex = bisect.bisect(allKeys, "%d" % long(start))
        endKeyIndex = bisect.bisect(allKeys, "%d" % long(end))
        keys = allKeys[startKeyIndex:endKeyIndex]
        keys.reverse()

        # What timestamps to use for the prior and next days?
        # If there are any messages before/after this day's range, use the
        # timestamps of those messages - this will skip empty days.
        prior = end = 0
        if startKeyIndex != 0:
            prior = self._keyToTimestamp(allKeys[startKeyIndex-1])
        if endKeyIndex != len(allKeys):
            end = self._keyToTimestamp(allKeys[endKeyIndex])

        # Return the keys and their date.
        return keys, date, prior, start, end

    def _makeMessageInfo(self, message):
        """Given an email.Message, return an object with subjectHeader,
        fromHeader and bodySummary attributes.  These objects are passed into
        appendMessages by onReview - passing email.Message objects directly
        uses too much memory."""
        subjectHeader = message["Subject"] or "(none)"
        fromHeader = message["From"] or "(none)"
        try:
            part = typed_subpart_iterator(message, 'text', 'plain').next()
            text = part.get_payload()
        except StopIteration:
            try:
                part = typed_subpart_iterator(message, 'text', 'html').next()
                text = part.get_payload()
                text, unused = tokenizer.crack_html_style(text)
                text, unused = tokenizer.crack_html_comment(text)
                text = tokenizer.html_re.sub(' ', text)
                text = '(this message only has an HTML body)\n' + text
            except StopIteration:
                text = '(this message has no text body)'
        text = text.replace('&nbsp;', ' ')      # Else they'll be quoted
        text = re.sub(r'(\s)\s+', r'\1', text)  # Eg. multiple blank lines
        text = text.strip()

        class _MessageInfo:
            pass
        messageInfo = _MessageInfo()
        messageInfo.subjectHeader = self._trimHeader(subjectHeader, 50, True)
        messageInfo.fromHeader = self._trimHeader(fromHeader, 40, True)
        messageInfo.bodySummary = self._trimHeader(text, 200)
        return messageInfo

    def _appendMessages(self, table, keyedMessageInfo, label):
        """Appends the rows of a table of messages to 'table'."""
        stripe = 0
        for key, messageInfo in keyedMessageInfo:
            row = self.html.reviewRow.clone()
            if label == 'Spam':
                row.spam.checked = 1
            elif label == 'Ham':
                row.ham.checked = 1
            else:
                row.defer.checked = 1
            row.subject = messageInfo.subjectHeader
            row.subject.title = messageInfo.bodySummary
            row.from_ = messageInfo.fromHeader
            setattr(row, 'class', ['stripe_on', 'stripe_off'][stripe]) # Grr!
            row = str(row).replace('TYPE', label).replace('KEY', key)
            table += row
            stripe = stripe ^ 1

    def onReview(self, **params):
        """Present a list of messages for (re)training."""
        # Train/discard submitted messages.
        self._writePreamble("Review")
        id = ''
        numTrained = 0
        numDeferred = 0
        for key, value in params.items():
            if key.startswith('classify:'):
                id = key.split(':')[2]
                if value == 'spam':
                    targetCorpus = state.spamCorpus
                elif value == 'ham':
                    targetCorpus = state.hamCorpus
                elif value == 'discard':
                    targetCorpus = None
                    try:
                        state.unknownCorpus.removeMessage(
                            state.unknownCorpus[id])
                    except KeyError:
                        pass  # Must be a reload.
                else:  # defer
                    targetCorpus = None
                    numDeferred += 1
                if targetCorpus:
                    try:
                        targetCorpus.takeMessage(id, state.unknownCorpus)
                        if numTrained == 0:
                            self.write("Training... ")
                            self.flush()
                        numTrained += 1
                    except KeyError:
                        pass  # Must be a reload.

        # Report on any training, and save the database if there was any.
        if numTrained > 0:
            plural = ''
            if numTrained != 1:
                plural = 's'
            self.write("Trained on %d message%s. " % (numTrained, plural))
            self._doSave()
            self.write("")

        # If any messages were deferred, show the same page again.
        if numDeferred > 0:
            start = self._keyToTimestamp(id)

        # Else after submitting a whole page, display the prior page or the
        # next one.  Derive the day of the submitted page from the ID of the
        # last processed message.
        elif id:
            start = self._keyToTimestamp(id)
            unused, unused, prior, unused, next = self._buildReviewKeys(start)
            if prior:
                start = prior
            else:
                start = next

        # Else if they've hit Previous or Next, display that page.
        elif params.get('go') == 'Next day':
            start = self._keyToTimestamp(params['next'])
        elif params.get('go') == 'Previous day':
            start = self._keyToTimestamp(params['prior'])

        # Else show the most recent day's page, as decided by _buildReviewKeys.
        else:
            start = 0

        # Build the lists of messages: spams, hams and unsure.
        keys, date, prior, this, next = self._buildReviewKeys(start)
        keyedMessageInfo = {options.header_spam_string: [],
                            options.header_ham_string: [],
                            options.header_unsure_string: []}
        for key in keys:
            # Parse the message, get the judgement header and build a message
            # info object for each message.
            cachedMessage = state.unknownCorpus[key]
            message = mboxutils.get_message(cachedMessage.getSubstance())
            judgement = message[options.hammie_header_name] or \
                        options.header_unsure_string
            messageInfo = self._makeMessageInfo(message)
            keyedMessageInfo[judgement].append((key, messageInfo))

        # Present the list of messages in their groups in reverse order of
        # appearance.
        if keys:
            page = self.html.reviewtable.clone()
            if prior:
                page.prior.value = prior
                del page.priorButton.disabled
            if next:
                page.next.value = next
                del page.nextButton.disabled
            templateRow = page.reviewRow.clone()
            page.table = ""  # To make way for the real rows.
            for header, label in ((options.header_spam_string, 'Spam'),
                                  (options.header_ham_string, 'Ham'),
                                  (options.header_unsure_string, 'Unsure')):
                messages = keyedMessageInfo[header]
                if messages:
                    subHeader = str(self.html.reviewSubHeader)
                    subHeader = subHeader.replace('TYPE', label)
                    page.table += self.html.blankRow
                    page.table += subHeader
                    self._appendMessages(page.table, messages, label)

            page.table += self.html.trainRow
            title = "Untrained messages received on %s" % date
            box = self._buildBox(title, None, page)  # No icon, to save space.
        else:
            page = "There are no untrained messages to display. "
            page += "Return Home."
            title = "No untrained messages"
            box = self._buildBox(title, 'status.gif', page)

        self.write(box)
        self._writePostamble()

    def onClassify(self, file, text, which):
        """Classify an uploaded or pasted message."""
        message = file or text
        message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
        tokens = tokenizer.tokenize(message)
        probability, clues = state.bayes.spamprob(tokens, evidence=True)
        cluesTable = self.html.cluesTable.clone()
        cluesRow = cluesTable.cluesRow.clone()
        del cluesTable.cluesRow   # Delete dummy row to make way for real ones
        for word, wordProb in clues:
            cluesTable += cluesRow % (word, wordProb)

        results = self.html.classifyResults.clone()
        results.probability = probability
        results.cluesBox = self._buildBox("Clues:", 'status.gif', cluesTable)
        results.classifyAnother = self._buildClassifyBox()
        self._writePreamble("Classify")
        self.write(results)
        self._writePostamble()

    def onWordquery(self, word):
        word = word.lower()
        wordinfo = state.bayes._wordinfoget(word)
        if wordinfo:
            stats = self.html.wordStats.clone()
            stats.spamcount = wordinfo.spamcount
            stats.hamcount = wordinfo.hamcount
            stats.spamprob = state.bayes.probability(wordinfo)
        else:
            stats = "%r does not exist in the database." % word

        query = self.html.wordQuery.clone()
        query.word.value = word
        statsBox = self._buildBox("Statistics for %r" % word, 'status.gif',
                                  stats)
        queryBox = self._buildBox("Word query", 'query.gif', query)
        self._writePreamble("Word query")
        self.write(statsBox + queryBox)
        self._writePostamble()

    def _writeImage(self, image):
        self.writeOKHeaders('image/gif')
        self.write(self._images[image])

    # If you are easily offended, look away now...
    for imageName in IMAGES:
        exec "def %s(self): self._writeImage('%s')" % \
             ("on%sGif" % imageName.capitalize(), imageName)

    def _buildBox(self, heading, icon, content):
        """Builds a yellow-headed HTML box."""
        box = self.html.headedBox.clone()
        box.heading = heading
        if icon:
            box.icon.src = icon
        else:
            del box.iconCell
        box.boxContent = content
        return box

    def _buildClassifyBox(self):
        """Returns a "Classify a message" box.  This is used on both the Home
        page and the classify results page.  The Classify form is based on the
        Upload form."""
        form = self.html.upload.clone()
        del form.or_mbox
        del form.submit_spam
        del form.submit_ham
        form.action = "classify"
        return self._buildBox("Classify a message", 'classify.gif', form)

    def _buildTrainBox(self):
        """Returns a "Train on a given message" box.  This is used on both the
        Home page and the training results page.  The Train form is based on
        the Upload form."""
        form = self.html.upload.clone()
        del form.submit_classify
        return self._buildBox("Train on a given message", 'message.gif', form)

    def reReadOptions(self):
        """Called by the config page when the user saves some new options, or
        restores the defaults."""
        # Reload the options.
        global state
        state.bayes.store()
        reload(spambayes.Options)
        global options
        from spambayes.Options import options

        # Recreate the state.
        state = State()
        state.buildServerStrings()
        state.createWorkers()

        # Close the existing listeners and create new ones.  This won't
        # affect any running proxies - once a listener has created a proxy,
        # that proxy is then independent of it.
        for proxy in proxyListeners:
            proxy.close()
        del proxyListeners[:]
        _createProxies(state.servers, state.proxyPorts)


# This keeps the global state of the module - the command-line options,
# statistics like how many mails have been classified, the handle of the
# log file, the Classifier and FileCorpus objects, and so on.
class State:
    def __init__(self):
        """Initialises the State object that holds the state of the app.
        The default settings are read from Options.py and bayescustomize.ini
        and are then overridden by the command-line processing code in the
        __main__ code below."""
        # Open the log file.
        if options.verbose:
            self.logFile = open('_pop3proxy.log', 'wb', 0)

        # Load up the old proxy settings from Options.py / bayescustomize.ini
        # and give warnings if they're present.  XXX Remove these soon.
        self.servers = []
        self.proxyPorts = []
        if options.pop3proxy_port != 110 or \
           options.pop3proxy_server_name != '' or \
           options.pop3proxy_server_port != 110:
            print "\n pop3proxy_port, pop3proxy_server_name and"
            print " pop3proxy_server_port are deprecated! Please use"
            print " pop3proxy_servers and pop3proxy_ports instead.\n"
            self.servers = [(options.pop3proxy_server_name,
                             options.pop3proxy_server_port)]
            # (address, port) tuple, as _addressPortStr expects.
            self.proxyPorts = [('', options.pop3proxy_port)]

        # Load the new proxy settings - these will override the old ones
        # if both are present.
        if options.pop3proxy_servers:
            self.servers = []
            for server in options.pop3proxy_servers.split(','):
                server = server.strip()
                if server.find(':') > -1:
                    server, port = server.split(':', 1)
                else:
                    port = '110'
                self.servers.append((server, int(port)))

        if options.pop3proxy_ports:
            splitPorts = options.pop3proxy_ports.split(',')
            self.proxyPorts = map(_addressAndPort, splitPorts)

        if len(self.servers) != len(self.proxyPorts):
            print "pop3proxy_servers & pop3proxy_ports are different lengths!"
            sys.exit()

        # Load up the other settings from Options.py / bayescustomize.ini
        self.useDB = options.pop3proxy_persistent_use_database
        self.uiPort = options.html_ui_port
        self.launchUI = options.html_ui_launch_browser
        self.gzipCache = options.pop3proxy_cache_use_gzip
        self.cacheExpiryDays = options.pop3proxy_cache_expiry_days
        self.runTestServer = False
        self.isTest = False

        # Set up the statistics.
        self.totalSessions = 0
        self.activeSessions = 0
        self.numSpams = 0
        self.numHams = 0
        self.numUnsure = 0

        # Unique names for cached messages - see BayesProxy.onRetr
        self.lastBaseMessageName = ''
        self.uniquifier = 2

    def buildServerStrings(self):
        """After the server details have been set up, this creates string
        versions of the details, for display in the Status panel."""
        serverStrings = ["%s:%s" % (s, p) for s, p in self.servers]
        self.serversString = ', '.join(serverStrings)
        self.proxyPortsString = ', '.join(map(_addressPortStr, self.proxyPorts))

    def createWorkers(self):
        """Using the options that were initialised in __init__ and then
        possibly overridden by the driver code, create the Bayes object,
        the Corpuses, the Trainers and so on."""
        print "Loading database...",
        if self.isTest:
            self.useDB = True
            options.pop3proxy_persistent_storage_file = \
                        '_pop3proxy_test.pickle'   # This is never saved.
        if self.useDB:
            self.bayes = storage.DBDictClassifier(
                options.pop3proxy_persistent_storage_file)
        else:
            self.bayes = storage.PickledClassifier(
                options.pop3proxy_persistent_storage_file)
        print "Done."

        # Don't set up the caches and training objects when running the
        # self-test, so as not to clutter the filesystem.
        if not self.isTest:
            def ensureDir(dirname):
                try:
                    os.mkdir(dirname)
                except OSError, e:
                    if e.errno != errno.EEXIST:
                        raise

            # Create/open the Corpuses.  Use small cache sizes to avoid
            # hogging lots of memory.
            map(ensureDir, [options.pop3proxy_spam_cache,
                            options.pop3proxy_ham_cache,
                            options.pop3proxy_unknown_cache])
            if self.gzipCache:
                factory = GzipFileMessageFactory()
            else:
                factory = FileMessageFactory()
            age = options.pop3proxy_cache_expiry_days*24*60*60
            self.spamCorpus = ExpiryFileCorpus(age, factory,
                                               options.pop3proxy_spam_cache,
                                               '[0-9]*', cacheSize=20)
            self.hamCorpus = ExpiryFileCorpus(age, factory,
                                              options.pop3proxy_ham_cache,
                                              '[0-9]*', cacheSize=20)
            self.unknownCorpus = FileCorpus(factory,
                                            options.pop3proxy_unknown_cache,
                                            cacheSize=20)

            # Expire old messages from the trained corpuses.
            self.spamCorpus.removeExpiredMessages()
            self.hamCorpus.removeExpiredMessages()

            # Create the Trainers.
            self.spamTrainer = storage.SpamTrainer(self.bayes)
            self.hamTrainer = storage.HamTrainer(self.bayes)
            self.spamCorpus.addObserver(self.spamTrainer)
            self.hamCorpus.addObserver(self.hamTrainer)


# option-parsing helper functions
def _addressAndPort(s):
    "Decode a string representing a port to bind to, with optional address"
    if ':' in s:
        addr, port = s.strip().split(':')
        return addr, int(port)
    else:
        return '', int(s)

def _addressPortStr((addr, port)):
    "Encode a string representing a port to bind to, with optional address"
    if not addr:
        return str(port)
    else:
        return '%s:%d' % (addr, port)


state = State()
proxyListeners = []

def _createProxies(servers, proxyPorts):
    """Create BayesProxyListeners for all the given servers."""
    for (server, serverPort), proxyPort in zip(servers, proxyPorts):
        listener = BayesProxyListener(server, serverPort, proxyPort)
        proxyListeners.append(listener)

def main(servers, proxyPorts, uiPort, launchUI):
    """Runs the proxy forever or until a 'KILL' command is received or
    someone hits Ctrl+Break."""
    _createProxies(servers, proxyPorts)
    httpServer = UserInterfaceServer(uiPort)
    proxyUI = UserInterface()
    httpServer.register(proxyUI, OptionsConfigurator(proxyUI))
    Dibbler.run(launchBrowser=launchUI)



# ===================================================================
# Test code.
# ===================================================================

# One example of spam and one of ham - both are used to train, and are
# then classified.  Not a good test of the classifier, but a perfectly
# good test of the POP3 proxy.  The bodies of these came from the
# spambayes project, and I added the headers myself because the
# originals had no headers.

spam1 = """From: friend@public.com
Subject: Make money fast

Hello tim_chandler , Want to save money ?
Now is a good time to consider refinancing. Rates are low so you can cut
your current payments and save money.

http://64.251.22.101/interest/index%38%30%300%2E%68t%6D

Take off list on site [s5]
"""

good1 = """From: chris@example.com
Subject: ZPT and DTML

Jean Jordaan wrote:
> 'Fraid so ;> It contains a vintage dtml-calendar tag.
> http://www.zope.org/Members/teyc/CalendarTag
>
> Hmm I think I see what you mean: one needn't manually pass on the
> namespace to a ZPT?

Yeah, Page Templates are a bit more clever, sadly, DTML methods aren't :-(

Chris
"""

class TestListener(Dibbler.Listener):
    """Listener for TestPOP3Server.  Works on port 8110, to co-exist with
    real POP3 servers."""

    def __init__(self, socketMap=asyncore.socket_map):
        Dibbler.Listener.__init__(self, 8110, TestPOP3Server,
                                  (socketMap,), socketMap=socketMap)


class TestPOP3Server(Dibbler.BrighterAsyncChat):
    """Minimal POP3 server, for testing purposes.  Doesn't support
    UIDL.  USER, PASS, APOP, DELE and RSET simply return "+OK"
    without doing anything.  Also understands the 'KILL' command, to
    kill it.  The mail content is the example messages above.
    """

    def __init__(self, clientSocket, socketMap):
        # Grumble: asynchat.__init__ doesn't take a 'map' argument,
        # hence the two-stage construction.
        # Pass the map through so that close() removes this channel from
        # the test socket map rather than from the default one.
        Dibbler.BrighterAsyncChat.__init__(self, map=socketMap)
        Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
        self.maildrop = [spam1, good1]
        self.set_terminator('\r\n')
        self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP',
                           'DELE', 'RSET', 'QUIT', 'KILL']
        self.handlers = {'STAT': self.onStat,
                         'LIST': self.onList,
                         'RETR': self.onRetr,
                         'TOP': self.onTop}
        self.push("+OK ready\r\n")
        self.request = ''

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self.request = self.request + data

    def found_terminator(self):
        """Asynchat override."""
        if ' ' in self.request:
            command, args = self.request.split(None, 1)
        else:
            command, args = self.request, ''
        command = command.upper()
        if command in self.okCommands:
            self.push("+OK (we hope)\r\n")
            if command == 'QUIT':
                self.close_when_done()
            if command == 'KILL':
                self.socket.shutdown(2)
                self.close()
                raise SystemExit
        else:
            handler = self.handlers.get(command, self.onUnknown)
            self.push(handler(command, args))   # Or push_slowly for testing
        self.request = ''

    def push_slowly(self, response):
        """Useful for testing."""
        for c in response:
            self.push(c)
            time.sleep(0.02)

    def onStat(self, command, args):
        """POP3 STAT command."""
        maildropSize = reduce(operator.add, map(len, self.maildrop))
        maildropSize += len(self.maildrop) * len(HEADER_EXAMPLE)
        return "+OK %d %d\r\n" % (len(self.maildrop), maildropSize)

    def onList(self, command, args):
        """POP3 LIST command, with optional message number argument."""
        if args:
            try:
                number = int(args)
            except ValueError:
                number = -1
            if 0 < number <= len(self.maildrop):
                return "+OK %d\r\n" % len(self.maildrop[number-1])
            else:
                return "-ERR no such message\r\n"
        else:
            returnLines = ["+OK"]
            for messageIndex in range(len(self.maildrop)):
                size = len(self.maildrop[messageIndex])
                returnLines.append("%d %d" % (messageIndex + 1, size))
            returnLines.append(".")
            return '\r\n'.join(returnLines) + '\r\n'

    def _getMessage(self, number, maxLines):
        """Implements the POP3 RETR and TOP commands."""
        if 0 < number <= len(self.maildrop):
            message = self.maildrop[number-1]
            headers, body = message.split('\n\n', 1)
            bodyLines = body.split('\n')[:maxLines]
            message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
            return "+OK\r\n%s\r\n.\r\n" % message
        else:
            return "-ERR no such message\r\n"

    def onRetr(self, command, args):
        """POP3 RETR command."""
        try:
            number = int(args)
        except ValueError:
            number = -1
        return self._getMessage(number, 12345)

    def onTop(self, command, args):
        """POP3 TOP command."""
        try:
            number, lines = map(int, args.split())
        except ValueError:
            number, lines = -1, -1
        return self._getMessage(number, lines)

    def onUnknown(self, command, args):
        """Unknown POP3 command."""
        return "-ERR Unknown command: %s\r\n" % repr(command)


def test():
    """Runs a self-test using TestPOP3Server, a minimal POP3 server
    that serves the example emails above.
    """
    # Run a proxy and a test server in separate threads with separate
    # asyncore environments.
    import threading
    state.isTest = True
    testServerReady = threading.Event()
    def runTestServer():
        testSocketMap = {}
        TestListener(socketMap=testSocketMap)
        testServerReady.set()
        asyncore.loop(map=testSocketMap)

    proxyReady = threading.Event()
    def runUIAndProxy():
        httpServer = UserInterfaceServer(8881)
        proxyUI = UserInterface()
        httpServer.register(proxyUI, OptionsConfigurator(proxyUI))
        BayesProxyListener('localhost', 8110, 8111)
        state.bayes.learn(tokenizer.tokenize(spam1), True)
        state.bayes.learn(tokenizer.tokenize(good1), False)
        proxyReady.set()
        Dibbler.run()

    threading.Thread(target=runTestServer).start()
    testServerReady.wait()
    threading.Thread(target=runUIAndProxy).start()
    proxyReady.wait()

    # Connect to the proxy.
    proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy.connect(('localhost', 8111))
    response = proxy.recv(100)
    assert response == "+OK ready\r\n"

    # Stat the mailbox to get the number of messages.
    proxy.send("stat\r\n")
    response = proxy.recv(100)
    count, totalSize = map(int, response.split()[1:3])
    assert count == 2

    # Loop through the messages ensuring that they have judgement
    # headers.
    for i in range(1, count+1):
        response = ""
        proxy.send("retr %d\r\n" % i)
        while response.find('\n.\r\n') == -1:
            response = response + proxy.recv(1000)
        assert response.find(options.hammie_header_name) >= 0

    # Smoke-test the HTML UI.
    httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    httpServer.connect(('localhost', 8881))
    httpServer.sendall("get / HTTP/1.0\r\n\r\n")
    response = ''
    while 1:
        packet = httpServer.recv(1000)
        if not packet: break
        response += packet
    assert re.search(r"(?s).*Spambayes proxy.*", response)

    # Kill the proxy and the test server.
    proxy.sendall("kill\r\n")
    proxy.recv(100)
    pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    pop3Server.connect(('localhost', 8110))
    pop3Server.sendall("kill\r\n")
    pop3Server.recv(100)


# ===================================================================
# __main__ driver.
# ===================================================================

def run():
    # Read the arguments.
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'htdbzp:l:u:')
    except getopt.error, msg:
        print >>sys.stderr, str(msg) + '\n\n' + __doc__
        sys.exit()

    runSelfTest = False
    for opt, arg in opts:
        if opt == '-h':
            print >>sys.stderr, __doc__
            sys.exit()
        elif opt == '-t':
            state.isTest = True
            state.runTestServer = True
        elif opt == '-b':
            state.launchUI = True
        elif opt == '-d':
            state.useDB = True
        elif opt == '-p':
            options.pop3proxy_persistent_storage_file = arg
        elif opt == '-l':
            state.proxyPorts = [_addressAndPort(arg)]
        elif opt == '-u':
            state.uiPort = int(arg)
        elif opt == '-z':
            state.isTest = True
            runSelfTest = True

    # Do whatever we've been asked to do...
    state.createWorkers()
    if runSelfTest:
        print "\nRunning self-test...\n"
        state.buildServerStrings()
        test()
        print "Self-test passed."   # ...else it would have asserted.

    elif state.runTestServer:
        print "Running a test POP3 server on port 8110..."
        TestListener()
        asyncore.loop()

    elif 0 <= len(args) <= 2:
        # Normal usage, with optional server name and port number.
        if len(args) == 1:
            state.servers = [(args[0], 110)]
        elif len(args) == 2:
            state.servers = [(args[0], int(args[1]))]

        state.buildServerStrings()
        main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)

    else:
        print >>sys.stderr, __doc__

if __name__ == '__main__':
    run()
-------------- next part --------------
"""
*Introduction*

Dibbler is a Python web application framework.  It lets you create
web-based applications by writing independent plug-in modules that
don't require any networking code.  Dibbler takes care of the HTTP side
of things, leaving you to write the application code.

*Plugins and Methlets*

Dibbler uses a system of plugins to implement the application logic.
Each page maps to a 'methlet', which is a method of a plugin object that
serves that page, and is named after the page it serves.  The address
`http://server/spam` calls the methlet `onSpam`.  `onHome` is a reserved
methlet name for the home page, `http://server/`.  For resources that
need a file extension (eg. images) you can use a URL such as
`http://server/eggs.gif` to map to the `onEggsGif` methlet.  All the
registered plugins are searched for the appropriate methlet, so you can
combine multiple plugins to build your application.

A methlet needs to call `self.writeOKHeaders('text/html')` followed by
`self.write(content)`.  You can pass whatever content-type you like to
`writeOKHeaders`, so serving images, PDFs, etc. is no problem.  If a
methlet wants to return an HTTP error code, it should call (for example)
`self.writeError(403, "Forbidden")` instead of `writeOKHeaders`
and `write`.  If it wants to write its own headers (for instance to
return a redirect) it can simply call `write` with the full HTTP
response.

If a methlet raises an exception, it is automatically turned into a "500
Server Error" page with a full traceback in it.

*Parameters*

Methlets can take parameters, the values of which are taken from form
parameters submitted by the browser.  So if your form says
`<form action='subscribe'><input name='email'>...</form>` then your
methlet should look like `def onSubscribe(self, email=None)`.  It's good
practice to give all the parameters default values, in case the user
navigates to that URL without submitting a form, or submits the form
without filling in any parameters.  If you have lots of parameters, or
their names are determined at runtime, you can define your methlet like
this: `def onComplex(self, **params)` to get a dictionary of parameters.

*Example*

Here's a web application server that serves a calendar for a given year:

    >>> import Dibbler, calendar
    >>> class Calendar(Dibbler.HTTPPlugin):
    ...     _form = '''<html><body><h3>Calendar Server</h3>
    ...                <form action='/'>
    ...                Year: <input type='text' name='year' size='4'>
    ...                <input type='submit' value='Go'></form>
    ...                <pre>%s</pre></body></html>'''
    ...
    ...     def onHome(self, year=None):
    ...         if year:
    ...             result = calendar.calendar(int(year))
    ...         else:
    ...             result = ""
    ...         self.writeOKHeaders('text/html')
    ...         self.write(self._form % result)
    ...
    >>> httpServer = Dibbler.HTTPServer(8888)
    >>> httpServer.register(Calendar())
    >>> Dibbler.run(launchBrowser=True)

Your browser will start, and you can ask for a calendar for the year of
your choice.  If you don't want to start the browser automatically, just
call `run()` with no arguments - the application is available at
http://localhost:8888/ .  You'll have to kill the server manually
because it provides no way to stop it; a real application would have
some kind of 'shutdown' methlet that called `sys.exit()`.

By combining Dibbler with an HTML manipulation library like PyMeld
(shameless plug - see http://entrian.com/PyMeld for details) you can
keep the HTML and Python code separate.

*Building applications*

You can run several plugins together like this:

    >>> httpServer = Dibbler.HTTPServer()
    >>> httpServer.register(plugin1, plugin2, plugin3)
    >>> Dibbler.run()

...so many plugin objects, each implementing a different set of pages,
can cooperate to implement a web application.  See also the *Advanced
usage: Dibbler Contexts* section for details of how to run multiple
`Dibbler` environments simultaneously in different threads.

*Controlling connections*

There are times when your code needs to be informed the moment an
incoming connection is received, before any HTTP conversation begins.
For instance, you might want to only accept connections from
`localhost` for security reasons.  If this is the case, your plugin
should implement the `onIncomingConnection` method.  This will be passed
the incoming socket before any reads or writes have taken place, and
should return True to allow the connection through or False to reject
it.
Here's an implementation of the `localhost`-only idea:

    >>> def onIncomingConnection(self, clientSocket):
    ...     return clientSocket.getpeername()[0] == clientSocket.getsockname()[0]

*Advanced usage: Dibbler Contexts*

If you want to run several independent Dibbler environments (in
different threads for example) then each should use its own `Context`.
Normally you'd say something like:

    >>> httpServer = Dibbler.HTTPServer()
    >>> httpServer.register(MyPlugin())
    >>> Dibbler.run()

but that's only safe to do from one thread.  Instead, you can say:

    >>> myContext = Dibbler.Context()
    >>> httpServer = Dibbler.HTTPServer(context=myContext)
    >>> httpServer.register(MyPlugin())
    >>> Dibbler.run(context=myContext)

in as many threads as you like.

*Dibbler and asyncore*

If this section means nothing to you, you can safely ignore it.

Dibbler is built on top of Python's asyncore library, which means that
it integrates into other asyncore-based applications, and you can write
other asyncore-based components and run them as part of the same
application.

By default, Dibbler uses the default asyncore socket map.  This means
that `Dibbler.run()` also runs your asyncore-based components, provided
they're using the default socket map.  If you want to tell Dibbler to
use a different socket map, either to co-exist with other asyncore-based
components using that map or to insulate Dibbler from such components
by using a different map, you need to use a `Dibbler.Context`.  If
you're using your own socket map, give it to the context:
`context = Dibbler.Context(myMap)`.  If you want Dibbler to use its own
map: `context = Dibbler.Context({})`.

You can either call `Dibbler.run(context=context)` to run the async
loop, or call `asyncore.loop(map=context._map)` directly - the only
difference is that the former has a few more options, like launching
the web browser automatically.

*Self-test*

Running `Dibbler.py` directly as a script runs the example calendar
server plus a self-test.
"""

# Dibbler is released under the Python Software Foundation license; see
# http://www.python.org/

__author__ = "Richie Hindle <richie@entrian.com>"
__credits__ = "Tim Stone"

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

import os, sys, re, time, traceback
import socket, asyncore, asynchat, cgi, urlparse, webbrowser

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0


class BrighterAsyncChat(asynchat.async_chat):
    """An asynchat.async_chat that doesn't give spurious warnings on
    receiving an incoming connection, lets SystemExit cause an exit, can
    flush its output, and will correctly remove itself from a non-default
    socket map on `close()`."""

    def __init__(self, conn=None, map=None):
        """See `asynchat.async_chat`."""
        asynchat.async_chat.__init__(self, conn)
        self._map = map

    def handle_connect(self):
        """Suppresses the asyncore "unhandled connect event" warning."""
        pass

    def handle_error(self):
        """Let SystemExit cause an exit."""
        type, v, t = sys.exc_info()
        if type == socket.error and v[0] == 9:  # Why?  Who knows...
            pass
        elif type == SystemExit:
            raise
        else:
            asynchat.async_chat.handle_error(self)

    def flush(self):
        """Flush everything in the output buffer."""
        while self.producer_fifo or self.ac_out_buffer:
            self.initiate_send()

    def close(self):
        """Remove this object from the correct socket map."""
        self.del_channel(self._map)
        self.socket.close()


class Context:
    """See the main documentation for details of `Dibbler.Context`."""
    def __init__(self, asyncMap=asyncore.socket_map):
        self._HTTPPort = None  # Stores the port for `run(launchBrowser=True)`
        self._map = asyncMap

_defaultContext = Context()


class Listener(asyncore.dispatcher):
    """Generic listener class used by all the different types of server.
    Listens for incoming socket connections and calls a factory function
    to create handlers for them."""

    def __init__(self, port, factory, factoryArgs,
                 socketMap=_defaultContext._map):
        """Creates a listener object, which will listen for incoming
        connections when Dibbler.run is called:

         o port: The TCP/IP (address, port) to listen on. Usually '' -
           meaning bind to all IP addresses that the machine has - will be
           passed as the address.  For backwards interface compatibility,
           if port is just an int, an address of '' will be assumed.

         o factory: The function to call to create a handler (can be a
           class name).

         o factoryArgs: The arguments to pass to the handler factory.  For
           proper context support, this should include a `context`
           argument (or a `socketMap` argument for pure asyncore
           listeners).  The incoming socket will be prepended to this
           list, and passed as the first argument.  See `HTTPServer` for
           an example.

         o socketMap: Optional.  The asyncore socket map to use.  If you're
           using a `Dibbler.Context`, pass context._map.

        See `HTTPServer` for an example `Listener` - it's a good deal
        smaller than this description!"""

        asyncore.dispatcher.__init__(self, map=socketMap)
        self.socketMap = socketMap
        self.factory = factory
        self.factoryArgs = factoryArgs
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        self.set_socket(s, self.socketMap)
        self.set_reuse_addr()
        if type(port) != type(()):
            port = ('', port)
        self.bind(port)
        self.listen(5)

    def handle_accept(self):
        """Asyncore override."""
        # If an incoming connection is instantly reset, eg. by following a
        # link in the web interface then instantly following another one or
        # hitting stop, handle_accept() will be triggered but accept() will
        # return None.
        result = self.accept()
        if result:
            clientSocket, clientAddress = result
            args = [clientSocket] + list(self.factoryArgs)
            self.factory(*args)


class HTTPServer(Listener):
    """A web server with which you can register `HTTPPlugin`s to serve up
    your content - see `HTTPPlugin` for detailed documentation and
    examples.

    `port` specifies the TCP/IP (address, port) on which to run, defaulting
    to ('', 80).

    `context` optionally specifies a `Dibbler.Context` for the server.
    """

    def __init__(self, port=('', 80), context=_defaultContext):
        """Create an `HTTPServer` for the given port."""
        Listener.__init__(self, port, _HTTPHandler,
                          (self, context), context._map)
        self._plugins = []
        context._HTTPPort = port

    def register(self, *plugins):
        """Registers one or more `HTTPPlugin`-derived objects with the
        server."""
        for plugin in plugins:
            self._plugins.append(plugin)


class _HTTPHandler(BrighterAsyncChat):
    """This is a helper for the HTTP server class - one of these is created
    for each incoming request, and does the job of decoding the HTTP traffic
    and driving the plugins."""

    def __init__(self, clientSocket, server, context):
        # Grumble: asynchat.__init__ doesn't take a 'map' argument,
        # hence the two-stage construction.
        BrighterAsyncChat.__init__(self, map=context._map)
        BrighterAsyncChat.set_socket(self, clientSocket, context._map)
        self._server = server
        self._request = ''
        self.set_terminator('\r\n\r\n')

        # Because a methlet is likely to call `writeOKHeaders` before doing
        # anything else, an unexpected exception won't send back a 500, which
        # is poor.  So we buffer any sent headers until either a plain `write`
        # happens or the methlet returns.
        self._bufferedHeaders = []
        self._headersWritten = False

        # Tell the plugins about the connection, letting them veto it.
        for plugin in self._server._plugins:
            if not plugin.onIncomingConnection(clientSocket):
                self.close()

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self._request = self._request + data

    def found_terminator(self):
        """Asynchat override."""
        # Parse the HTTP request.
        requestLine, headers = (self._request+'\r\n').split('\r\n', 1)
        try:
            method, url, version = requestLine.strip().split()
        except ValueError:
            # There is no `pushError`; `writeError` sends the error page.
            self.writeError(400, "Malformed request: '%s'" % requestLine)
            self.close_when_done()
            return

        # Parse the URL, and deal with POST vs. GET requests.
        method = method.upper()
        unused, unused, path, unused, query, unused = urlparse.urlparse(url)
        cgiParams = cgi.parse_qs(query, keep_blank_values=True)
        if self.get_terminator() == '\r\n\r\n' and method == 'POST':
            # We need to read the body - set a numeric async_chat terminator
            # equal to the Content-Length.
            match = re.search(r'(?i)content-length:\s*(\d+)', headers)
            contentLength = int(match.group(1))
            if contentLength > 0:
                self.set_terminator(contentLength)
                self._request = self._request + '\r\n\r\n'
                return

        # Have we just read the body of a POSTed request?  Decode the body,
        # which will contain parameters and possibly uploaded files.
        if type(self.get_terminator()) is type(1):
            self.set_terminator('\r\n\r\n')
            body = self._request.split('\r\n\r\n', 1)[1]
            match = re.search(r'(?i)content-type:\s*([^\r\n]+)', headers)
            contentTypeHeader = match.group(1)
            contentType, pdict = cgi.parse_header(contentTypeHeader)
            if contentType == 'multipart/form-data':
                # multipart/form-data - probably a file upload.
                bodyFile = StringIO.StringIO(body)
                cgiParams.update(cgi.parse_multipart(bodyFile, pdict))
            else:
                # A normal x-www-form-urlencoded.
                cgiParams.update(cgi.parse_qs(body, keep_blank_values=True))

        # Convert the cgi params into a simple dictionary.
        params = {}
        for name, value in cgiParams.iteritems():
            params[name] = value[0]

        # Find and call the methlet.  '/eggs.gif' becomes 'onEggsGif'.
        if path == '/':
            path = '/Home'
        pieces = path[1:].split('.')
        name = 'on' + ''.join([piece.capitalize() for piece in pieces])
        for plugin in self._server._plugins:
            if hasattr(plugin, name):
                # The plugin's APIs (`write`, etc) reflect back to us via
                # `plugin._handler`.
                plugin._handler = self
                try:
                    # Call the methlet.
                    getattr(plugin, name)(**params)
                    if self._bufferedHeaders:
                        # The methlet returned without writing anything other
                        # than headers.  This isn't unreasonable - it might
                        # have written a 302 or something.  Flush the buffered
                        # headers.
                        self.write(None)
                except:
                    # The methlet raised an exception - send the traceback to
                    # the browser, unless it's SystemExit in which case we let
                    # it go.
                    eType, eValue, eTrace = sys.exc_info()
                    if eType == SystemExit:
                        ##self.shutdown(2)
                        raise
                    message = """<h3>500 Server error</h3><pre>%s</pre>"""
                    details = traceback.format_exception(eType, eValue, eTrace)
                    details = '\n'.join(details)
                    self.writeError(500, message % cgi.escape(details))
                plugin._handler = None
                break
        else:
            self.onUnknown(path, params)

        # `close_when_done` and `Connection: close` ensure that we don't
        # support keep-alives or pipelining.  There are problems with some
        # browsers, for instance with extra characters being appended after
        # the body of a POSTed request.
        self.close_when_done()

    def onUnknown(self, path, params):
        """Handler for unknown URLs.  Returns a 404 page."""
        self.writeError(404, "Not found: '%s'" % path)

    def writeOKHeaders(self, contentType, extraHeaders={}):
        """Reflected from `HTTPPlugin`s."""
        # Buffer the headers until there's a `write`, in case an error occurs.
        timeNow = time.gmtime(time.time())
        httpNow = time.strftime('%a, %d %b %Y %H:%M:%S GMT', timeNow)
        headers = []
        headers.append("HTTP/1.1 200 OK")
        headers.append("Connection: close")
        headers.append("Content-Type: %s" % contentType)
        headers.append("Date: %s" % httpNow)
        for name, value in extraHeaders.items():
            headers.append("%s: %s" % (name, value))
        headers.append("")
        headers.append("")
        self._bufferedHeaders = headers

    def writeError(self, code, message):
        """Reflected from `HTTPPlugin`s."""
        # Writing an error overrides any buffered headers, but obviously
        # doesn't want to write any headers if some have already gone.
        headers = []
        if not self._headersWritten:
            headers.append("HTTP/1.0 %d Error" % code)
            headers.append("Connection: close")
            headers.append("Content-Type: text/html")
            headers.append("")
            headers.append("")
        self.push("%s%s" % \
                  ('\r\n'.join(headers), message))

    def write(self, content):
        """Reflected from `HTTPPlugin`s."""
        # The methlet is writing, so write any buffered headers first.
        headers = []
        if self._bufferedHeaders:
            headers = self._bufferedHeaders
            self._bufferedHeaders = None
            self._headersWritten = True

        # `write(None)` just flushes buffered headers.
        if content is None:
            content = ''
        self.push('\r\n'.join(headers) + str(content))


class HTTPPlugin:
    """Base class for HTTP server plugins.  See the main documentation for
    details."""

    def __init__(self):
        # self._handler is filled in by `_HTTPHandler.found_terminator()`.
        pass

    def onIncomingConnection(self, clientSocket):
        """Implement this and return False to veto incoming connections."""
        return True

    def writeOKHeaders(self, contentType, extraHeaders={}):
        """A methlet should call this with the Content-Type and optionally
        a dictionary of extra headers (eg. Expires) before calling
        `write()`."""
        return self._handler.writeOKHeaders(contentType, extraHeaders)

    def writeError(self, code, message):
        """A methlet should call this instead of `writeOKHeaders()` /
        `write()` to report an HTTP error (eg. 403 Forbidden)."""
        return self._handler.writeError(code, message)

    def write(self, content):
        """A methlet should call this after `writeOKHeaders` to write the
        page's content."""
        return self._handler.write(content)

    def flush(self):
        """A methlet can call this after calling `write`, to ensure that
        the content is written immediately to the browser.  This isn't
        necessary most of the time, but if you're writing "Please wait..."
        before performing a long operation, calling `flush()` is a good
        idea."""
        return self._handler.flush()

    def close(self, flush=True):
        """Closes the connection to the browser.  You should call `close()`
        before calling `sys.exit()` in any 'shutdown' methlets you write."""
        if flush:
            self.flush()
        return self._handler.close()


def run(launchBrowser=False, context=_defaultContext):
    """Runs a `Dibbler` application.  Servers listen for incoming connections
    and route requests through to plugins until a plugin calls `sys.exit()`
    or raises a `SystemExit` exception."""

    if launchBrowser:
        # `HTTPServer` stores whatever port it was given, which may be a
        # bare int or an (address, port) tuple such as the default ('', 80).
        port = context._HTTPPort
        if type(port) == type(()):
            port = port[1]
        webbrowser.open_new("http://localhost:%d/" % port)
    asyncore.loop(map=context._map)


def runTestServer(readyEvent=None):
    """Runs the calendar server example, with an added `/shutdown` URL."""
    import Dibbler, calendar
    class Calendar(Dibbler.HTTPPlugin):
        _form = '''<html><body><h3>Calendar Server</h3>
                   <form action='/'>
                   Year: <input type='text' name='year' size='4'>
                   <input type='submit' value='Go'></form>
                   <pre>%s</pre></body></html>'''

        def onHome(self, year=None):
            if year:
                result = calendar.calendar(int(year))
            else:
                result = ""
            self.writeOKHeaders('text/html')
            self.write(self._form % result)

        def onShutdown(self):
            self.writeOKHeaders('text/html')
            self.write("<html><body><p>OK.</p></body></html>")
            self.close()
            sys.exit()

    httpServer = Dibbler.HTTPServer(8888)
    httpServer.register(Calendar())
    if readyEvent:
        # Tell the self-test code that the test server is up and running.
        readyEvent.set()
    Dibbler.run(launchBrowser=True)


def test():
    """Run a self-test."""
    # Run the calendar server in a separate thread.
    import re, threading, urllib
    testServerReady = threading.Event()
    threading.Thread(target=runTestServer, args=(testServerReady,)).start()
    testServerReady.wait()

    # Connect to the server and ask for a calendar.
    page = urllib.urlopen("http://localhost:8888/?year=2003").read()
    if page.find('January') != -1:
        print "Self-test passed."
    else:
        print "Self-test failed!"

    # Wait for a key while the user plays with his browser.
    raw_input("Press Enter to shut down the application server...")

    # Ask the server to shut down.
    page = urllib.urlopen("http://localhost:8888/shutdown").read()
    if page.find('OK') != -1:
        print "Shutdown OK."
    else:
        print "Shutdown failed!"

if __name__ == '__main__':
    test()
-------------- next part --------------
Skipped content of type multipart/appledouble
From noreply at sourceforge.net Mon Jan 20 05:13:16 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 20 10:02:33 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to bind to specific addresses Message-ID: Patches item #670417, was opened at 2003-01-18 21:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Richie Hindle (richiehindle) Summary: Allow the pop3 proxies to bind to specific addresses Initial Comment: This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting. This is useful for two reasons: 1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines.
Providing this option improves security. 2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts. The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string. ---------------------------------------------------------------------- Comment By: François Granger (fgranger) Date: 2003-01-20 14:13 Message: Logged In: YES user_id=86948 I asked Tony about this, he sent me the files. Can I upload them or forward them to you ? ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 12:35 Message: Logged In: YES user_id=85414 Has SourceForge eaten the patch file? It says "No Files Currently Attached".
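The port-or-address parsing the patch describes (accepting either `8110` or `127.0.0.1:8110`, and turning the pair back into a printable string) can be sketched as below. This is a minimal illustration in the spirit of the `_addressAndPort` / `_addressPortStr` helpers in pop3proxy.py, not Tony's actual patch; the names `parse_bind_spec` and `format_bind_spec` are hypothetical, and splitting only on the last colon is an assumption of this sketch.

```python
def parse_bind_spec(spec):
    """Parse '8110' or '127.0.0.1:8110' into an (address, port) tuple.

    An empty address means "all interfaces", which is what
    socket.bind() understands by ''.
    """
    spec = spec.strip()
    if ':' in spec:
        # Split on the last colon only, so the port is always the final field.
        addr, port = spec.rsplit(':', 1)
        return addr, int(port)
    return '', int(spec)


def format_bind_spec(addr_port):
    """Turn an (address, port) pair back into the string form accepted above."""
    addr, port = addr_port
    if addr:
        return '%s:%d' % (addr, port)
    return str(port)


if __name__ == '__main__':
    for spec in ['8110', '127.0.0.1:8111']:
        pair = parse_bind_spec(spec)
        print(spec, '->', pair, '->', format_bind_spec(pair))
```

Binding to the `'127.0.0.1'` form gives the loopback-only behaviour from point 1, while a bare port keeps the old bind-to-everything default; for well-formed input, formatting a parsed value gives back the original string.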
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 From noreply at sourceforge.net Mon Jan 20 06:28:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 20 10:02:49 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to bind to specific addresses Message-ID: Patches item #670417, was opened at 2003-01-18 20:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Richie Hindle (richiehindle) Summary: Allow the pop3 proxies to bind to specific addresses Initial Comment: This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting. This is useful for two reasons: 1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security. 2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts. The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string. ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 14:28 Message: Logged In: YES user_id=85414 If you can't upload them here, please email them to me. Thanks.
---------------------------------------------------------------------- Comment By: François Granger (fgranger) Date: 2003-01-20 13:13 Message: Logged In: YES user_id=86948 I asked Tony about this, he sent me the files. Can I upload them or forward them to you ? ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 11:35 Message: Logged In: YES user_id=85414 Has SourceForge eaten the patch file? It says "No Files Currently Attached". ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 From skip at pobox.com Mon Jan 20 09:06:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 10:06:39 2003 Subject: [Spambayes] Change Required To pspam/options.py In-Reply-To: <3E2C0193.3040109@pa.press.net> References: <3E2C0193.3040109@pa.press.net> Message-ID: <15916.4212.309598.216695@montanaro.dyndns.org> John> from Options ... John> needs changing to John> from spambayes.Options ... Fix checked in. Skip From sjoerd at acm.org Mon Jan 20 16:23:49 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Mon Jan 20 10:23:55 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org> References: <15916.3865.297629.696625@montanaro.dyndns.org> Message-ID: <20030120152349.0855F74247@indus.ins.cwi.nl> On Mon, Jan 20 2003 Skip Montanaro wrote: > > Depending on how training and classifying are accomplished, it's quite > possible that the two activities will be done in different processes. For > example, I am currently experimenting with training using pop3proxy (well, > still my offshoot proxytrainer at the moment) while classification is being > done by hammiefilter run from procmail. This implies a need to lock the > shelve/pickle file used to store the training info. 
Seems to me we need to > (be able to) lock the shelve/pickle file. The only lock facility which > seems cross-platform enough for this application is the set of flags used by > os.open(). To lock the database you'd have to check/create a lock file > related (namewise) to the actual database file. Has anyone given this any > thought? I use the following code in my programs. Programs start with creating an instance of this class, and end by calling the close method. As far as I know, the safest way to do locking if you also have NFS partitions is to try to link to the lock file, so that is the technique I use.

    import os, time
    import spambayes.Options
    import spambayes.hammie

    class error(Exception):
        pass

    class HammieFilter(object):
        def __init__(self):
            dbname = spambayes.Options.options.hammiefilter_persistent_storage_file
            dbname = os.path.expanduser(dbname)
            usedb = spambayes.Options.options.hammiefilter_persistent_use_database
            tmplock = '%s.lock%d' % (dbname, os.getpid())
            self.lockfile = '%s.lock' % dbname
            open(tmplock, 'w').close()
            for i in range(5):
                if i > 0:
                    time.sleep(5)
                try:
                    os.link(tmplock, self.lockfile)
                except OSError:
                    pass
                else:
                    break
            else:
                os.unlink(tmplock)
                raise error, 'Database locked'
            os.unlink(tmplock)
            self.hammie = spambayes.hammie.open(dbname, usedb, 'c')

        def train(self, msg, is_spam):
            self.hammie.train(msg, is_spam)

        def untrain(self, msg, is_spam):
            self.hammie.untrain(msg, is_spam)

        def score(self, msg, evidence = False):
            return self.hammie.score(msg, evidence)

        def close(self):
            self.hammie.store()
            os.unlink(self.lockfile)

-- Sjoerd Mullender From seant at webreply.com Mon Jan 20 11:08:05 2003 From: seant at webreply.com (Sean True) Date: Mon Jan 20 11:11:30 2003 Subject: [Spambayes] Spamconference.org Message-ID: I attended the spam conference on Friday. Barry mentioned spambayes as part of the global Mailman picture. It was a good talk, in general. http://www.spamconference.org has links to a copy of the webcasts. Worth listening to.
There are also abstracts. There are eventually going to be complete papers. -- Sean From skip at pobox.com Mon Jan 20 10:43:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 11:43:48 2003 Subject: [Spambayes] nothing gets updated Message-ID: <15916.10043.564368.750299@montanaro.dyndns.org> I noticed that when training via my proxytrainer the shelf file isn't getting modified - at least its 'saved state' key doesn't change. I also noticed that it seems to be taking longer and longer to complete the operation after I click the 'train' button. I'd like to switch back to pop3proxy and start folding my user interface changes into it, but I still have trouble running it. I just tried running it as

    python pop3proxy.py -d -p hammie.db

It started okay, except I got a warning about xmllib:

    /Users/skip/local/lib/python2.3/xmllib.py:10: DeprecationWarning: The xmllib module is obsolete. Use xml.sax instead.
      DeprecationWarning)

and when I tried to visit http://localhost:8880/ I got a 500 Server error message:

    Traceback (most recent call last):
      File "spambayes/Dibbler.py", line 389, in found_terminator
        getattr(plugin, name)(**params)
      File "pop3proxy.py", line 619, in onHome
        'status.gif', statusTable % stateDict)+
      File "spambayes/PyMeldLite.py", line 618, in __getattr__
        return self.__dict__[name]
    KeyError: '__coerce__'

Looking through the code in both pop3proxy and proxytrainer, I see calls to self._doSave() or self.doSave() at the end of onReview(), but all they do is call self.bayes.store(). Where is the actual decision about a message's status translated into a change in the state of the database, either in-memory or on-disk? Skip From richie at entrian.com Mon Jan 20 20:26:22 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 15:27:05 2003 Subject: [Spambayes] Follow up In-Reply-To: References: Message-ID: [François] > MacOS X create in each directory a file named ".DS_Store" for it own > uses.
Since it is a hidden file, there is no issue with most > software. But pop3proxy loads it as if it was a normal message file. Oops. Now fixed - thanks. Incidentally, unexpected exceptions raised by the web UI should now give you the exception and a traceback in your browser. -- Richie Hindle richie@entrian.com From richie at entrian.com Mon Jan 20 20:26:41 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 15:27:14 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org> References: <15916.3865.297629.696625@montanaro.dyndns.org> Message-ID: <11mo2vsri5vjvio62irbkq3ihcjndb0p9k@4ax.com> [Skip] > Depending on how training and classifying are accomplished, it's quite > possible that the two activities will be done in different processes. For what it's worth, this is one of the reasons I'm keen to keep all server components within one process, and using asyncore - all concurrency issues are taken care of automatically. It's probably overkill for our application, but if hammie could classify by talking to the web UI, just like your proxytee.py script does, we could use the server as the concurrency mechanism. Pure hammie users wouldn't need a server (probably, depending on how they train). This is how most relational databases do it, after all... -- Richie Hindle richie@entrian.com From richie at entrian.com Mon Jan 20 20:27:24 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 15:27:54 2003 Subject: [Spambayes] Re: proxytrainer.py Message-ID: [Skip] > this will probably quickly deteriorate into a matter of personal taste > and display properties You're right. 8-) Your striping looks too dark to me on my monitor - I have #f4f4f4, you have #dddddd; can we compromise on #e8e8e8? That looks a fraction too dark to me, and will probably look a fraction too light for you. You've made each radio button line up directly under its heading; I deliberately hadn't done that. 
It looks nicer at first glance that way, but I've found (through extensive usability testing 8-) that it's easier to use when the radio buttons are physically closer together. They still lay out sensibly (in my environment at least - was the layout bad for you, or did you just want each button directly under the heading?) I like the fact that you can view the messages - that's been on my to-do list for ages! And pre-classifying messages in the Unsure list is another good idea - nice one. > Looks like it's time to backport them to pop3proxy.py. Yes please... having both is confusing to users and a pain for developers. If you'd like us to share the work, let me know - the way the HTML is built has changed dramatically, for instance, so I could do those bits (though I still prefer my way of laying out the radio buttons...) -- Richie From richie at entrian.com Mon Jan 20 20:30:16 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 15:30:43 2003 Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in In-Reply-To: <15912.28278.137619.916136@montanaro.dyndns.org> References: <15910.61389.133887.569308@montanaro.dyndns.org> <15912.28278.137619.916136@montanaro.dyndns.org> Message-ID: [Richie] > You should really make your message-naming code use the same > system as everything else [Skip] > Wasn't aware I did anything differently than you. Did you notice something? It looks like you've introduced self.messageName, which you increment each time you receive a message. I base all message names on time.time(), with a uniquifier appended if two arrive within one clock tick of each other - see onRetr(). [Skip] > I think as important (or more important) than day-by-day display is > chunk-by-chunk display. I get far too much mail to want to review it all at > once anyway. If I can't take the time to train everything, I don't want to > be depressed about it. ;-) Fair enough. 
I find that one day per chunk makes sense, and even if I get a hundred messages in a day, once the system's been trained it's still very quick to cast my eye down the list and correct any mistakes. [Richie] > If I can persuade you to use pop3proxy (or its successor, a > generic Spambayes server that can optionally host either or both > of the web UI and the POP3 proxy), you won't need to pull out > the async stuff. [Skip] > That's fine. My only worry is that the async code will never be as well > exercised as SimpleHTTPServer. Maybe, I don't know. As far as I know there have never been any async-related problems with pop3proxy, and I've used it successfully in my day job. -- Richie Hindle richie@entrian.com From richie at entrian.com Mon Jan 20 20:31:20 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 15:31:48 2003 Subject: [Spambayes] OptionConfig.py - split into two pieces? In-Reply-To: <15912.33776.619638.320031@montanaro.dyndns.org> References: <15912.33776.619638.320031@montanaro.dyndns.org> Message-ID: [Skip] > Perhaps [OptionsConfig] should be split in two pieces, a script and > an importable module. Is there any need to keep the script, now that it's a part of pop3proxy.py and you can run pop3proxy.py without any proxies configured? Does anyone have a reason for keeping OptionsConfig as a standalone script? -- Richie Hindle richie@entrian.com From neale at woozle.org Mon Jan 20 13:28:40 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 20 16:28:49 2003 Subject: [Spambayes] spampot -- spam honeypot server Message-ID: So the spam conference was great, etc. etc. The best thing was that I met a bunch of interesting people. From talking with folks, it sounds like my spampot program might be of interest to the general public. Spampot is basically Jackpot, but written in Python. Right now I'm sure Jackpot does more than Spampot does.
But I'm not sure Jackpot saves any of the messages it traps, and I have a feeling spampot will run on more platforms. For those unfamiliar with Jackpot, it comes up looking like an SMTP server, and will relay messages it thinks are probe tests. Everything else just goes to the bit bucket. With spampot, 5% of the incoming spam is saved to disk so you can look at it later. This is of critical importance to anyone who's writing a spam filter, because this way you get pure unadulterated spam as it would come in to your SMTP server. Contrast this with something like SpamArchive, where you get all sorts of messages of varying quality, forwarded and sullied by who knows what. The first night I ran spampot on an IP with no DNS entry associated with it, I got a probe after four hours. After I'd fixed the probe relaying logic to relay that type of probe, it took all of ten hours for me to collect over 400MB of spam. If people are interested enough in this, I'll make a separate mail list and web page for it. But for now it's available at . Happy hacking. Neale From skip at pobox.com Mon Jan 20 15:41:06 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 16:41:15 2003 Subject: [Spambayes] spampot -- spam honeypot server In-Reply-To: <15916.27890.874021.624060@montanaro.dyndns.org> References: Message-ID: <15916.27890.874021.624060@montanaro.dyndns.org> Neale, Hopefully I won't sound too much like an idiot, but what's a "probe message"? How do you classify messages which come into spampot, just "probe message" and "everything else"? Skip From richie at entrian.com Mon Jan 20 21:48:20 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 16:49:03 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] In-Reply-To: References: Message-ID: [François] > bind_address.patch Great - thanks. I'll look at it as soon as I get the chance.
-- Richie Hindle richie@entrian.com From richie at entrian.com Mon Jan 20 22:18:31 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 17:19:07 2003 Subject: [Spambayes] nothing gets updated In-Reply-To: <15916.10043.564368.750299@montanaro.dyndns.org> References: <15916.10043.564368.750299@montanaro.dyndns.org> Message-ID: [Skip] > I'd like to switch back to pop3proxy and start folding my user interface > changes into it, but I still have trouble running it > [...] > DeprecationWarning > [...] > KeyError: '__coerce__' Looks like I need to test this with 2.3a1... I'm downloading it now. > Looking through the code in both pop3proxy and proxytrainer, I see calls to > self._doSave() or self.doSave() at the end of onReview(), but all they do is > call self.bayes.store(). Where is the actual decision about a message's > status translated into a change in the state of the database, either > in-memory or on-disk? In pop3proxy, the training (ie. the calling of Classifier.learn()) is done when messages are moved from the Unknown corpus to one of the Ham or Spam corpuses. This code:

    # Create the Trainers.
    self.spamTrainer = storage.SpamTrainer(self.bayes)
    self.hamTrainer = storage.HamTrainer(self.bayes)
    self.spamCorpus.addObserver(self.spamTrainer)
    self.hamCorpus.addObserver(self.hamTrainer)

sets up trainers which automatically train the classifier when messages are moved between corpuses using Corpus.takeMessage(), which is called by onReview(). This code is missing from proxytrainer.py - overzealous code trimming?
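Richie's explanation describes an observer arrangement: trainers register on the Ham and Spam corpora, and moving a message with takeMessage() is what triggers training. The sketch below mirrors that shape with simplified, hypothetical classes. Only addObserver and takeMessage come from the message; Corpus, Trainer and the onAddMessage/onRemoveMessage hooks are illustrative names, not spambayes' storage module. Untraining on removal is an assumption, added to show how a corrected classification could undo earlier training.

```python
class Corpus:
    """Minimal message corpus that notifies observers (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.messages = {}
        self.observers = []

    def addObserver(self, observer):
        self.observers.append(observer)

    def addMessage(self, key, msg):
        self.messages[key] = msg
        for observer in self.observers:
            observer.onAddMessage(msg)

    def removeMessage(self, key):
        msg = self.messages.pop(key)
        for observer in self.observers:
            observer.onRemoveMessage(msg)
        return msg

    def takeMessage(self, key, fromCorpus):
        """Move a message here from another corpus, notifying both sides."""
        self.addMessage(key, fromCorpus.removeMessage(key))


class Trainer:
    """Trains a classifier whenever a message enters or leaves its corpus.

    `classifier` is any object with learn(msg, is_spam) and
    unlearn(msg, is_spam) methods (assumed interface).
    """

    def __init__(self, classifier, is_spam):
        self.classifier = classifier
        self.is_spam = is_spam

    def onAddMessage(self, msg):
        self.classifier.learn(msg, self.is_spam)

    def onRemoveMessage(self, msg):
        self.classifier.unlearn(msg, self.is_spam)
```

Because the Unknown corpus has no observers, moving a message out of it only triggers the destination's trainer; moving one from Spam to Ham untrains it as spam and then trains it as ham.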
8-) -- Richie Hindle richie@entrian.com From ducky at webfoot.com Mon Jan 20 14:42:34 2003 From: ducky at webfoot.com (Kaitlin Duck Sherwood) Date: Mon Jan 20 17:47:22 2003 Subject: [Spambayes] (anti-)spam conference trip report In-Reply-To: References: <15916.10043.564368.750299@montanaro.dyndns.org> Message-ID: For those who couldn't make it to the (anti-)spam conference in Boston last week, I posted a (long) trip report at http://www.overcomeemailoverload.com./antispam/2003SpamConfNotes.html No reply needed. From neale at woozle.org Mon Jan 20 15:16:00 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 20 18:16:04 2003 Subject: [Spambayes] spampot -- spam honeypot server In-Reply-To: <15916.27890.874021.624060@montanaro.dyndns.org> (Skip Montanaro's message of "Mon, 20 Jan 2003 15:41:06 -0600") References: <15916.27890.874021.624060@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > Neale, > > Hopefully I won't sound too much like an idiot, but what's a "probe > message"? How do you classify messages which come into spampot, just > "probe message" and "everything else"? So when you kick up a mail server, you'll get a lot of messages like this:

    SMTP-Hello: master-cv7889w2
    SMTP-Mail-From:
    SMTP-Rcpt-To:
    From: china9988@21cn.com
    Subject: 192.168.1.2
    To: china9988@21cn.com
    Date: Thu, 16 Jan 2003 21:48:41 +0900
    X-Priority: 3
    X-Library: Indy 8.0.25

    t_Smtp.LocalIP

This is one of the more baffling probes, since china9988@21cn.com gives NDRs--maybe really old spam software. But all of the probes I've seen so far have the IP address of my honeypot server in the subject line. It makes sense--send out mail blindly, and anything you get back has the IP address of an open relay in the subject line. And yes, currently I only classify as "probe" and "everything else". I do this with Maildir flags, though there's really no reason why it should have to be in Maildir format, aside from making it easy to view with mutt.
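Neale's observation that probes carry the honeypot's own IP address in the Subject line suggests a simple rule. The following is a minimal sketch, assuming the raw message arrives as a string. looks_like_probe, triage and the save_rate parameter are hypothetical names modelled on his description (relay probes, keep 5% of everything else, discard the rest); this is not spampot's actual code.

```python
import email
import random
import re
from email import policy


def looks_like_probe(raw_message, server_ip):
    """True if the Subject contains server_ip as a whole address.

    Hypothetical heuristic based on the observation that relay probes put
    the honeypot's IP address in the Subject line.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    subject = str(msg.get("Subject", ""))
    # Don't let 192.168.1.2 match inside 192.168.1.23 or 10.192.168.1.2.
    pattern = r"(?<![\d.])" + re.escape(server_ip) + r"(?!\.?\d)"
    return re.search(pattern, subject) is not None


def triage(raw_message, server_ip, save_rate=0.05, rng=random):
    """Return 'relay', 'save' or 'discard' for one incoming message."""
    if looks_like_probe(raw_message, server_ip):
        return "relay"      # probes are relayed, as Jackpot does
    if rng.random() < save_rate:
        return "save"       # keep a sample of spam for filter work
    return "discard"        # everything else goes to the bit bucket
```

Passing rng explicitly makes the 5% sampling easy to test. A single Subject rule is obviously crude, which is in line with Neale's own remark that the detection logic needs work.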
Right now my probe detection logic needs work :) Neale From jdhunter at ace.bsd.uchicago.edu Mon Jan 20 17:19:29 2003 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Mon Jan 20 18:19:12 2003 Subject: [Spambayes] spambayes with gnus/nnml Message-ID: I am currently using gnus to split my incoming mail using the nnml backend. It splits the mail in /var/spool/mail/jdhunter into my personal, professional, and mailing list dirs. After I have read my mail, I periodically sort it into archival directories, typically by sender, with the exception of mailing lists, which are already split directly into their final resting place. As such, after gnus does the split, my inbox looks like this (nnml files are one file per mail named as integers, and denoted by [0-9]+ below):

    Mail/inbox1/[0-9]+
    Mail/inbox2/[0-9]+
    Mail/inbox3/[0-9]+
    Mail/mail-list/list1/[0-9]+
    Mail/mail-list/list2/[0-9]+
    Mail/mail-list/list3/[0-9]+

Typically, I'll move my inbox1-n mail into archive folders every few days:

    Mail/spam/[0-9]+
    Mail/personal/sender1/[0-9]+
    Mail/personal/sender2/[0-9]+
    Mail/prof/sender1/[0-9]+
    Mail/prof/sender2/[0-9]+
    Mail/biz/sender1/[0-9]+
    Mail/biz/sender2/[0-9]+
    Mail/biz/sendern/[0-9]+

I have a lot of mail-list and archival folders, and am adding new ones all the time. My inbox folders, however, are fairly static. I am wondering how to best integrate spambayes, since my spam split regexes are no longer keeping up. What I would like is for spam bayes to prefilter my mail, siphoning off spam to a spam folder, and putting the rest in /var/spool/mail/jdhunter (or some other file that I can advise gnus to check) where I can then use gnus to split it. But I am not sure which dirs I should advise hammiefilter.py to monitor so it can retrain itself. Are any of you combining spambayes with gnus splitting? The notes in HAMMIE.txt suggest that hammiefilter expects mbox files.
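Because nnml stores one message per file under a bare integer name, reading a folder for training is mostly a matter of filtering filenames. The following is a minimal sketch using only Python's standard email package. iter_nnml_messages is a hypothetical helper, not part of spambayes or hammiefilter.py, and Neale's reply suggests mboxtrain may already handle nnml folders as MH directories.

```python
import os
import re
from email import message_from_binary_file, policy

_ARTICLE = re.compile(r"[0-9]+")


def iter_nnml_messages(folder):
    """Yield (article_number, message) for each article in one nnml folder.

    Hypothetical helper: assumes one message per file named by a bare
    integer and skips everything else (Gnus' .overview file, nested
    group directories, and so on).
    """
    articles = sorted(
        (int(name), name)
        for name in os.listdir(folder)
        if _ARTICLE.fullmatch(name) and os.path.isfile(os.path.join(folder, name))
    )
    for number, name in articles:
        with open(os.path.join(folder, name), "rb") as f:
            yield number, message_from_binary_file(f, policy=policy.default)
```

To cover a whole hierarchy such as Mail/personal/, call it on each directory that os.walk() yields, and feed the spam folder and the archive folders to ham or spam training as appropriate.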
Thanks, John Hunter From richie at entrian.com Mon Jan 20 23:24:33 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 20 18:25:02 2003 Subject: [Spambayes] nothing gets updated In-Reply-To: References: <15916.10043.564368.750299@montanaro.dyndns.org> Message-ID: <881p2vkng1maek065f3vcadjbho9ga4p0i@4ax.com> > [Skip] > DeprecationWarning > [...] > KeyError: '__coerce__' The KeyError problem is fixed, and the DeprecationWarning is suppressed for now. -- Richie Hindle richie@entrian.com From piersh at friskit.com Mon Jan 20 15:49:43 2003 From: piersh at friskit.com (Piers Haken) Date: Mon Jan 20 18:35:05 2003 Subject: [Spambayes] Outlook plugin and hotmail Message-ID: <9891913C5BFE87429D71E37F08210CB92C7495@zeus.sfhq.friskit.com> I'm not sure why but it looks like filtering of Hotmail inboxes has recently been broken. Here's a stack trace:

    pythoncom error: Python error invoking COM method.
    Traceback (most recent call last):
      File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
        return self._invoke_(dispid, lcid, wFlags, args)
      File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
        return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
      File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
        return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
      File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
        return apply(func, args)
      File "C:\Python22\spam\spambayes\Outlook2000\addin.py", line 184, in OnItemAdd
        msgstore_message = self.manager.message_store.GetMessage(item)
      File "C:\Python22\spam\spambayes\Outlook2000\msgstore.py", line 230, in GetMessage
        message_id = self.NormalizeID(message_id)
      File "C:\Python22\spam\spambayes\Outlook2000\msgstore.py", line 178, in NormalizeID
        assert type(item_id) in [type(''), type(u'')], "What kind of ID is '%r'?" % (item_id,)
    exceptions.AssertionError: What kind of ID is ''?

Piers. From anthony at interlink.com.au Tue Jan 21 10:33:46 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 20 18:36:55 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org> Message-ID: <200301202333.h0KNXl520936@localhost.localdomain> >>> Skip Montanaro wrote > > Depending on how training and classifying are accomplished, it's quite > possible that the two activities will be done in different processes. For > example, I am currently experimenting with training using pop3proxy (well, > still my offshoot proxytrainer at the moment) while classification is being > done by hammiefilter run from procmail. This implies a need to lock the > shelve/pickle file used to store the training info. Seems to me we need to > (be able to) lock the shelve/pickle file. The only lock facility which > seems cross-platform enough for this application is the set of flags used by > os.open(). To lock the database you'd have to check/create a lock file > related (namewise) to the actual database file. Has anyone given this any > thought? I'd suggest, instead, that training write to a different filename, then, when it's complete, rename the new file to the existing file. Real operating systems will do the right thing - I don't know if Windows will just choke and die, tho.... -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Jan 21 10:38:43 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 20 18:40:48 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access?
In-Reply-To: <11mo2vsri5vjvio62irbkq3ihcjndb0p9k@4ax.com> Message-ID: <200301202338.h0KNchK21003@localhost.localdomain> >>> Richie Hindle wrote > For what it's worth, this is one of the reasons I'm keen to keep all server > components within one process, and using asyncore - all concurrency issues > are taken care of automatically. It's probably overkill for our > application, but if hammie could classify by talking to the web UI, just > like your proxytee.py script does, we could use the server as the > concurrency mechanism. Pure hammie users wouldn't need a server (probably, > depending on how they train). This is how most relational databases do it, > after all... Hm. I'd prefer that this _not_ be a requirement, as it makes it harder for my setup to do the right thing, as well as limiting the usefulness in a number of potential applications I'm thinking of for work... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Jan 21 10:40:40 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 20 18:42:20 2003 Subject: [Spambayes] spambayes with gnus/nnml In-Reply-To: Message-ID: <200301202340.h0KNefa21028@localhost.localdomain> >>> John Hunter wrote > What I would like is for spam bayes to prefilter my mail, siphoning > off spam to a spam folder, and putting the rest in > /var/spool/mail/jdhunter (or some other file that I can advise gnus to > check) where I can then use gnus to split it. But I am not sure which > dirs I should advise hammiefilter.py to monitor so it can retrain > itself. Are any of you combining spambayes with gnus splitting? The > notes in HAMMIE.txt suggest that hammiefilter expects mbox files. Can you use procmail? If so, check the INTEGRATION.txt file for a suitable recipe - modify it so the default action is to put the message in your spool file. Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From neale at woozle.org Mon Jan 20 15:58:10 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 20 18:58:16 2003 Subject: [Spambayes] spambayes with gnus/nnml In-Reply-To: (John Hunter's message of "Mon, 20 Jan 2003 17:19:29 -0600") References: Message-ID: John Hunter writes: > What I would like is for spam bayes to prefilter my mail, siphoning > off spam to a spam folder, and putting the rest in > /var/spool/mail/jdhunter (or some other file that I can advise gnus to > check) where I can then use gnus to split it. But I am not sure which > dirs I should advise hammiefilter.py to monitor so it can retrain > itself. Are any of you combining spambayes with gnus splitting? The > notes in HAMMIE.txt suggest that hammiefilter expects mbox files. Well, I think mboxtrain will see a Gnus nnml directory as an MH directory and work fine. But I think there may be a better solution for Gnus, and maybe in turn, mutt. I'll check out Ted Zlatanov's spam.el to see how easy it'd be to hook in. IIRC hooking spambayes into this would be a piece of cake. Neale From neale at woozle.org Mon Jan 20 16:00:13 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 20 19:00:17 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <200301202333.h0KNXl520936@localhost.localdomain> (Anthony Baxter's message of "Tue, 21 Jan 2003 10:33:46 +1100") References: <200301202333.h0KNXl520936@localhost.localdomain> Message-ID: Hey guys, I'm not quite up on the discussion yet, but doesn't the bsddb module already lock the database when you open it for writes? I really don't recall, because I remember writing a wrapper for dbm files once that locked the database using flock(), but then I remember getting rid of that because I thought dbm files were automatically locked by the dbm implementation. 
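Neale mentions an earlier wrapper that locked dbm files with flock(). Here is a minimal sketch of that idea as a context manager. It assumes a Unix system, since fcntl is not available on Windows, and a side-car "<db>.lock" file. The helper name locked() is hypothetical, not a spambayes API. flock() has historically been unreliable over NFS, which is why Sjoerd's hard-link technique earlier in the thread is the safer choice there.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def locked(db_path, blocking=True):
    """Hold an exclusive flock() on a side-car '<db_path>.lock' file.

    Hypothetical helper, not part of spambayes.  Unix-only (fcntl).
    """
    lock_path = db_path + ".lock"
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # Waits until the lock is free, or raises BlockingIOError when
        # blocking=False and another open of the file already holds it.
        fcntl.flock(fd, flags)
        try:
            yield lock_path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Both the trainer and the filter would wrap their database access in `with locked(dbname): ...`, so a classification never reads a half-written training update.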
Neale From neale at woozle.org Mon Jan 20 16:18:21 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 20 19:18:29 2003 Subject: [Spambayes] Something's still missing from hammiefilter In-Reply-To: <15909.56641.568386.266344@montanaro.dyndns.org> (Skip Montanaro's message of "Wed, 15 Jan 2003 16:14:25 -0600") References: <15909.56641.568386.266344@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > The -d (use dbm) and -p (specify pickle or database file) flags are missing. > I'd really prefer these be available on the command line as well as via the > options file. Is there a reason not to expose them on the command line? Nope. They're exposed now. Thanks for the suggestion :) Neale From skip at pobox.com Mon Jan 20 18:26:03 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 19:26:16 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <200301202333.h0KNXl520936@localhost.localdomain> References: <15916.3865.297629.696625@montanaro.dyndns.org> <200301202333.h0KNXl520936@localhost.localdomain> Message-ID: <15916.37787.511871.538898@montanaro.dyndns.org> Anthony> I'd suggest, instead, that training write to a different Anthony> filename, then, when it's complete, rename the new file to the Anthony> existing file. Depending how your training works you might wind up copying a 20+MB file for each message. Skip From skip at pobox.com Mon Jan 20 18:28:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 19:28:27 2003 Subject: [Spambayes] spambayes with gnus/nnml In-Reply-To: <200301202340.h0KNefa21028@localhost.localdomain> References: <200301202340.h0KNefa21028@localhost.localdomain> Message-ID: <15916.37918.391558.623588@montanaro.dyndns.org> >>>> John Hunter wrote >> What I would like is for spam bayes to prefilter my mail, siphoning >> off spam to a spam folder, and putting the rest in >> /var/spool/mail/jdhunter ... Anthony> Can you use procmail? Agreed, this should not be spambayes' job. 
SpamAssassin went through this a few months ago. SA was originally written so it could do what John requested. They finally concluded this was wrong and wound up ripping out the stuff which did it. Skip From anthony at interlink.com.au Tue Jan 21 11:28:26 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 20 19:30:36 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <15916.37787.511871.538898@montanaro.dyndns.org> Message-ID: <200301210028.h0L0SSP21497@localhost.localdomain> >>> Skip Montanaro wrote > Depending how your training works you might wind up copying a 20+MB file for > each message. Copying? When? You'd write out the new pickles, sure, but you have to do that, anyway. From skip at pobox.com Mon Jan 20 18:34:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 19:35:00 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <200301210028.h0L0SSP21497@localhost.localdomain> References: <15916.37787.511871.538898@montanaro.dyndns.org> <200301210028.h0L0SSP21497@localhost.localdomain> Message-ID: <15916.38316.889660.193790@montanaro.dyndns.org> >>>> Skip Montanaro wrote >> Depending how your training works you might wind up copying a 20+MB >> file for each message. Anthony> Copying? When? You'd write out the new pickles, sure, but you Anthony> have to do that, anyway. How do you get the temp file from the real file without copying it? If I understand the way things work, you'd do something like * copy real to temp * train on new messages * update temp * move temp back to real (the atomic part we all want) Skip From skip at pobox.com Mon Jan 20 18:32:13 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 19:35:57 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: References: <200301202333.h0KNXl520936@localhost.localdomain> Message-ID: <15916.38157.157203.892109@montanaro.dyndns.org> Neale> Hey guys, I'm not quite up on the discussion yet, but doesn't the Neale> bsddb module already lock the database when you open it for Neale> writes? If it does, that would be fine when anydbm selects that database, but wouldn't help people who (silently) get other databases. Also, I believe the shelve module always opens databases for read/write access, probably generating unnecessary lock activity. It would be nice if hammiefilter (at least) could open the file read-only. Skip From anthony at interlink.com.au Tue Jan 21 11:52:05 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 20 19:54:17 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? In-Reply-To: <15916.38316.889660.193790@montanaro.dyndns.org> Message-ID: <200301210052.h0L0q8R21738@localhost.localdomain> >>> Skip Montanaro wrote > How do you get the temp file from the real file without copying it? If I > understand the way things work, you'd do something like > > * copy real to temp > * train on new messages > * update temp > * move temp back to real (the atomic part we all want) I thought it'd be more like: * open real in read-only mode, load into memory * train on new messages * write new data out to temp * rename temp to real (atomically) -- Anthony Baxter It's never too late to have a happy childhood. From skip at pobox.com Mon Jan 20 19:18:49 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 20 20:18:58 2003 Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <200301210052.h0L0q8R21738@localhost.localdomain> References: <15916.38316.889660.193790@montanaro.dyndns.org> <200301210052.h0L0q8R21738@localhost.localdomain> Message-ID: <15916.40953.882684.507152@montanaro.dyndns.org> >>>>> "Anthony" == Anthony Baxter writes: >>>> Skip Montanaro wrote >> How do you get the temp file from the real file without copying it? If I >> understand the way things work, you'd do something like >> >> * copy real to temp >> * train on new messages >> * update temp >> * move temp back to real (the atomic part we all want) Anthony> I thought it'd be more like: Anthony> * open real in read-only mode, load into memory Anthony> * train on new messages Anthony> * write new data out to temp Anthony> * rename temp to real (atomically) Perhaps, but that first step would be even more expensive than a simple copy. I thought all the current system did was score the current message then update only those keys necessary. In addition, I don't think shelve allows you to open a database in read-only mode. Oops, wait, the default is read/write. Neither shelve.open()'s docstring nor the section in the libref manual says anything about its flag argument. You have to RTSL to learn about it. I'll see about fixing that. The libref docs do say: The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used. 
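Anthony's four steps (load, train in memory, write a temp file, rename it over the original) can be sketched as follows. This is a minimal sketch that assumes a plain pickle of a hypothetical {token: [ham_count, spam_count]} dict rather than spambayes' real classifier state. load_state, save_state_atomically and train are illustrative names. It uses os.replace(), which replaces the destination atomically on POSIX and, unlike os.rename(), also overwrites an existing file on Windows (where the Python docs do not promise atomicity).

```python
import os
import pickle
import tempfile


def load_state(path):
    """Load the pickled training state, or start empty if none exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}


def save_state_atomically(path, state):
    """Write state to a temp file in the same directory, then swap it in.

    Readers opening `path` see either the old file or the new one, never a
    half-written pickle.  The temp file lives next to `path` so that the
    rename stays on one filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())    # get the bytes to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def train(path, tokens, is_spam):
    """Hypothetical training step following the four steps above."""
    state = load_state(path)                      # 1. read current state
    for tok in tokens:                            # 2. train in memory
        counts = state.setdefault(tok, [0, 0])
        counts[1 if is_spam else 0] += 1
    save_state_atomically(path, state)            # 3 and 4. temp file, rename
    return state
```

As Skip points out, step 1 rereads the whole database every time, so this pattern suits batch training better than per-message updates. The rename alone also does not stop two trainers from overwriting each other's updates, so concurrent writers still need a lock. For the read-only side, shelve.open() accepts the same flag values as dbm.open(), so shelve.open(path, flag='r') opens a shelf without write access.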
Skip From tim at fourstonesExpressions.com Mon Jan 20 20:17:32 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 20 21:18:42 2003 Subject: [Spambayes] spampot -- spam honeypot server In-Reply-To: <15916.27890.874021.624060@montanaro.dyndns.org> Message-ID: Skip, a probe message is a test smtp stream that's sent to an ip address by someone who's looking to see if an smtp server is running there. If it appears that there is, the spammer will try to relay a spam through it. If that works, then watch out... the floodgates will open. Neale's spampot idea is very kewl! - TimS 1/20/2003 3:41:06 PM, Skip Montanaro wrote: >Neale, > >Hopefully I won't sound too much like an idiot, but what's a "probe >message"? How do you classify messages which come into spampot, just "probe >message" and "everything else"? > >Skip > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Mon Jan 20 21:28:52 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 20 22:29:53 2003 Subject: [Spambayes] spampot -- spam honeypot server In-Reply-To: Message-ID: Probe detection.... looks like a job for spambayes... - TimS ;) 1/20/2003 5:16:00 PM, "Neale Pickett" wrote: >Skip Montanaro writes: > >> Neale, >> >> Hopefully I won't sound too much like an idiot, but what's a "probe >> message"? How do you classify messages which come into spampot, just >> "probe message" and "everything else"? 
> >So when you kick up a mail server, you'll get a lot of messages like >this: > > SMTP-Hello: master-cv7889w2 > SMTP-Mail-From: > SMTP-Rcpt-To: > From: china9988@21cn.com > Subject: 192.168.1.2 > To: china9988@21cn.com > Date: Thu, 16 Jan 2003 21:48:41 +0900 > X-Priority: 3 > X-Library: Indy 8.0.25 > > t_Smtp.LocalIP > >This is one of the more baffling probes, since china9988@21cn.com gives >NDRs--maybe really old spam software. But all of the probes I've seen >so far have the IP address of my honeypot server in the subject line. It >makes sense--send out mail blindly, and anything you get back has the IP >address of an open relay in the subject line. > >And yes, currently I only classify as "probe" and "everything else". I >do this with Maildir flags, though there's really no reason why it >should have to be in Maildir format, aside from making it easy to view >with mutt. > >Right now my probe detection logic needs work :) > >Neale > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Mon Jan 20 23:50:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Jan 20 23:51:14 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <3E2B9962.26334.308D0BD@localhost> Message-ID: [Richard Jowsey] > I have a very large training corpus, so I'm seeing well- > separated distributions of good versus spam probs, with a > sprinkling of "unsures" scattered through the middle. An > uncertain cutoff at 3 sigma from the means should work, but this > notion needs some testing. That chi2 test is definitely on the > drawing boards, even if only for comparison purposes...
Anthony Baxter has some plots of score distributions for Graham-combining, Gary-combining and chi-combining here: http://spambayes.sourceforge.net/background.html It's the sharpness and spread of the separation in chi- that's attractive. Our experiments showed (most of mine were on a 34,000-msg database) that you could usually pick cutoffs equally good under Gary-combining, but that it took 3 decimal digits of precision to do so, best cutoffs kept shifting over time (== amount of training data) and across test sets, and that it wasn't possible to guess good values in advance. In contrast, canned chi- cutoff values with 1 decimal digit of precision worked well for just about everyone. The primary size-related (# of training msgs) effect I noticed is that the chi- unsure range could be profitably shrunk the more msgs trained on, but even if you didn't bother, your original cutoffs continued to work well (although, as with Gary-combining, *optimal* cutoffs shifted too; chi- degraded more gently if you didn't bother to change them). From anthony at interlink.com.au Tue Jan 21 16:35:59 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 21 00:38:26 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: Message-ID: <200301210535.h0L5Zxc23853@localhost.localdomain> >>> Tim Peters wrote > It's the sharpness and spread of the separation in chi- that's attractive. > Our experiments showed (most of mine were on a 34,000-msg database) that you > could usually pick cutoffs equally good under Gary-combining, but that it > took 3 decimal digits of precision to do so, best cutoffs kept shifting over > time (== amount of training data) and across test sets, and that it wasn't > possible to guess good values in advance. It's also worth noting that the optimal cutoff values before chi-combining varied between 0.5 something and 0.7 for some people. It was impossible to pick a number that worked for everyone. 
(yes, I do plan to re-do the plots off the same data set at some point, and add some for the CLM combiners... - if someone wants to do it first and save me the effort, it would be faaaabulous) Anthony From neale at woozle.org Tue Jan 21 00:09:49 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 21 03:09:57 2003 Subject: [Spambayes] Something's still missing from hammiefilter In-Reply-To: (Neale Pickett's message of "Mon, 20 Jan 2003 16:18:21 -0800") References: <15909.56641.568386.266344@montanaro.dyndns.org> Message-ID: Neale Pickett writes: > They're exposed now. Don't use the new options to hammiefilter.py just yet--they won't work. I've got something with working options, as well as a new -t option to filter and train in one step. I'll check it in tomorrow after it's handled a night of email and I'm sure it does what it says it does :) Neale From mwh at python.net Tue Jan 21 10:28:55 2003 From: mwh at python.net (Michael Hudson) Date: Tue Jan 21 05:29:04 2003 Subject: [Spambayes] Re: FYI: Java implementation References: <3E2B9962.26334.308D0BD@localhost> Message-ID: <2mvg0issns.fsf@starship.python.net> Tim Peters writes: > [Richard Jowsey] > > I have a very large training corpus, so I'm seeing well- > > separated distributions of good versus spam probs, with a > > sprinkling of "unsures" scattered through the middle. An > > uncertain cutoff at 3 sigma from the means should work, but this > > notion needs some testing. That chi2 test is definitely on the > > drawing boards, even if only for comparison purposes... > > Anthony Baxter has some plots of score distributions for Graham-combining, > Gary-combining and chi-combining here: > > http://spambayes.sourceforge.net/background.html I meant to say it when I first looked at that page, but seeing those plots nearly made my eyeballs fall out. Why does anyone still use Graham-combining? Cheers, M. 
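To make the Graham-versus-chi comparison above concrete, here is a minimal sketch of both combining rules applied to per-word spam probabilities that are assumed to be already computed and strictly between 0 and 1. The function names are illustrative, not the SpamBayes API, and the 0.20/0.90 cutoffs in `classify` are assumed values. The chi side follows the Fisher-style idea: S is near 1 when the clues are collectively far spammier than random probabilities would be, H likewise for hamminess, and the score is (S - H + 1) / 2, so strong evidence in both directions lands near 0.5 ("I'm lost") instead of at an extreme.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom > x2), v even.

    Uses the closed-form Poisson series valid for even v.
    """
    assert v % 2 == 0, "series is only valid for even degrees of freedom"
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def graham_combine(probs):
    """Graham-style combining: prod(p) / (prod(p) + prod(1 - p))."""
    s = math.prod(probs)
    h = math.prod(1.0 - p for p in probs)
    return s / (s + h)


def chi_combine(probs):
    """Chi-squared (Fisher-style) combining of word spam probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5
    # S near 1: clues are much spammier than chance would produce.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # H near 1: clues are much hammier than chance would produce.
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (S - H + 1.0) / 2.0


def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way decision; the cutoff defaults are assumptions."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


if __name__ == "__main__":
    # Conflicting evidence: 15 very spammy clues and 12 very hammy ones.
    mixed = [0.99] * 15 + [0.01] * 12
    for combine in (graham_combine, chi_combine):
        score = combine(mixed)
        print(combine.__name__, round(score, 6), classify(score))
```

On the mixed input, Graham-combining reports near-certain spam, because the three surplus spammy clues dominate the product. Chi-combining scores it about 0.5 because both S and H are close to 1.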
From msergeant at startechgroup.co.uk Tue Jan 21 10:46:31 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Tue Jan 21 05:46:32 2003 Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd) In-Reply-To: <20030120222240.CE21D16F16@jmason.org> Message-ID: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> On Monday, Jan 20, 2003, at 22:22 Europe/London, Justin Mason wrote: > From: "Neale Pickett" > > The first night I ran spampot on an IP with no DNS entry associated > with > it, I got a probe after four hours. After I'd fixed the probe relaying > logic to relay that type of probe, it took all of ten hours for me to > collect over 400MB of spam. I had dinner with Neale where we discussed this. Interesting project (as is Jackpot, but err, Java. Ick). However the one downside of this was it was 400MB of the EXACT same email. :-) My guess is you'd need to put some sort of Razor-like signature checking in place (perhaps using Pyzor) to remove dupes. Matt. From jm at jmason.org Tue Jan 21 11:20:53 2003 From: jm at jmason.org (Justin Mason) Date: Tue Jan 21 06:20:54 2003 Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd) In-Reply-To: Message from Matt Sergeant <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> Message-ID: <20030121112058.D815116F16@jmason.org> Matt Sergeant said: > My guess is you'd need to put some sort of Razor-like signature > checking in place (perhaps using Pyzor) to remove dupes. Actually, I have some rough-but-working-well-enough perl code in SpamAssassin CVS, in the "masses/corpora" dir, which does this. "fuzzy-hash-maildir" is the script in question. Here's how it works: - for each mail: - strip all HTML tags - strip text in "quotes" -- vars in javascript, etc. - remove words with ? 
marks inside them, possible encoded mail addrs - remove words with @ marks inside them, possible encoded mail addrs - remove lines that contain just a single string of non-white chars, possible hash busters or encoded mail addrs - split into an array of lines (NOT bytes, since spammers are using variable-length hash-busting strings) - divide into 4 blocks and hash them: hash1, hash2, hash3, hash4 - output into associative arrays as hash1.hash2 -> filename hash1.hash2.hash3 -> filename hash1.hash2.hash3.hash4 -> filename (should probably use e.g. hash2.hash3.hash4 as well. Note that hashbusters and encoded addrs generally appear in the first and/or last blocks.) - finally check those arrays for collisions and output these as "likely dups". It works sufficiently well. ;) --j. From pje at telecommunity.com Tue Jan 21 10:11:12 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Jan 21 10:11:48 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <2mvg0issns.fsf@starship.python.net> References: <3E2B9962.26334.308D0BD@localhost> Message-ID: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com> At 10:28 AM 1/21/03 +0000, Michael Hudson wrote: >Tim Peters writes: > > Anthony Baxter has some plots of score distributions for Graham-combining, > > Gary-combining and chi-combining here: > > > > http://spambayes.sourceforge.net/background.html > >I meant to say it when I first looked at that page, but seeing those >plots nearly made my eyeballs fall out. Why does anyone still use >Graham-combining? Because nobody's seen the plots, obviously. :) I think what would be needed to change that would be: 1. A Spambayes release 2. A "spam shootout" wherein half a dozen Bayesian mail filters (e.g. Popfile, Mozilla, other...?) are tested against the same corpus, using the cross-validation testing mechanism. 3a. 
Spambayes comes out on top, with a fraction of the error rate of others: publish the results, get a Slashdot story, and slashdot the project site. :) 3b. Spambayes doesn't come out on top: find out why, fix the problem, go to step 3a. :) From msergeant at startechgroup.co.uk Tue Jan 21 16:05:48 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Tue Jan 21 11:07:09 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com> Message-ID: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk> On Tuesday, Jan 21, 2003, at 15:11 Europe/London, Phillip J. Eby wrote: > 2. A "spam shootout" wherein half a dozen Bayesian mail filters (e.g. > Popfile, Mozilla, other...?) are tested against the same corpus, > using the cross-validation testing mechanism. > > 3a. Spambayes comes out on top, with a fraction of the error rate of > others: publish the results, get a Slashdot story, and slashdot the > project site. :) > > 3b. Spambayes doesn't come out on top: find out why, fix the problem, > go to step 3a. :) Mozilla and SpamAssassin both copy their bayesian code from spambayes (including tokenisation ideas and combiners). If the results are different at all that's probably a bug somewhere. It's nice to be proud of software, but when it's open source you kinda leave it wide open for us to nab your ideas ;-) Matt. From tim.one at comcast.net Tue Jan 21 11:12:00 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 21 11:13:05 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <200301210535.h0L5Zxc23853@localhost.localdomain> Message-ID: [Anthony Baxter] > It's also worth noting that the optimal cutoff values before chi-combining > varied between 0.5 something and 0.7 for some people. It was impossible to > pick a number that worked for everyone. Ah, memories . 
> (yes, I do plan to re-do the plots off the same data set at some point, > and add some for the CLM combiners... - if someone wants to do it first > and save me the effort, it would be faaaabulous) Assuming CLM refers to the three central-limit combining schemes, they never got far enough to develop a rational notion of "score". They were the first schemes that "knew when they were confused", and that caught us by surprise: the initial stabs at getting "a score" out of them were like Graham-combining in that they were sometimes extremely certain of a wrong answer. It took a while to realize that, when this happened, an internal (for example) spam score was 50 sdevs on the spam side of the ham mean, simultaneous with the internal ham score being 40 sdevs on the ham side of the spam mean. The overall result was extreme certainty that the thing was spam, although the internal scores were certain it was neither. Once we figured that out, testing proceeded by producing one of exactly three scores: "it's ham", "it's spam", "I'm lost". That's as far as they got, at which point chi-combining appeared, also knowing when it was lost, but far less problematic for training, and producing a "smooth" score naturally. A CLM plot would consist of three vertical lines, and so be a bit confusing . From neale at woozle.org Tue Jan 21 08:15:17 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 21 11:15:24 2003 Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd) In-Reply-To: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> (Matt Sergeant's message of "Tue, 21 Jan 2003 10:46:31 +0000") References: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> Message-ID: Matt Sergeant writes: > I had dinner with Neale where we discussed this. Interesting project > (as is Jackpot, but err, Java. Ick). However the one downside of this > was it was 400MB of the EXACT same email. Psh, details.
:^) But yes, it was 200MB of the exact same email (with templates filled in), and then 200MB of another message. I've since added in logic to only store every 20th message sent over the same SMTP connection. Now I have a measly 2MB of spam. But I imagine that will increase rapidly as my probe detection gets better. (I mean, I've only been running this thing for a week ;) I'm not sure if I need razor (or pyzor) just yet. So far, spammers will send all their mail over just a few connections, so it's effective to only store specimens. But I may need some signature-checking logic soon. Neale From anthony at interlink.com.au Wed Jan 22 03:18:36 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 21 11:20:18 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: Message-ID: <200301211618.h0LGIai30812@localhost.localdomain> >>> Tim Peters wrote > A CLM plot would consist of three vertical lines, and so be a bit confusing > . Yes, but suggesting them _did_ get a nice simple summary about them out of you, so it wasn't a complete loss :) Anthony -- Anthony Baxter It's never too late to have a happy childhood. From nas at python.ca Tue Jan 21 08:31:48 2003 From: nas at python.ca (Neil Schemenauer) Date: Tue Jan 21 11:25:44 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk> References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com> <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk> Message-ID: <20030121163148.GA15240@glacier.arctrix.com> Matt Sergeant wrote: > Mozilla and SpamAssassin both copy their bayesian code from spambayes > (including tokenisation ideas and combiners). I, for one, am extremely pleased to hear that. It would be a shame if people kept using Paul Graham's original algorithm after all the work that was put in improving Spambayes.
Despite what was said at the spam conference, I think the algorithm is important. > It's nice to be proud of software, but when it's open source you kinda > leave it wide open for us to nab your ideas ;-) I think the concern was that people won't nab the ideas (because they didn't know about them). Neil From pje at telecommunity.com Tue Jan 21 11:28:10 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Jan 21 11:28:55 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk> References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20030121112431.01ea3390@mail.telecommunity.com> At 04:05 PM 1/21/03 +0000, Matt Sergeant wrote: >Mozilla and SpamAssassin both copy their bayesian code from spambayes >(including tokenisation ideas and combiners) Cool. But then it sounds like the "heavy hitters" have already abandoned the Graham algorithm. >It's nice to be proud of software, but when it's open source you kinda >leave it wide open for us to nab your ideas ;-) That's the whole point of promoting Spambayes, really. To get the "good stuff" into the hands of more people. I'd rather see lots of programs using the more effective ideas, than have a bunch of non-programmers try less effective tools and swear off of "learning" spam filters because of it. From mwh at python.net Tue Jan 21 16:56:17 2003 From: mwh at python.net (Michael Hudson) Date: Tue Jan 21 11:56:26 2003 Subject: [Spambayes] Re: Promoting Spambayes (was Re: FYI: Java implementation) References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com> <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk> Message-ID: <2msmvmmoge.fsf@starship.python.net> Matt Sergeant writes: > Mozilla and SpamAssassin both copy their bayesian code from spambayes > (including tokenisation ideas and combiners). If the results are > different at all that's probably a bug somewhere. 
Really? I knew SA did, but I hadn't heard anything about Mozilla. I'm trying to find out one way or the other through bugzilla, but it scares me :) I did find an open bug saying "you should use spambayes' algorithm". > It's nice to be proud of software, but when it's open source you kinda > leave it wide open for us to nab your ideas ;-) Kinda the point, I'd say :-) Then you get to deal with the nasty integration issues... Cheers, M. -- Gevalia is undrinkable low-octane see-through only slightly roasted bilge water. Compared to .us coffee it is quite drinkable. -- Måns Nilsson, asr From tim.one at comcast.net Tue Jan 21 12:03:20 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 21 12:04:29 2003 Subject: [Spambayes] Re: FYI: Java implementation In-Reply-To: <2mvg0issns.fsf@starship.python.net> Message-ID: [Michael Hudson, on the plots at http://spambayes.sourceforge.net/background.html ] > I meant to say it when I first looked at that page, but seeing those > plots nearly made my eyeballs fall out. Why does anyone still use > Graham-combining? Perhaps because the "Plan for Spam" paper kept on describing it, and people who tried it found that their first stab worked better than anything else they had tried. It took much testing on large and varied data before its problems became clear. Paul Graham has since discovered some of these on his own, as he started getting his own false positives: http://www.paulgraham.com/better.html Graham-combining has the advantage of being rigorously correct, to the extent that its assumptions hold (word independence, and prior spam probability of 0.5). I can't really say what chi-combining produces in the end, other than that "it's a score". It's certainly not the probability that a msg is spam. Graham-combining does compute a spam probability, which would be correct if only the world were nothing like it is . So it's explainable and works remarkably well out of the box. 
Its problems are more-or-less subtle, and people have little patience for subtleties. From pje at telecommunity.com Tue Jan 21 12:17:59 2003 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue Jan 21 12:18:29 2003 Subject: [Spambayes] Re: FYI: Java implementation In-Reply-To: References: <2mvg0issns.fsf@starship.python.net> Message-ID: <5.1.0.14.0.20030121121650.03b04e10@mail.telecommunity.com> At 12:03 PM 1/21/03 -0500, Tim Peters wrote: >So it's explainable and works remarkably well out of the box. Its problems >are more-or-less subtle, and people have little patience for subtleties. Then let's hear it for the 'bots, who not only have patience for subtleties, but obsess on them as well! :) From jm at jmason.org Tue Jan 21 17:11:52 2003 From: jm at jmason.org (Justin Mason) Date: Tue Jan 21 12:33:23 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: Message from Neil Schemenauer <20030121163148.GA15240@glacier.arctrix.com> Message-ID: <20030121171157.67BCD16F16@jmason.org> Neil Schemenauer said: > Matt Sergeant wrote: > > Mozilla and SpamAssassin both copy their bayesian code from spambayes > > (including tokenisation ideas and combiners). > > I, for one, am extremely pleased to hear that. It would be a shame if > people kept using Paul Graham's original algorithm after all the work > that was put in improving Spambayes. Despite what was said at the spam > conference, I think the algorithm is important. BTW it's worth noting we didn't just "nab" the ideas ;) Instead I reimplemented based on descriptions, running a cross-validation test each time, and threw in a few tokenization ideas of our own. In most cases the results indicated that SpamBayes' techniques are the most effective -- there were a few extras, like SpamAssassin tokenizing some headers that SB doesn't (From etc.), and different S and X values, but for the most part they're effectively the same. 
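Justin mentions SpamAssassin settling on "different S and X values". In Gary Robinson's smoothing, as those names are used in this thread, X is the probability assumed for a word with no training history and S is how much weight that assumption carries against the observed counts: f(w) = (S*X + n*p(w)) / (S + n), where n is the number of training messages containing the word. A minimal sketch follows. The function name, the ratio-based raw probability, and the defaults S=0.45 and X=0.5 are assumptions for illustration, not values quoted from either project.

```python
def adjusted_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Robinson-style smoothed spam probability for one word (sketch).

    spamcount/hamcount: training messages of each kind containing the word.
    nspam/nham: total spam/ham messages trained on.
    s: strength of the prior; x: prior probability for an unseen word.
    Both defaults are assumed values for illustration.
    """
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    total = spamratio + hamratio
    raw = spamratio / total if total else x  # no evidence at all: use x
    n = spamcount + hamcount                 # amount of evidence for the word
    return (s * x + n * raw) / (s + n)
```

A word seen once, only in spam, gets pulled well back toward X (about 0.84 with these defaults). The more often it appears, the closer the result moves to the raw ratio, which is why the choice of S and X matters most for rare words.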
The nice thing is that it means those techniques have been independently verified by 2 parties -- in other words, a scientific process ;) --j. From noreply at sourceforge.net Tue Jan 21 10:31:48 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Jan 21 13:34:07 2003 Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to bind to specific addresses Message-ID: Patches item #670417, was opened at 2003-01-18 20:06 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Richie Hindle (richiehindle) Summary: Allow the pop3 proxies to bind to specific addresses Initial Comment: This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting. This is useful for two reasons: 1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security. 2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts. The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string. ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-21 18:31 Message: Logged In: YES user_id=85414 Many thanks for the patch, Tony - excellent job. Checked in as pop3proxy.py 1.38 and spambayes/Dibbler.py 1.2.
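The patch description says a port setting may be either a bare port or address:port, is carried internally as an (address, port) pair, and is turned back into a string for display. Below is a hypothetical sketch of that parsing and formatting. The helper names are invented here and are not the functions in pop3proxy.py or Dibbler.py. An empty address stands for "all interfaces", which is what an empty host string means to Python's `socket.bind()`.

```python
def parse_bind_spec(spec):
    """Turn '110' or '127.0.0.1:8110' into an (address, port) pair.

    Hypothetical helper. rpartition() returns ('', '', spec) when there
    is no colon, so a bare port yields an empty address. IPv6 literals
    are not handled specially.
    """
    address, _, port = spec.strip().rpartition(":")
    return address, int(port)


def format_bind_spec(pair):
    """Inverse of parse_bind_spec, for log and status messages."""
    address, port = pair
    return f"{address}:{port}" if address else str(port)
```

Because the result is already an (address, port) tuple, it can be passed unchanged to `socket.bind()`. Binding to `("127.0.0.1", port)` gives the loopback-only behaviour the patch describes.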
---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 14:28 Message: Logged In: YES user_id=85414 If you can't upload them here, please email them to me. Thanks. ---------------------------------------------------------------------- Comment By: François Granger (fgranger) Date: 2003-01-20 13:13 Message: Logged In: YES user_id=86948 I asked Tony about this, he sent me the files. Can I upload them or forward them to you? ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-20 11:35 Message: Logged In: YES user_id=85414 Has SourceForge eaten the patch file? It says "No Files Currently Attached". ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702 From francois.granger at free.fr Tue Jan 21 19:39:44 2003 From: francois.granger at free.fr (François Granger) Date: Tue Jan 21 13:40:25 2003 Subject: [Spambayes] Issue with pop3proxy Message-ID: I got this message in my mailbox. The strange thing is that the X-Spambayes-Classification: spam got added after the content, this is not normal. It happens on some messages from more than one sender but not all messages from the same sender on this mailing list. A quick comparison between the headers of two messages from same sender, one with proper X-Spambayes-Classification: in header and the other with the field at end of message show no easy difference.
================================================== Return-Path: Received: from cancale.medicalistes.org (62.212.100.79) by smtp.laposte.net (6.0.053) id 3DF927E500486B4A; Tue, 21 Jan 2003 09:33:35 +0100 Received: from cancale.medicalistes.org (localhost [127.0.0.1]) by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with ESMTP id h0L8XJOc031969; Tue, 21 Jan 2003 09:33:19 +0100 Received: (from sympa@localhost) by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) id h0L8XJQe031967; Tue, 21 Jan 2003 09:33:19 +0100 X-Authentication-Warning: cancale.medicalistes.org: sympa set sender to alois-owner@medicalistes.org using -f Received: from mel-rto4.wanadoo.fr (smtp-out-4.wanadoo.fr [193.252.19.23]) by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with ESMTP id h0L8XGOc031964 for ; Tue, 21 Jan 2003 09:33:16 +0100 Received: from mel-rta6.wanadoo.fr (193.252.19.26) by mel-rto4.wanadoo.fr (6.7.015) id 3E0C33FD00EBE0FE for alois@medicalistes.org; Tue, 21 Jan 2003 09:33:16 +0100 Received: from pc (193.250.146.197) by mel-rta6.wanadoo.fr (6.7.015) id 3E26CE21002ADE4B for alois@medicalistes.org; Tue, 21 Jan 2003 09:33:15 +0100 Message-ID: <003e01c2c128$485b70c0$c592fac1@pc> From: "Martine Lemaitre" To: References: <000901c2c117$26334ce0$03b5a8c0@duron800> Subject: Re: [Alois] Péter les plombs Date: Tue, 21 Jan 2003 09:26:42 +0100 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0029_01C2C12F.35460480" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Reply-To: alois@medicalistes.org X-Loop: alois@medicalistes.org X-Sequence: 1253 Precedence: list X-no-archive: yes List-Id:
[HTML deleted...]
X-Spambayes-Classification: spam
================================================== -- Recently using MacOSX....... From nas at python.ca Tue Jan 21 10:49:34 2003 From: nas at python.ca (Neil Schemenauer) Date: Tue Jan 21 13:43:28 2003 Subject: [Spambayes] pushing back the cost of spam In-Reply-To: <20030120003505.GB6862@glacier.arctrix.com> References: <20030120001344.GA6862@glacier.arctrix.com> <20030120003505.GB6862@glacier.arctrix.com> Message-ID: <20030121184934.GA15762@glacier.arctrix.com> Neil Schemenauer wrote: > (I'll try to run some tests). See http://python.ca/nas/log/200301/index.html#21_001 for some preliminary results. In summary, it works. Neil From neale at woozle.org Tue Jan 21 10:45:26 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 21 13:45:35 2003 Subject: [Spambayes] Issue with pop3proxy In-Reply-To: (François Granger's message of "Tue, 21 Jan 2003 19:39:44 +0100") References: Message-ID: François Granger writes: > I got this message in my mailbox. The strange thing is that the > X-Spambayes-Classification: spam got added after the content, this is > not normal. It happens on some messages from more than one sender but > not all messages from the same sender on this mailing list. A quick > comparison between the headers of two messages from same sender, one > with proper X-Spambayes-Classification: in header and the other with the > field at end of message show no easy difference. Weird. Perhaps this is what happens when the email module can't parse the message. If the headers really were word-wrapped like you sent them, this is certainly the case.
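If the misplaced header comes from a failed parse, as Neale suspects, one defensive option is to insert the header textually at the end of the header block. RFC 2822 puts that boundary at the first empty line, so placement no longer depends on the email package parsing the message. This is a hypothetical sketch, not what pop3proxy.py actually does. The helper names are invented, and only the header name is taken from the thread.

```python
import email

# Header name from the thread; the helpers below are hypothetical.
HEADER = "X-Spambayes-Classification"


def insert_header(raw, name, value):
    """Insert 'name: value' as the last line of the header block.

    The header block ends at the first empty line (CRLF or LF endings).
    If there is no empty line, the whole text is treated as headers.
    """
    found = [(raw.find(sep), sep) for sep in ("\r\n\r\n", "\n\n")]
    found = [(i, sep) for i, sep in found if i != -1]
    if found:
        idx, sep = min(found)          # earliest blank line wins
        eol = sep[: len(sep) // 2]     # "\r\n" or "\n"
        return raw[:idx] + eol + f"{name}: {value}" + raw[idx:]
    eol = "\r\n" if "\r\n" in raw else "\n"
    return raw.rstrip("\r\n") + eol + f"{name}: {value}" + eol + eol


def classification_of(raw):
    """Read the header back with the standard email package."""
    return email.message_from_string(raw).get(HEADER)
```

The text insertion works even on messages the parser would choke on. Reading the header back with `email.message_from_string()` is only a check that it landed in the header block.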
> ================================================== > Return-Path: > Received: from cancale.medicalistes.org (62.212.100.79) by > smtp.laposte.net (6.0.053) > id 3DF927E500486B4A; Tue, 21 Jan 2003 09:33:35 +0100 > Received: from cancale.medicalistes.org (localhost [127.0.0.1]) > by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with > ESMTP id h0L8XJOc031969; > Tue, 21 Jan 2003 09:33:19 +0100 > Received: (from sympa@localhost) > by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) id > h0L8XJQe031967; > Tue, 21 Jan 2003 09:33:19 +0100 > X-Authentication-Warning: cancale.medicalistes.org: sympa set sender to > alois-owner@medicalistes.org using -f > Received: from mel-rto4.wanadoo.fr (smtp-out-4.wanadoo.fr [193.252.19.23]) > by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with > ESMTP id h0L8XGOc031964 > for ; Tue, 21 Jan 2003 09:33:16 +0100 > Received: from mel-rta6.wanadoo.fr (193.252.19.26) by > mel-rto4.wanadoo.fr (6.7.015) > id 3E0C33FD00EBE0FE for alois@medicalistes.org; Tue, 21 Jan > 2003 09:33:16 +0100 > Received: from pc (193.250.146.197) by mel-rta6.wanadoo.fr (6.7.015) > id 3E26CE21002ADE4B for alois@medicalistes.org; Tue, 21 Jan > 2003 09:33:15 +0100 > Message-ID: <003e01c2c128$485b70c0$c592fac1@pc> > From: "Martine Lemaitre" > To: > References: <000901c2c117$26334ce0$03b5a8c0@duron800> > Subject: Re: [Alois] Péter les plombs > Date: Tue, 21 Jan 2003 09:26:42 +0100 > MIME-Version: 1.0 > Content-Type: multipart/alternative; > boundary="----=_NextPart_000_0029_01C2C12F.35460480" > X-Priority: 3 > X-MSMail-Priority: Normal > X-Mailer: Microsoft Outlook Express 5.00.2314.1300 > X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 > Reply-To: alois@medicalistes.org > X-Loop: alois@medicalistes.org > X-Sequence: 1253 > Precedence: list > X-no-archive: yes > List-Id:
> > [HTML deleted...] > >
> X-Spambayes-Classification: spam > >
> ================================================== > -- > Recently using MacOSX....... From tim.one at comcast.net Tue Jan 21 13:47:39 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 21 13:49:55 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: <200301211618.h0LGIai30812@localhost.localdomain> Message-ID: [Tim] > A CLM plot would consist of three vertical lines, and so be a > bit confusing . [Anthony Baxter] > Yes, but suggesting them _did_ get a nice simple summary about them out of > you, so it wasn't a complete loss :) Indeed not. Suck all you can out of my memory before I die. A thousand generations will pass before anyone can reconstruct it all from the code comments . From tim.one at comcast.net Tue Jan 21 14:38:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Jan 21 14:46:03 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <20030121171157.67BCD16F16@jmason.org> Message-ID: [Justin Mason] > BTW it's worth noting we didn't just "nab" the ideas ;) I would have . > Instead I reimplemented based on descriptions, running a cross-validation > test each time, and threw in a few tokenization ideas of our own. One thing we found, on rare occasions, is that a change vetted as winner or loser via a CV run on one set of test data turned out to be neutral on somebody else's test data, or (very rarely) even gave an opposite result. Some small amount of that is expected by chance, of course, but multiple test sets (in addition to slicing & dicing a single test set) is an important check too. > In most cases the results indicated that SpamBayes' techniques are the > most effective -- there were a few extras, like SpamAssassin tokenizing > some headers that SB doesn't (From etc.), There are generally options to change all that.
I became inactive as this project was transitioning from mostly-research to mostly-deployment, and the defaults still reflect the more severe "purity needs" of research. For example, virtually all the ham in my main test set had a common "From" line (it was generated by a news->email gateway) but none of my spam had that From line. So "From" was ignored by default. In the Outlook 2000 client I use every day, though, From, To, Cc, Sender and Reply-To are all tokenized. > and different S and X values, Note that Greg Louis has done a lot of good research on those, in connection with bogofilter. > but for the most part they're effectively the same. > > The nice thing is that it means those techniques have been independently > verified by 2 parties -- in other words, a scientific process ;) It's appreciated! That's more important than the specific algorithms used. Given a proper test framework, the data will eventually tell you what does and doesn't work; without proper statistical testing it's all guessing. A problem is what to do when error rates get too low to measure reliably. My previous life in speech recognition didn't prepare me for that one . From skip at pobox.com Tue Jan 21 14:10:05 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 21 15:10:27 2003 Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd) In-Reply-To: References: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> Message-ID: <15917.43293.153834.759378@montanaro.dyndns.org> Neale> But I may need some signature-checking logic soon. I'm modifying utilities/loosecksum.py to incorporate many of the ideas Justin posted today about his fuzzy-hash-maildir script in SpamAssassin. Should have it checked in later today. Instead of returning a single md5 checksum it will return four separated by dots, one for each of the four blocks he mentioned. If you want to consider pieces you can just split on the dots.
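The steps Justin listed, combined with Skip's plan to emit four dot-separated MD5 digests, can be sketched as follows. This is a hypothetical reimplementation from the descriptions, not the code in fuzzy-hash-maildir or utilities/loosecksum.py. The regular expressions are rough guesses at "HTML tags" and "text in quotes". Empty lines are dropped along with single-token lines, and the four blocks come from an even split by line count.

```python
import hashlib
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]*>")        # crude HTML tag matcher (assumption)
QUOTED_RE = re.compile(r'"[^"\n]*"')   # text in "quotes", e.g. JavaScript strings


def normalized_lines(text):
    """Apply the cleanup steps and return the surviving lines."""
    text = QUOTED_RE.sub("", TAG_RE.sub("", text))
    lines = []
    for line in text.splitlines():
        # Drop words containing '?' or '@' (possible encoded mail addresses).
        words = [w for w in line.split() if "?" not in w and "@" not in w]
        # Drop empty lines and lines that are a single run of non-whitespace
        # (possible hash busters or encoded addresses).
        if len(words) > 1:
            lines.append(" ".join(words))
    return lines


def fuzzy_checksum(text, blocks=4):
    """Return one MD5 hex digest per block of lines, joined by dots."""
    lines = normalized_lines(text)
    n = len(lines)
    digests = []
    for i in range(blocks):
        chunk = "\n".join(lines[i * n // blocks:(i + 1) * n // blocks])
        digests.append(hashlib.md5(chunk.encode("utf-8")).hexdigest())
    return ".".join(digests)


# Digest combinations used as keys: 1.2, 1.2.3, 1.2.3.4, plus the 2.3.4
# variant Justin suggests, since hash busters tend to sit in the first block.
KEY_POSITIONS = ((0, 1), (0, 1, 2), (0, 1, 2, 3), (1, 2, 3))


def likely_dups(messages):
    """Group names (from a name -> text mapping) whose digests collide."""
    index = defaultdict(set)
    for name, text in messages.items():
        digests = fuzzy_checksum(text).split(".")
        for positions in KEY_POSITIONS:
            key = (positions, tuple(digests[p] for p in positions))
            index[key].add(name)
    groups = {frozenset(names) for names in index.values() if len(names) > 1}
    return sorted(sorted(group) for group in groups)
```

One caveat of this sketch: very short messages reduce to identical empty blocks and will all "collide". A real deduplicator would probably want a minimum number of surviving lines before trusting a match.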
Skip From neale at woozle.org Tue Jan 21 12:10:26 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 21 15:10:33 2003 Subject: [Spambayes] degeneration Message-ID: So one of the more interesting things I left the spam conference with was Paul Graham's notion of "degeneration". The idea is simple. If you tokenize "FREE!!!!", but that's not in your wordlist, try the following until you get a match: FREE!!!! Free!!!! free!!!! FREE!!! Free!!! free!!! FREE!! Free!! free!! FREE! Free! free! FREE Free free He claims this helps a lot. I'm currently in the midst of getting hammiefilter to integrate more cleanly with Gnus and Mutt, and merging mboxtrain and hammiebulk. But this should be relatively easy to implement and test. Any takers? Neale From vanhorn at whidbey.com Tue Jan 21 12:12:28 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Tue Jan 21 15:12:35 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) References: <20030121171157.67BCD16F16@jmason.org> Message-ID: <3E2DA9AC.A1CE12F6@whidbey.com> You used the past tense there, is this really in Spam Assassin now? I just upgraded SA last week and didn't notice any references to a Spambayesian filter and would dearly love to turn it on if it's in there somewhere. Van Justin Mason wrote: > Neil Schemenauer said: > > Matt Sergeant wrote: > > > Mozilla and SpamAssassin both copy their bayesian code from spambayes > > > (including tokenisation ideas and combiners). > > > > I, for one, am extremely pleased to hear that. It would be a shame if > > people kept using Paul Graham's original algorithm after all the work > > that was put in improving Spambayes. Despite what was said at the spam > > conference, I think the algorithm is important. > > BTW it's worth noting we didn't just "nab" the ideas ;) Instead I > reimplemented based on descriptions, running a cross-validation test each > time, and threw in a few tokenization ideas of our own.
In most cases the > results indicated that SpamBayes' techniques are the most effective -- > there were a few extras, like SpamAssassin tokenizing some headers that SB > doesn't (From etc.), and different S and X values, but for the most part > they're effectively the same. > > The nice thing is that it means those techniques have been independently > verified by 2 parties -- in other words, a scientific process ;) > > --j. > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From hupp at upl.cs.wisc.edu Tue Jan 21 15:27:41 2003 From: hupp at upl.cs.wisc.edu (Adam Hupp) Date: Tue Jan 21 16:36:39 2003 Subject: [Spambayes] degeneration In-Reply-To: References: Message-ID: <20030121212741.GA3849@upl.cs.wisc.edu> On Tue, Jan 21, 2003 at 12:10:26PM -0800, Neale Pickett wrote: > > He claims this helps a lot. I'm currently in the midst of getting > hammiefilter to integrate more cleanly with Gnus and Mutt, and merging > mboxtrain and hammiebulk. But this should be relatively easy to > implement and test. Any takers? I'm curious what you're doing for the mutt integration. I was playing with spambayes a few months ago and worked up an (IMO) fairly useful mutt integration. It was a combination of procmail rules, mutt macros and changes to hammiefilter that allowed marking and retraining on Unsures, retraining on mistakes, automatic training, etc. All this was put on hold by a desire to graduate but now I'm excited to start working on it again. 
-Adam From drew at poured.net Tue Jan 21 16:22:03 2003 From: drew at poured.net (Drew Raines) Date: Tue Jan 21 17:22:32 2003 Subject: [Spambayes] Re: degeneration References: Message-ID: Neale Pickett writes: > I'm currently in the midst of getting hammiefilter to integrate > more cleanly with Gnus and Mutt, and merging mboxtrain and > hammiebulk. But this should be relatively easy to implement and > test. Any takers? Me. Before you get too far, though, make sure you look at spam.el in Oorts of late. -Drew From jm at jmason.org Wed Jan 22 00:26:54 2003 From: jm at jmason.org (Justin Mason) Date: Tue Jan 21 19:26:39 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Javaimplementation) In-Reply-To: Message from "G. Armour Van Horn" of "Tue, 21 Jan 2003 12:12:28 PST." <3E2DA9AC.A1CE12F6@whidbey.com> Message-ID: <20030122002659.A96C316F18@jmason.org> G. Armour Van Horn said: > You used the past tense there, is this really in Spam Assassin now? I just > upgraded SA last week and didn't notice any references to a Spambayesian > filter and would dearly love to turn it on if it's in there somewhere. No, it's still in CVS -- but mucho rescoring and GA running going on at the mo' for a release RSN. --j. From neale at woozle.org Tue Jan 21 16:51:57 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 21 19:52:07 2003 Subject: [Spambayes] Re: degeneration In-Reply-To: (Drew Raines's message of "Tue, 21 Jan 2003 16:22:03 -0600") References: Message-ID: Drew Raines writes: > Neale Pickett writes: > >> I'm currently in the midst of getting hammiefilter to integrate >> more cleanly with Gnus and Mutt, and merging mboxtrain and >> hammiebulk. But this should be relatively easy to implement and >> test. Any takers? > > Me. Before you get too far, though, make sure you look at spam.el > in Oorts of late. Yeah, I did check out spam.el, but it's not exactly what I want. Namely, two keybindings for "refile as spam" and "refile as ham". 
The rest will be done by procmail. Anything further would need spam.el. Neale From seant at webreply.com Tue Jan 21 19:54:51 2003 From: seant at webreply.com (Sean True) Date: Tue Jan 21 20:24:59 2003 Subject: [Spambayes] Declaring victory Message-ID: Tim wrote: >It's appreciated! That's more important than the specific algorithms used. >Given a proper test framework, the data will eventually tell you what does >and doesn't work; without proper statistical testing it's all guessing. A >problem is what to do when error rates get too low to measure reliably. My >previous life in speech recognition didn't prepare me for that one . Geez, Tim, even I got prepared for that one at Dragon: when the error rate gets low enough, you declare victory and move on. Before they throw you in jail for fraud! Say. You already did that. Just-winking-this-one-time-ly yours, Sean ------- Sean True WebReply, Inc. From tim_one at email.msn.com Tue Jan 21 21:29:36 2003 From: tim_one at email.msn.com (Tim Peters) Date: Tue Jan 21 21:30:19 2003 Subject: [Spambayes] Declaring victory In-Reply-To: Message-ID: [Tim] > A problem is what to do when error rates get too low to measure > reliably. My previous life in speech recognition didn't prepare me for > that one . [Sean True] > Geez, Tim, even I got prepared for that one at Dragon: when the error > rate gets low enough, you declare victory and move on. Before they throw > you in jail for fraud! > > Say. You already did that. I learned more from our betters, too: I never took spambayes public, and the untold millions I made off teasing potential investors are safely tucked away in offshore accounts. Now it's back to the quieter life of smuggling drugs. 
still-missing-the-action-though-ly y'rs - tim From tim_one at email.msn.com Tue Jan 21 21:47:12 2003 From: tim_one at email.msn.com (Tim Peters) Date: Tue Jan 21 21:47:54 2003 Subject: [Spambayes] degeneration In-Reply-To: Message-ID: [Neale Pickett] > So one of the more interesting things I left the spam conference with > was Paul Graham's notion of "degeneration". The idea is simple. If you > tokenize "FREE!!!!", but that's not in your wordlist, try the following > until you get a match:
>
> FREE!!!!
> Free!!!!
> free!!!!
> FREE!!!
> Free!!!
> free!!!
> FREE!!
> Free!!
> free!!
> FREE!
> Free!
> free!
> FREE
> Free
> free

We fold case so it's easier for us (just 5 possibilities). > He claims this helps a lot. Ya, but he's still artificially boosting ham counts by a factor of 2 -- it's small wonder then that some other gimmick is needed to counteract the bias. > I'm currently in the midst of getting hammiefilter to integrate more > cleanly with Gnus and Mutt, and merging mboxtrain and hammiebulk. But > this should be relatively easy to implement and test. Any takers? It wouldn't be hard to implement, and I agree it's interesting. So far as testing goes, I don't have any test data that *can* show an improvement anymore, so I lost interest in tweaking the algorithms. Do you have test sets that could show improvements? If so, you can eyeball the mistakes and usually make a good guess as to whether a specific new gimmick would help them. What you can't usually guess is whether the gimmick would hurt the correctly classified msgs more than it helps the mistakes. From neale at woozle.org Tue Jan 21 21:20:34 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 00:20:41 2003 Subject: [Spambayes] degeneration In-Reply-To: <20030121212741.GA3849@upl.cs.wisc.edu> (Adam Hupp's message of "Tue, 21 Jan 2003 15:27:41 -0600") References: <20030121212741.GA3849@upl.cs.wisc.edu> Message-ID: Adam Hupp writes: > I'm curious what you're doing for the mutt integration.
I was playing > with spambayes a few months ago and worked up an (IMO) fairly useful > mutt integration. It was a combination of procmail rules, mutt macros > and changes to hammiefilter that allowed marking and retraining on > Unsures, retraining on mistakes, automatic training, etc. That, in a nutshell, is what I'm doing for mutt integration. But I'm a Gnus user; someone more familiar with mutt's innards would be a better candidate to write up mutt integration instructions :) I'm going to go ahead and check in my new hammiefilter.py with big [EXPERIMENTAL] disclaimers by most of the options--I'm not sure they actually do what they say they do yet. But the basic idea is to run "hammiefilter.py -t" from procmail, so that it trains on its decisions. Then you can tweak it in your MUA by hitting some magic key which will pipe it to "hammiefilter.py -s -f" for spam incognito, and "hammiefilter.py -g -f" for false negatives. The "-t" step inserts a header telling how it trained itself. The "-s" and "-g" options, when they see that header, will untrain, then retrain. So what we need is some sort of mutt magic (note: not "butt magic") to pipe a message out to something, then remove it, all in one keystroke. The pipe would look something like "hammiefilter.py -g -f | procmail". Or is there a more mutty way to do it? > All this was put on hold by a desire to graduate but now I'm excited > to start working on it again. Yes, the desire to graduate does get strong at times, but eventually it always subsides. I hope that for you it subsided because you actually graduated ;) Neale From richard at jowsey.com Wed Jan 22 16:28:56 2003 From: richard at jowsey.com (Richard Jowsey) Date: Wed Jan 22 00:29:38 2003 Subject: [Spambayes] FYI: Java implementation In-Reply-To: References: <3E2B9962.26334.308D0BD@localhost> Message-ID: <3E2EC6C8.20276.54112D8@localhost> > > That chi2 test is definitely on > > the drawing boards, even if only for comparison purposes... 
> > Anthony Baxter has some plots of score distributions for > Graham-combining, Gary-combining and chi-combining here: > http://spambayes.sourceforge.net/background.html Damn nice graphics! And a good explanation for the advantages of the chi-squared "combining and scoring" treatment. OK, so I'm a believer! :-) > It's the sharpness and spread of the separation in chi- that's > attractive. Indeed! I've now mostly finished my core word-tokenization and training logic, and am presently running sweeps across my good/spam corpus to complete populating the database. I'll be re- working the comparator classes presently, to incorporate this chi-2 math. Will keep everyone posted as to progress... Cheers, Richard ---------------------------------------------------------------- "Once the number three, being the third number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards thou foe, who being naughty in my sight, shall snuff it!" From neale at woozle.org Tue Jan 21 21:53:55 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 00:53:58 2003 Subject: [Spambayes] degeneration In-Reply-To: ("Tim Peters"'s message of "Tue, 21 Jan 2003 21:47:12 -0500") References: Message-ID: "Tim Peters" writes: > It wouldn't be hard to implement, and I agree it's interesting. So > far as testing goes, I don't have any test data that *can* show an > improvement anymore, so I lost interest in tweaking the algorithms. > Do you have test sets that could show improvements? Well, no, actually. I keep forgetting this thing is so good. I have been getting more false negatives lately than I'd like, but I'm sure that's because I keep fouling up my wordlist while testing hammiefilter options. But I'll be sure and let everyone know if it turns out that I actually do have a difficult set of test data. 
Neale From hupp at upl.cs.wisc.edu Wed Jan 22 00:16:45 2003 From: hupp at upl.cs.wisc.edu (Adam Hupp) Date: Wed Jan 22 01:16:49 2003 Subject: [Spambayes] degeneration In-Reply-To: References: <20030121212741.GA3849@upl.cs.wisc.edu> Message-ID: <20030122061645.GA6147@upl.cs.wisc.edu> On Tue, Jan 21, 2003 at 09:20:34PM -0800, Neale Pickett wrote: > > I'm going to go ahead and check in my new hammiefilter.py with big > [EXPERIMENTAL] disclaimers by most of the options--I'm not sure they > actually do what they say they do yet. > > But the basic idea is to run "hammiefilter.py -t" from procmail, so that > it trains on its decisions. Then you can tweak it in your MUA by > hitting some magic key which will pipe it to "hammiefilter.py -s -f" for > spam incognito, and "hammiefilter.py -g -f" for false negatives. The > "-t" step inserts a header telling how it trained itself. The "-s" and > "-g" options, when they see that header, will untrain, then retrain. > > So what we need is some sort of mutt magic (note: not "butt magic") to > pipe a message out to something, then remove it, all in one keystroke. > The pipe would look something like "hammiefilter.py -g -f | procmail". > Or is there a more mutty way to do it? It looks like your hammiefilter uses almost the same interface that mine does, so the integration should be a snap. Since I can't add and modify arbitrary headers from within mutt (someone please correct me if I'm wrong) I'm using one of the flags ("important" I believe) to indicate an unsure in need of training.

.procmailrc:

:0 fhb w:hammie
| /home/hupp/spambayes/hammiefilter.py --filter --train

:0:
* ^X-Hammie-Disposition: Yes
caughtspam

:0 fh w:
* ^X-Hammie-Disposition: Unsure
|formail -i "X-Status: F"

.muttrc:

folder-hook . "macro index F '|hammiefilter.py --reverse --train --good\n =caughtspam\n'"
folder-hook . "macro pager F '|hammiefilter.py --reverse --train --good\n =caughtspam\n'"
folder-hook caughtspam "macro index F '|hammiefilter.py --reverse --train --spam\r !\r'"
folder-hook caughtspam "macro pager F '|hammiefilter.py --reverse --train --spam\r !\r'"
macro pager H "|hammiefilter.py --train --good\r !"
macro index H "|hammiefilter.py --train --good\r !"
macro pager S "|hammiefilter.py --train --spam\r !\r =caughtspam\r"
macro index S "|hammiefilter.py --train --spam\r !\r =caughtspam\r"
color index red black "~h 'X-Hammie-Disposition: Unsure' ~F"

This puts messages scored as spam into the caughtspam folder. If you are in the caughtspam folder and type "F" (for false) it will untrain as spam, retrain as ham, and move it to the mail spool. If you are in any other folder it does the opposite and moves into caughtspam. Unsure messages show up as red in the index; "H" or "S" trains and removes the flag. I'm not positive the Unsure flagging works entirely correctly, it's been a while. > Yes, the desire to graduate does get strong at times, but eventually it > always subsides. I hope that for you it subsided because you actually > graduated ;) Today, actually. Now I have time for more important matters such as ridding my mailbox of spam. -Adam From anthony at interlink.com.au Wed Jan 22 19:08:37 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Jan 22 03:11:11 2003 Subject: [Spambayes] KMail integration? Message-ID: <200301220808.h0M88cm13824@localhost.localdomain> Has anyone had thoughts about KMail integration? (The KDE mailer). I don't use it, but have a number of colleagues that do, and they'd like something that's easy to use for spam killing... Anthony From tdickenson at devmail.geminidataloggers.co.uk Wed Jan 22 08:41:43 2003 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Wed Jan 22 03:41:46 2003 Subject: [Spambayes] KMail integration?
In-Reply-To: <200301220808.h0M88cm13824@localhost.localdomain> References: <200301220808.h0M88cm13824@localhost.localdomain> Message-ID: <200301220841.43243.tdickenson@devmail.geminidataloggers.co.uk> On Wednesday 22 January 2003 8:08 am, Anthony Baxter wrote: > Has anyone had thoughts about KMail integration? (The KDE mailer). > > I don't use it, but have a number of colleagues that do, and they'd > like something that's easy to use for spam killing... I use KMail with procmail. procmail adds the X-Hammie-Disposition header, and KMail filters using it. I'm not sure if that qualifies as "integration". From msergeant at startechgroup.co.uk Wed Jan 22 10:04:09 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Wed Jan 22 05:04:09 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <5.1.0.14.0.20030121112431.01ea3390@mail.telecommunity.com> Message-ID: On Tuesday, Jan 21, 2003, at 16:28 Europe/London, Phillip J. Eby wrote: >> It's nice to be proud of software, but when it's open source you >> kinda leave it wide open for us to nab your ideas ;-) > > That's the whole point of promoting Spambayes, really. To get the > "good stuff" into the hands of more people. I'd rather see lots of > programs using the more effective ideas, than have a bunch of > non-programmers try less effective tools and swear off of "learning" > spam filters because of it. True, but I can also see the flip side of it - it's almost always better to have a number of different tools implementing different algorithms because then the spammers have to work a *lot* harder to get around them. Matt.
From msergeant at startechgroup.co.uk Wed Jan 22 10:08:53 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Wed Jan 22 05:08:53 2003 Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation) In-Reply-To: <20030121171157.67BCD16F16@jmason.org> Message-ID: <8304A044-2DF1-11D7-AE99-0003939CB5D8@startechgroup.co.uk> On Tuesday, Jan 21, 2003, at 17:11 Europe/London, Justin Mason wrote: > Neil Schemenauer said: >> Matt Sergeant wrote: >>> Mozilla and SpamAssassin both copy their bayesian code from spambayes >>> (including tokenisation ideas and combiners). >> >> I, for one, am extremely pleased to hear that. It would be a shame if >> people kept using Paul Graham's original algorithm after all the work >> that was put in improving Spambayes. Despite what was said at the >> spam >> conference, I think the algorithm is important. > > BTW it's worth noting we didn't just "nab" the ideas ;) Well I did, and then I gave SA most of my code ;-) Matt. From skip at pobox.com Wed Jan 22 08:00:54 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 22 09:01:10 2003 Subject: [Spambayes] KMail integration? In-Reply-To: <200301220808.h0M88cm13824@localhost.localdomain> References: <200301220808.h0M88cm13824@localhost.localdomain> Message-ID: <15918.42006.932397.960128@montanaro.dyndns.org> Anthony> Has anyone had thoughts about KMail integration? No, but if it understands POP I suspect your colleagues could just use pop3proxy. Skip From m2 at plusseven.com Wed Jan 22 15:37:19 2003 From: m2 at plusseven.com (Alex Polite) Date: Wed Jan 22 09:37:40 2003 Subject: [Spambayes] dumbdbm faster than bsddb3 Message-ID: <20030122143719.GA2540@matijek> I moved from spamcan to spambayes today and wasted a couple hours profiling hammie.py profile.run("spambayes.hammiebulk.main()", '/tmp/stats') I ran this on approximately 2000 messages and aggregated the stats. The entire run was 496 CPU seconds. 
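Aggregating several profile runs like Alex's can be done with the standard library's pstats module, which merges the dump files written by profile.run (cProfile writes the same format). A minimal sketch; the `summarize` helper name and the file paths are placeholders.

```python
import pstats

def summarize(stats_files, limit=20):
    """Merge several profile dump files and print the costliest entries."""
    stats = pstats.Stats(stats_files[0])
    for path in stats_files[1:]:
        stats.add(path)  # accumulate timings from another run
    stats.sort_stats("cumulative").print_stats(limit)
    return stats

# e.g. summarize(["/tmp/stats.1", "/tmp/stats.2"])
```

Sorting by "cumulative" puts top-level calls like hammiebulk.main first. Sorting by "tottime" instead would show which leaf functions (for example, the dbm lookups) dominate.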
When looking at the profiling information I realized that I was using dumbdbm, which is supposed to be very slow. I installed bsddb3, rebuilt my db and reran the profiling tests. The entire run was now 520 CPU seconds, a 4.8% increase. So it seems like "stupid beats smart" goes for speed optimizations too. Can anyone corroborate this? -- Alex Polite http://plusseven.com/gpg From drew at poured.net Wed Jan 22 09:29:29 2003 From: drew at poured.net (Drew Raines) Date: Wed Jan 22 10:29:22 2003 Subject: [Spambayes] Does spambayes train on its own headers? Message-ID: Since my corpus is relatively small (on the order of hundreds for spam and many hundreds for ham), I get false negatives fairly frequently. This doesn't bother me; I just move them to my spam folder where my hammie cron trains on them daily. These old spams, though, have X-Spambayes-Classification and X-Hammie-Debug headers which could skew statistics in .hammiedb. Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does hammiefilter know not to look at hammie_header_name and hammie_debug_header_name when training? -Drew From noreply at sourceforge.net Wed Jan 22 07:28:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 10:40:20 2003 Subject: [Spambayes] [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content Message-ID: Bugs item #672489, was opened at 2003-01-22 16:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jürgen Hermann (jhermann) Assigned to: Nobody/Anonymous (nobody) Summary: Problems with unallowed chars in XML content Initial Comment: The attached patch fixes problems with subjects like the following: 'Valentines Day Special \x96 2 bikinis for the pric...' When you try to review such a message, you get an XML parsing error (note the \x96).
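The failure in this report is easy to reproduce: a raw \x96 byte is not valid UTF-8, so an XML parser rejects the element that contains it. One way to make such a subject safe is sketched below. This is a hypothetical `xml_safe` helper, not the patch attached to the bug (which isn't shown here). It decodes with errors="replace", replaces any character outside the XML 1.0 Char production with U+FFFD, and escapes markup characters.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production.
_NOT_XML_CHAR = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")

def xml_safe(raw, encoding="utf-8"):
    """Turn raw header bytes into text that can sit inside an XML element."""
    text = raw.decode(encoding, errors="replace")  # bad bytes -> U+FFFD
    text = _NOT_XML_CHAR.sub("\uFFFD", text)       # e.g. stray control chars
    return escape(text)                            # &, <, >
```

If the message's real charset is known (\x96 is an en dash in cp1252), passing it as `encoding` keeps the character instead of replacing it.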
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 From noreply at sourceforge.net Wed Jan 22 07:33:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 10:40:25 2003 Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py Message-ID: Bugs item #672495, was opened at 2003-01-22 16:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jürgen Hermann (jhermann) Assigned to: Nobody/Anonymous (nobody) Summary: Files not installed by setup.py Initial Comment: Patch:

===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py 17 Jan 2003 06:45:36 -0000 1.13
+++ setup.py 22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
         'pop3proxy.py',
         'proxytrainer.py',
         'proxytee.py',
+        'OptionConfig.py',
     ],
-    packages = [ 'spambayes', ],
+    packages = [
+        'spambayes',
+        'spambayes.resources',
+    ],
     classifiers = [
         'Development Status :: 4 - Beta',
         'Environment :: Console',

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 From neale at woozle.org Wed Jan 22 08:20:53 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 11:21:01 2003 Subject: [Spambayes] Does spambayes train on its own headers? In-Reply-To: (Drew Raines's message of "Wed, 22 Jan 2003 09:29:29 -0600") References: Message-ID: Drew Raines writes: > Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does > hammiefilter know not to look at hammie_header_name and > hammie_debug_header_name when training?
hammiefilter doesn't do anything special WRT its own headers. But unless you changed your bayescustomize.ini file, the tokenizer will skip all the X-Spambayes-* headers, so you're okay. Of course, I could be reading the default options incorrectly... From skip at pobox.com Wed Jan 22 10:25:56 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 22 11:26:05 2003 Subject: [Spambayes] Does spambayes train on its own headers? In-Reply-To: References: Message-ID: <15918.50708.875665.899376@montanaro.dyndns.org> Drew> Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does Drew> hammiefilter know not to look at hammie_header_name and Drew> hammie_debug_header_name when training? Drew, I believe spambayes ignores its own headers. Just the same, I strip them using unheader.py. Here's my training script:

#!/bin/bash

export BAYESCUSTOMIZE=$HOME/hammie.opt

cd ~/tmp

# touch the messages up a bit to avoid spurious "clues"
unheader.py -p 'X-VM|X-Hammie|X-Spam' newham > newham.clean
unheader.py -p 'X-VM|X-Hammie|X-Spam' newspam > newspam.clean

# do the deed
hammie.py -d -p ~/hammie.db -g newham.clean -s newspam.clean

# save the files for later retraining
echo "" >> newham.clean.save
cat newham.clean >> newham.clean.save
rm newham newham.clean
echo "" >> newspam.clean.save
cat newspam.clean >> newspam.clean.save
rm newspam newspam.clean

I save ham and spam to ~/tmp/{newham,newspam}. Skip From neale at woozle.org Wed Jan 22 08:37:56 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 11:38:01 2003 Subject: [Spambayes] packaging question Message-ID: I have an emacs lisp file, spambayes.el, which integrates spambayes into Gnus. I'd like to rename the hammie/ directory to contrib/, and put my spambayes.el (as well as an example muttrc) in there. Any objections?
Neale From skip at pobox.com Wed Jan 22 10:50:53 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 22 11:51:04 2003 Subject: [Spambayes] packaging question In-Reply-To: References: Message-ID: <15918.52205.545824.761182@montanaro.dyndns.org> Neale> I'd like to rename the hammie/ directory to contrib/, and put my Neale> spambayes.el (as well as an example muttrc) in there. Neale> Any objections? No, but I'd prefer it if you coaxed the SF folks into renaming it so CVS history info is retained. Skip From francois.granger at free.fr Wed Jan 22 19:02:46 2003 From: francois.granger at free.fr (François Granger) Date: Wed Jan 22 13:03:03 2003 Subject: [Spambayes] Congratulations Message-ID: Today I got the latest CVS version on my work Mac (Cube MacOS 9.1 192MB RAM). I copied my bayescustomize.ini file and my db file from my previous setup (dated December 2) and ran the new version through pop3proxy.py. Et voilà ! [1] It worked like a charm. By the way, I think that the released version should ship with a minimal bayescustomize.ini file loaded for pop3proxy use with a fake server. People will have an easier time to replace this with their real server name. Something like:

========================================
[pop3proxy]
pop3proxy_persistent_storage_file = hammie.db
pop3proxy_servers = pop.yourisp.com, pop.otherisp.com
pop3proxy_ports = 110, 1110
# Replace the values pop.yourisp.com, pop.otherisp.com by your real servers.
# In your mail app, as pop server, put
# 127.0.0.1 for the account pop.yourisp.com and
# 127.0.0.1:1110 for the account pop.otherisp.com
# and you are done. If you use Eudora, see the documentation.
[globals]
dbm_type = best
verbose = False

[html_ui]
html_ui_launch_browser = True
html_ui_port = 8880
html_ui_allow_remote_connections = True
========================================

Anyway, I don't think that Spambayes on OS 9 will ever be a hit because:
- it is really slow and slows down mail retrieval (thanks to cooperative multitasking, where some programs do not cooperate enough)
- It needs a lot of memory (I gave it 25 MB) and since memory allocation is not dynamic on MacOS 9, you are stuck with less memory or you have to launch it only when needed, which diminishes the usefulness.

[1] As Americans say... ;-) -- Mail is a means of communication. People should ask themselves about the political implications of the choices (or non-choices) of their tools and technologies. For clean mail: -- From python-spambayes at discworld.dyndns.org Wed Jan 22 12:14:17 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Wed Jan 22 13:11:39 2003 Subject: [Spambayes] Congratulations In-Reply-To: ; from francois.granger@free.fr on Wed, Jan 22, 2003 at 07:02:46PM +0100 References: Message-ID: <20030122121417.C21326@discworld.dyndns.org> François Granger wrote: > > By the way, I think that the released version should ship with a minimal > bayescustomize.ini file loaded for pop3proxy use with a fake server. People > will have an easier time to replace this with their real server name. > Something like: > > ======================================== > [pop3proxy] > pop3proxy_persistent_storage_file = hammie.db > pop3proxy_servers = pop.yourisp.com, pop.otherisp.com If this idea is chosen, please use one of the domains reserved for this, like "example.org". "yourisp.com" and "otherisp.com" are available for people to register, and if they were my domains, I wouldn't appreciate the extra traffic.
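In François's sample, the comments imply that the Nth entry in pop3proxy_ports is the local port that proxies the Nth entry in pop3proxy_servers. The sketch below spells out that pairing. It reads the section with the standard configparser module only for illustration, since spambayes loads its options through its own code, and it uses the reserved example.com/example.org names Charles recommends. `proxy_pairs` is a hypothetical helper.

```python
import configparser

SAMPLE = """
[pop3proxy]
pop3proxy_servers = pop.example.com, pop.example.org
pop3proxy_ports = 110, 1110
"""

def proxy_pairs(ini_text):
    """Return (local_port, remote_server) pairs from a [pop3proxy] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["pop3proxy"]
    servers = [s.strip() for s in section["pop3proxy_servers"].split(",")]
    ports = [int(p) for p in section["pop3proxy_ports"].split(",")]
    if len(servers) != len(ports):
        raise ValueError("need exactly one local port per POP3 server")
    return list(zip(ports, servers))
```

With SAMPLE, the mail client would then use 127.0.0.1 for the first account and 127.0.0.1:1110 for the second, as the comments in the original snippet describe.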
Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From noreply at sourceforge.net Wed Jan 22 09:46:31 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 13:12:34 2003 Subject: [Spambayes] [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content Message-ID: Bugs item #672489, was opened at 2003-01-22 15:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jürgen Hermann (jhermann) >Assigned to: Richie Hindle (richiehindle) Summary: Problems with unallowed chars in XML content Initial Comment: The attached patch fixes problems with subjects like the following: 'Valentines Day Special \x96 2 bikinis for the pric...' When you try to review such a message, you get an XML parsing error (note the \x96). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 From richie at entrian.com Wed Jan 22 18:18:36 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Jan 22 13:19:05 2003 Subject: [Spambayes] Issue with pop3proxy In-Reply-To: References: Message-ID: [François] > the X-Spambayes-Classification: spam got added after the content [Neale] > Perhaps this is what happens when the email module can't parse the message. No, the POP3 proxy doesn't use 'email' when adding the X-Spambayes-Classification header, exactly because there are messages that it can't parse. It splits the headers from the body like this:

    headers, body = re.split(r'\n\r?\n', messageText, 1)

so it's difficult to know how it can fail. Perhaps the message uses only '\r's to terminate the headers.
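Richie's guess can be checked directly. With the quoted pattern, a message whose header block ends in bare "\r\r" has no match at all, so re.split returns a single element and the two-name unpacking raises ValueError instead of splitting. The TOLERANT pattern and the `split_message` fallback below are hypothetical, not what pop3proxy.py does.

```python
import re

# The proxy's split as quoted above: needs a pair of "\n"s.
ORIGINAL = re.compile(r'\n\r?\n')
# Hypothetical, more tolerant variant that also accepts bare-CR endings.
TOLERANT = re.compile(r'\r?\n\r?\n|\r\r')

def split_message(message_text, pattern=TOLERANT):
    """Split into (headers, body); with no blank line, treat all as headers."""
    parts = pattern.split(message_text, maxsplit=1)
    if len(parts) != 2:
        return message_text, ""
    return parts[0], parts[1]
```

With ORIGINAL, "Subject: hi\r\rbody" comes back whole with an empty body. With TOLERANT it splits into "Subject: hi" and "body", and CRLF messages split cleanly without a trailing "\r" on the headers.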
François, could you do me a favour? Could you send me an exact copy of one of these messages? Send it as a binary attachment (eg. in a zip file), so that nothing mucks about with the line endings. You should find it in one of your cache directories. Thanks. -- Richie Hindle richie@entrian.com From richie at entrian.com Wed Jan 22 18:18:45 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Jan 22 13:19:13 2003 Subject: [Spambayes] Congratulations In-Reply-To: References: Message-ID: <4nnt2vcd05qgg9m6ldlea194pnc6doo05f@4ax.com> [François] > It worked like a charm. Great! Another satisfied customer. 8-) > By the way, I think that the released version should ship with a minimal > bayescustomize.ini file loaded for pop3proxy use with a fake server. People > will have an easier time to replace this with their real server name. This is a good idea, but the obvious defaults are different for different platforms, which makes things difficult. On Windows, the proxy should run on port 110 because non-root processes can do that, and it saves having to reconfigure your email client to use a non-default port. On Unix, using port 110 means running as root, and possibly conflicting with an existing POP3 server (which you're much more likely to find on Unix than on Windows), so it should default to something like 1110 instead. Awkward, inconsistent, potentially confusing. (I'm including MacOS X in with Unix here - I assume that's correct?) Have you looked at the web configuration page? That attempts to explain how to configure the POP3 proxy, and should be easier than modifying bayescustomize.ini (though it doesn't talk about privileged ports). We should encourage POP3 proxy users to set up via that rather than via hand-editing bayescustomize.ini.
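Richie's platform rule for the default proxy port reduces to one conditional. The helper below is hypothetical, and nothing like it is claimed to exist in pop3proxy.py. Following Richie's assumption, it treats MacOS X (sys.platform "darwin") as Unix.

```python
import sys

def default_proxy_port(platform=sys.platform):
    """Hypothetical default local port for the POP3 proxy.

    Windows: 110, since non-administrator processes can bind it and the
    mail client needs no port change.  Unix, including MacOS X
    ("darwin"): 1110, since binding ports below 1024 needs root and a
    local POP3 server may already hold 110.
    """
    return 110 if platform.startswith("win") else 1110
```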
-- Richie Hindle richie@entrian.com From noreply at sourceforge.net Wed Jan 22 10:34:52 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 13:40:41 2003 Subject: [Spambayes] [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content Message-ID: Bugs item #672489, was opened at 2003-01-22 15:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jürgen Hermann (jhermann) Assigned to: Richie Hindle (richiehindle) Summary: Problems with unallowed chars in XML content Initial Comment: The attached patch fixes problems with subjects like the following: 'Valentines Day Special \x96 2 bikinis for the pric...' When you try to review such a message, you get an XML parsing error (note the \x96). ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-22 18:34 Message: Logged In: YES user_id=85414 Many thanks, Jürgen. Checked in with PyMeldLite.py 1.4. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702 From noreply at sourceforge.net Wed Jan 22 10:43:05 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 13:40:46 2003 Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py Message-ID: Bugs item #672495, was opened at 2003-01-22 15:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jürgen Hermann (jhermann) Assigned to: Nobody/Anonymous (nobody) Summary: Files not installed by setup.py Initial Comment: Patch: =================================================================== RCS file: /cvsroot/spambayes/spambayes/setup.py,v retrieving revision 1.13 diff -u -r1.13 setup.py --- setup.py 17 Jan 2003 06:45:36 -0000 1.13 +++ setup.py 22 Jan 2003 15:28:05 -0000 @@ -39,8 +39,12 @@ 'pop3proxy.py', 'proxytrainer.py', 'proxytee.py', + 'OptionConfig.py', ], - packages = [ 'spambayes', ], + packages = [ + 'spambayes', + 'spambayes.resources', + ], classifiers = [ 'Development Status :: 4 - Beta', 'Environment :: Console', ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-22 18:43 Message: Logged In: YES user_id=85414 You're dead right about spambayes.resources, but I'm not convinced we should be installing OptionConfig.py now that it's been folded into the main pop3proxy web interface. I asked on the list whether anyone thought we should leave it in with the other scripts and got no replies. I'm tempted to move it into the spambayes package, from where pop3proxy.py can import it. 
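Hand-maintained package lists are easy to get wrong, as the missing `spambayes.resources` entry shows. setuptools provides `find_packages()` for this job. The function below is a hypothetical, stdlib-only sketch of the same idea, not what setup.py actually does: it treats any directory containing `__init__.py` as a package.

```python
import os


def discover_packages(top, root="."):
    """List package `top` and all of its subpackages as dotted names.

    A directory counts as a package only if it contains __init__.py;
    directories without one, and everything beneath them, are skipped.
    """
    packages = []
    base = os.path.join(root, top)
    for dirpath, dirnames, filenames in os.walk(base):
        if "__init__.py" not in filenames:
            dirnames[:] = []  # do not descend into non-package directories
            continue
        rel = os.path.relpath(dirpath, root)
        packages.append(rel.replace(os.sep, "."))
        dirnames.sort()  # visit subpackages in a deterministic order
    return packages
```

In setup.py, `packages = discover_packages("spambayes")` could then replace the literal list, so a new subpackage cannot be forgotten.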
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 From richie at entrian.com Wed Jan 22 19:11:11 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Jan 22 14:11:38 2003 Subject: [Spambayes] packaging question In-Reply-To: References: Message-ID: [Neale] > I'd like to rename the hammie/ directory to contrib/, and put my > spambayes.el (as well as an example muttrc) in there. Good plan. We should possibly ship it with the release as well - the documentation will eventually refer to things like your muttrc, so we should be shipping them. We don't have to install any scripts, just copy the contrib directory into the installation so that people have it to refer to. Well done on the muttrc - you've saved me a lot of work there. And Don Marti will be pleased. 8-) -- Richie Hindle richie@entrian.com From noreply at sourceforge.net Wed Jan 22 12:22:58 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Jan 22 15:31:05 2003 Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py Message-ID: Bugs item #672495, was opened at 2003-01-22 16:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jürgen Hermann (jhermann) Assigned to: Nobody/Anonymous (nobody) Summary: Files not installed by setup.py Initial Comment: Patch: =================================================================== RCS file: /cvsroot/spambayes/spambayes/setup.py,v retrieving revision 1.13 diff -u -r1.13 setup.py --- setup.py 17 Jan 2003 06:45:36 -0000 1.13 +++ setup.py 22 Jan 2003 15:28:05 -0000 @@ -39,8 +39,12 @@ 'pop3proxy.py', 'proxytrainer.py', 'proxytee.py', + 'OptionConfig.py', ], - packages = [ 'spambayes', ], + packages = [ + 'spambayes', + 'spambayes.resources', + ], classifiers = 
[ 'Development Status :: 4 - Beta', 'Environment :: Console', ---------------------------------------------------------------------- >Comment By: Jürgen Hermann (jhermann) Date: 2003-01-22 21:22 Message: Logged In: YES user_id=39128 The current problem is the import in line 153 of pop3proxy: from OptionConfig import OptionsConfigurator Moving OptionConfig into the package is surely the best fix, including adapting the above import. ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-22 19:43 Message: Logged In: YES user_id=85414 You're dead right about spambayes.resources, but I'm not convinced we should be installing OptionConfig.py now that it's been folded into the main pop3proxy web interface. I asked on the list whether anyone thought we should leave it in with the other scripts and got no replies. I'm tempted to move it into the spambayes package, from where pop3proxy.py can import it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 From BPettersen at NAREX.com Wed Jan 22 18:43:54 2003 From: BPettersen at NAREX.com (Bjorn Pettersen) Date: Wed Jan 22 20:57:03 2003 Subject: [Spambayes] I did something stupid... Message-ID: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE15E@admin56.narex.com> ...after setting up spambayes with Outlook XP (training, telling it to watch the Inbox and move spam), I decided the icons were too far to the right so I right-clicked on the toolbar, dragged them down to the next line, and then Outlook froze. In particular the customize dialog was unresponsive (although I could still move icons around on the toolbar). I was forced to shut Outlook down through Task Manager... Now whenever I start Outlook, I see the message below from win32traceutil.py, and the taskbar icons are non-functional.
I couldn't find any clues on how to uninstall and try again (I did run addin.py --unregister, and it said it was successful, but still the same message after running addin.py again). Any help or pointers to documentation would be greatly appreciated. -- bjorn Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Either bayes database or message database is missing - creating new Bayes database initialized with 0 spam and 0 good messages pythoncom error: Python error invoking COM method. Traceback (most recent call last): File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 511, in OnSelectionChange self.SetupUI() File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 435, in SetupUI Tag = "SpamBayes.Manager") File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 470, in _AddControl item = parent.Controls.Add(Type=control_type, Temporary=True) File "C:\Python22\\lib\site-packages\win32com\client\__init__.py", line 369, in __getattr__ return apply(self._ApplyTypes_, args) File "C:\Python22\\lib\site-packages\win32com\client\__init__.py", line 363, in _ApplyTypes_ return self._get_good_object_(apply(self._oleobj_.InvokeTypes, (dispid, 0, wFlags, retType, argTypes) + args), user, resultCLSID) pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None) From mhammond at skippinet.com.au Thu Jan 23
13:28:41 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Jan 22 21:29:05 2003 Subject: [Spambayes] I did something stupid... In-Reply-To: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE15E@admin56.narex.com> Message-ID: <01ac01c2c287$25c05390$530f8490@eden> > ...after setting up spambayes with Outlook XP (training, telling it to > watch the Inbox and move spam), I decided the icons were too far to the > right so I right-clicked on the toolbar, dragged them down to the next > line, and then Outlook froze. In particular the customize dialog was > unresponsive (although I could still move icons around on the I will try to repro this, but am busy for the next few days. You may like to try the customize dialog, and hitting "Reset" on the toolbars. Otherwise, for the time being, wrap an exception handler around: > File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 470, in _AddControl > item = parent.Controls.Add(Type=control_type, Temporary=True) And just ignore it for now. Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2683 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030123/ed140805/winmail.bin From anthony at interlink.com.au Thu Jan 23 13:32:10 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Jan 22 21:33:52 2003 Subject: [Spambayes] Congratulations In-Reply-To: <4nnt2vcd05qgg9m6ldlea194pnc6doo05f@4ax.com> Message-ID: <200301230232.h0N2WAY01294@localhost.localdomain> >>> Richie Hindle wrote > > [François] > > By the way, I think that the released version should ship with a minimal > > bayescustomize.ini file loaded for pop3proxy use with a fake server. People > > will have an easier time to replace this with their real server name. > > This is a good idea, but the obvious defaults are different for different > platforms, which makes things difficult.
Which reminds me - I'd like to make it so bayescustomize.ini can be found in a couple of places other than the current directory, or the env var. For instance, on Unix, $HOME/.spambayes/bayescustomize.ini I'm not sure where on Windows or MacOS. Suggestions? This removes that whole "strange behaviour if you're in the wrong place" thing... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From neale at woozle.org Wed Jan 22 20:43:51 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 23:44:00 2003 Subject: [Spambayes] packaging question In-Reply-To: <15918.52205.545824.761182@montanaro.dyndns.org> (Skip Montanaro's message of "Wed, 22 Jan 2003 10:50:53 -0600") References: <15918.52205.545824.761182@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > No, but I'd prefer it if you coaxed the SF folks into renaming it so > CVS history info is retained. Since the only history on any of the files in that directory is the initial checkin message, I'm just going to rename. Skip, you have permission to throttle me if this hacks you off ;) Neale From neale at woozle.org Wed Jan 22 20:48:09 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 22 23:48:16 2003 Subject: [Spambayes] Congratulations In-Reply-To: <200301230232.h0N2WAY01294@localhost.localdomain> (Anthony Baxter's message of "Thu, 23 Jan 2003 13:32:10 +1100") References: <200301230232.h0N2WAY01294@localhost.localdomain> Message-ID: Anthony Baxter writes: > Which reminds me - I'd like to make it so bayescustomize.ini can be > found in a couple of places other than the current directory, or the > env var. I've already checked something like this in to Options.py. Great minds think alike! And so do ours! ;-) Neale From neale at woozle.org Wed Jan 22 22:03:55 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Jan 23 01:03:58 2003 Subject: [Spambayes] can I change the options to hammie.py and hammiebulk.py? 
Message-ID: I know a lot of you out there in hammieland are still using hammie.py instead of hammiebulk.py, and that's okay. But I think it's time I reined everyone in here, so I'm proposing some changes. I want to: 1. Do away with hammie.py in the top directory. The hammie.py module would still exist, you'd just have to call hammiebulk.py directly. 2. Move hammiebulk.py into the top directory. 3. Totally rearrange the options to hammiebulk. Specifically, I want to make it more like hammiefilter. To wit: """Usage: %(program)s [OPTION]... [OPTION] is one of: -h show usage and exit -x show some usage examples and exit -d DBFILE use database in DBFILE -D PICKLEFILE use pickle (instead of database) in PICKLEFILE -n create a new database * -f filter (default if no processing options are given) * -t [EXPERIMENTAL] filter and train based on the result (you must make sure to untrain all mistakes later) * -g [EXPERIMENTAL] (re)train as a good (ham) message * -s [EXPERIMENTAL] (re)train as a bad (spam) message * -G [EXPERIMENTAL] untrain ham (only use if you've already trained this message) * -S [EXPERIMENTAL] untrain spam (only use if you've already trained this message) """ I'd provide a (-F, --force-train) option to force training even if a trained header is found, and a (-N, --no-trained-header) option to prevent writing out trained headers. 4. Remove mboxtrain.py, as hammiebulk.py would replace it. Glutton for punishment, Neale From neale at woozle.org Wed Jan 22 22:12:13 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Jan 23 01:12:16 2003 Subject: [Spambayes] degeneration In-Reply-To: <20030122061645.GA6147@upl.cs.wisc.edu> (Adam Hupp's message of "Wed, 22 Jan 2003 00:16:45 -0600") References: <20030121212741.GA3849@upl.cs.wisc.edu> <20030122061645.GA6147@upl.cs.wisc.edu> Message-ID: Adam Hupp writes: > It looks like your hammiefilter uses almost the same interface that > mine does, so the integration should be a snap. Killer! 
So with the arguments on hammiefilter.py that I just checked in, your rules would look like this: > folder-hook . "macro index S '|hammiefilter.py -s\n =caughtspam\n'" > folder-hook . "macro pager S '|hammiefilter.py -s\n =caughtspam\n'" > folder-hook . "macro index H '|hammiefilter.py -g\r !\r'" > folder-hook . "macro pager H '|hammiefilter.py -g\r !\r'" > color index red black "~h 'X-Hammie-Disposition: spam' ~F" And then you run all your mail through "hammiefilter.py -t" from procmail. Does that look good to you? Perhaps there should also be a "delete as spam" button. Or maybe that's what S should do, since folks probably aren't going to want to keep spam around. Neale From sjoerd at acm.org Thu Jan 23 10:14:45 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Thu Jan 23 04:14:52 2003 Subject: [Spambayes] packaging question In-Reply-To: <15918.52205.545824.761182@montanaro.dyndns.org> References: <15918.52205.545824.761182@montanaro.dyndns.org> Message-ID: <20030123091445.DC33674D14@indus.ins.cwi.nl> On Wed, Jan 22 2003 Skip Montanaro wrote: > > Neale> I'd like to rename the hammie/ directory to contrib/, and put my > Neale> spambayes.el (as well as an example muttrc) in there. > > Neale> Any objections? > > No, but I'd prefer it if you coaxed the SF folks into renaming it so CVS > history info is retained. When you do that everybody who updates their repository will get error messages from CVS and the old hammie directory will not have been removed from the checked-out copy. -- Sjoerd Mullender From francois.granger at laposte.net Thu Jan 23 10:32:45 2003 From: francois.granger at laposte.net (François Granger) Date: Thu Jan 23 06:43:40 2003 Subject: [Spambayes] Congratulations In-Reply-To: <200301230232.h0N2WAY01294@localhost.localdomain> Message-ID: on 23/01/03 3:32, Anthony Baxter at anthony@interlink.com.au wrote: > For instance, on Unix, $HOME/.spambayes/bayescustomize.ini > I'm not sure where on Windows or MacOS. Suggestions?
MacOS 9: there is no rule apart from the "Preferences" folder, but I don't think it is a good idea. With no explicit path, the file will be looked for in the script's os.getcwd() folder. With a path, the script will find it from anywhere, but path notation on MacOS 9 is really strange for Unix users. MacOS X: mainly the same as Unix, but it can also work like MacOS 9. From noreply at sourceforge.net Wed Jan 22 20:57:43 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 06:43:48 2003 Subject: [Spambayes] [ spambayes-Bugs-650496 ] hammie.py discards headers Message-ID: Bugs item #650496, was opened at 2002-12-08 10:39 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=650496&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Works For Me Priority: 5 Submitted By: Simon Baatz (bnomis26) Assigned to: Neale Pickett (npickett) Summary: hammie.py discards headers Initial Comment: When feeding the (malformed) attached mail to hammie.py in filter mode, the headers of the mail are not present in the output. Command line: python hammie.py -f -d -p ~/mail/hammie.db < msg.lAoM Output: X-Spambayes-Classification: ham; 0.00 --Amazon.com_multipart_boundary____________ Content-Type: text/plain; charset=iso-8859-1 Vielen Dank für Ihre Bestellung bei Amazon.de. --Amazon.com_multipart_boundary____________ Content-Type: text/html; charset=iso-8859-1 --Amazon.com_multipart_boundary____________-- ---------------------------------------------------------------------- >Comment By: Neale Pickett (npickett) Date: 2003-01-22 20:57 Message: Logged In: YES user_id=619391 Seems to be okay now...
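The behaviour the report expects is easy to check independently. Below is a minimal sketch using the standard `email` package, assuming the header name and `label; score` format shown in the output above. `add_verdict` is a hypothetical helper, not hammie.py's code: it parses the message, adds the verdict header, and re-serialises the whole message so that none of the original headers are lost.

```python
import email
from email import policy


def add_verdict(raw_message, disposition, score):
    """Return `raw_message` with an X-Spambayes-Classification header added.

    Every original header and the body are preserved. Any existing copy
    of the verdict header is removed first, so re-filtering a message
    never leaves two conflicting verdicts.
    """
    msg = email.message_from_string(raw_message, policy=policy.compat32)
    del msg["X-Spambayes-Classification"]  # no error if the header is absent
    msg["X-Spambayes-Classification"] = "%s; %.2f" % (disposition, score)
    return msg.as_string()
```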
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=650496&group_id=61702 From noreply at sourceforge.net Wed Jan 22 21:01:48 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 06:43:54 2003 Subject: [Spambayes] [ spambayes-Patches-639122 ] hammie: ignore emails older than n days Message-ID: Patches item #639122, was opened at 2002-11-15 13:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 Category: None Group: None Status: Open >Resolution: Later Priority: 5 Submitted By: Jason Hildebrand (jdhildeb) Assigned to: Neale Pickett (npickett) Summary: hammie: ignore emails older than n days Initial Comment: Since your documentation stresses the importance of training using only relatively recent emails, I thought a good way to do this would be to have hammie do it for me. So I added a new configuration option: [Hammie] # when training, hammie will ignore messages older than this number of days. # i.e. set to 365 to ignore messages older than one year. # Set to 0 to disable any filtering by date. ignore_old_messages: 0 The patch also modifies Hammie to output the number of messages it read/ignored for each mail file it processes. This option might also prove useful for doing incremental training (i.e. set up cron to train once a week, and set ignore_old_messages to 7). ---------------------------------------------------------------------- >Comment By: Neale Pickett (npickett) Date: 2003-01-22 21:01 Message: Logged In: YES user_id=619391 Jason, does the current mboxtrain.py script do enough of this functionality for you, or would you still like to see us work by the Received header? I suspect it might be good enough...
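The proposed option can be sketched with the standard library: 0 disables the check, and any other value means skipping anything older than that many days. Which date to trust is a design choice. This hypothetical helper prefers the timestamp after the last ';' of the topmost Received header, which your own server adds, and falls back to the Date header. The names `message_date` and `message_age_ok` are illustrative and do not come from the patch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def message_date(msg):
    """Best-guess arrival time of `msg` as an aware datetime, or None."""
    candidates = []
    received = msg.get_all("Received") or []
    if received:
        # The topmost Received header normally ends with "; <RFC 2822 date>".
        candidates.append(received[0].rpartition(";")[2].strip())
    if msg["Date"]:
        candidates.append(msg["Date"])
    for value in candidates:
        try:
            dt = parsedate_to_datetime(value)
        except (TypeError, ValueError, IndexError):
            continue  # unparseable; older Pythons raise TypeError here
        if dt.tzinfo is None:  # "-0000" dates are returned naive
            dt = dt.replace(tzinfo=timezone.utc)
        return dt
    return None


def message_age_ok(msg, ignore_old_messages, now=None):
    """Return True if `msg` should be used for training.

    ignore_old_messages == 0 disables the check, mirroring the proposed
    option. Messages without a parseable date are kept, not dropped.
    """
    if ignore_old_messages == 0:
        return True
    dt = message_date(msg)
    if dt is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - dt <= timedelta(days=ignore_old_messages)
```

Keeping undated messages is deliberate: a malformed header should never silently shrink the training set.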
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 From mhammond at skippinet.com.au Thu Jan 23 23:25:10 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 23 07:26:06 2003 Subject: [Spambayes] Outlook: new folder selector code Message-ID: <000d01c2c2da$7a72fb10$530f8490@eden> I have just checked in some significant changes to the "folder selector" dialog - the cute dialog that presents the list of folders in various parts of the UI. Inspired by patches from Tony Meyer, everything should look the same, but behind the scenes there are 2 major changes that will be of advantage to Exchange Server users: * We are back to an extended MAPI (i.e., fast) version of the code. We believe we have identified and fixed the problem that prevented this code from working with an Exchange Server before. * The folder hierarchy is no longer walked fully before the dialog is created. The folder hierarchy is only walked as a node is expanded. This should make the dialog come up much faster if you have a large "public folders" hierarchy - these folders are not walked until you actually expand them (and even then, only walked one level down at that time). So, in summary, I hope that non-Exchange Server users see a slight performance gain displaying this dialog, and Exchange Server users see a significant one. Please let me know if you have any problems. Mark. From jchilders at smartbusinessware.com Thu Jan 23 09:18:40 2003 From: jchilders at smartbusinessware.com (Jeff Childers) Date: Thu Jan 23 09:39:39 2003 Subject: [Spambayes] Outlook 2000 Install not working - "No module named spambayes" Message-ID: <9C067B997594D3118BF50050DA24EA1097D5E4@CCG2> Hi all, I've installed the SpamBayes files and run addin.py. Seems to register OK (at least the trace comments suggest it is so).
Then, when loading Outlook, SB fails to load and the trace collector shows the following error: >>> Outlook Spam Addin module loading >>> SpamAddin - Connecting to Outlook >>> Created new configuration file 'C:\Python22\Lib\site-packages\SpamBayes\default_configuration.pck' >>> Traceback (most recent call last): >>> File "C:\Python22\\lib\site-packages\win32com\universal.py", line 150, in dispatch >>> retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None) >>> File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ >>> return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) >>> File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_ >>> return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) >>> File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ >>> return apply(func, args) >>> File "D:\Data\Apps\Python\SpamBayes\addin.py", line 594, in OnConnection >>> self.manager = manager.GetManager(application) >>> File "D:\Data\Apps\Python\SpamBayes\manager.py", line 335, in GetManager >>> _mgr = BayesManager(outlook=outlook, verbose=verbose) >>> File "D:\Data\Apps\Python\SpamBayes\manager.py", line 79, in __init__ >>> import_core_spambayes_stuff(self.ini_filename) >>> File "D:\Data\Apps\Python\SpamBayes\manager.py", line 52, in import_core_spambayes_stuff >>> from spambayes import classifier >>> exceptions.ImportError: No module named spambayes I originally installed the SpamBayes folder on [D:\Data\Apps\Python\SpamBayes]. After the load failed the first time, I copied the SB folder to [C:\Python22\Lib\Site-Packages\SpamBayes]. I then re-ran addin.py from the new folder on C, same result when starting Outlook. Curiously, the error output above still refers to the old location on D. I can find no configuration file that contains this information. 
Why is it still looking at D, or more importantly, how can I reset the configuration? Finally, what have I done wrong and how do I correct it to get SB working under Outlook? OS: WinXP Outlook 2000 Win32Comall-150 Thanks for any help. JC From anthony at interlink.com.au Fri Jan 24 01:46:45 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 23 09:48:21 2003 Subject: [Spambayes] Outlook 2000 Install not working - "No module named spambayes" In-Reply-To: <9C067B997594D3118BF50050DA24EA1097D5E4@CCG2> Message-ID: <200301231446.h0NEkj810300@localhost.localdomain> >>> Jeff Childers wrote > Hi all, > > I've installed the SpamBayes files and run addin.py. Seems to register OK > (at least the trace comments suggest it is so). Then, when loading Outlook, > SB fails to load and the trace collector shows the following error: When you say you "installed the SpamBayes files", what do you mean? Did you run "setup.py install"? It looks like (from your traceback) that the spambayes module didn't get installed. I could imagine that a failed run of addin.py would leave things in a... not good... state. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From jchilders_98 at yahoo.com Thu Jan 23 08:44:19 2003 From: jchilders_98 at yahoo.com (J. Childers) Date: Thu Jan 23 11:44:23 2003 Subject: [Spambayes] [Outlook2000 Install Problem] Ok, got it :) Message-ID: <20030123164419.83201.qmail@web13902.mail.yahoo.com> Ahh, I had to do -two- installs: first SpamBayes *then* the addin.py. For some reason I thought the Outlook2000 piece included SB. Now I have good messages in the trace and some new options in OL2K. Thanks Anthony! Regards, JC
From noreply at sourceforge.net Thu Jan 23 10:42:57 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 13:53:43 2003 Subject: [Spambayes] [ spambayes-Patches-639122 ] hammie: ignore emails older than n days Message-ID: Patches item #639122, was opened at 2002-11-15 13:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 Category: None Group: None Status: Open Resolution: Later Priority: 5 Submitted By: Jason Hildebrand (jdhildeb) Assigned to: Neale Pickett (npickett) Summary: hammie: ignore emails older than n days Initial Comment: Since your documentation stresses the importance of training using only relatively recent emails, I thought a good way to do this would be to have hammie do it for me. So I added a new configuration option: [Hammie] # when training, hammie will ignore messages older than this number of days. # i.e. set to 365 to ignore messages older than one year. # Set to 0 to disable any filtering by date. ignore_old_messages: 0 The patch also modifies Hammie to output the number of messages it read/ignored for each mail file it processes. This option might also prove useful for doing incremental training (i.e. set up cron to train once a week, and set ignore_old_messages to 7). ---------------------------------------------------------------------- >Comment By: T. Alexander Popiel (popiel) Date: 2003-01-23 10:42 Message: Logged In: YES user_id=632302 Parsing the topmost received header for the date is a very valuable tool for maintaining limited database size. It's a key feature of my bulkgraph.py script (over and above dealing with my non-standard everything vs. spam folders). Count this as another vote to include such filtering... even though my peculiar folder setup precludes me from using mboxtrain.
---------------------------------------------------------------------- Comment By: Neale Pickett (npickett) Date: 2003-01-22 21:01 Message: Logged In: YES user_id=619391 Jason, does the current mboxtrain.py script do enough of this functionality for you, or would you still like to see us work by the Received header? I suspect it might be good enough... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 From BPettersen at NAREX.com Thu Jan 23 14:13:04 2003 From: BPettersen at NAREX.com (Bjorn Pettersen) Date: Thu Jan 23 16:28:30 2003 Subject: [Spambayes] I did something stupid... Message-ID: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE19B@admin56.narex.com> > From: Mark Hammond [mailto:mhammond@skippinet.com.au] > > > ...after setting up spambayes with Outlook XP (training, > > telling it to watch the Inbox and move spam), I decided > > the icons were too far to the right so I right-clicked > > on the toolbar, dragged them down to the next line, and > > then Outlook froze. In particular the customize dialog > > was unresponsive (although I could still move icons > > around on the > > I will try to repro this, but am busy for the next few days. Thanks! (I hope I didn't imply that this was urgent; I'm fully aware of what I'm doing when I'm using pre-release software and I'm very grateful for any time you want to spend on someone who's not contributing :-) > You may like to try the customize dialog, and hitting "Reset" > on the toolbars. Doing that, and re-running addin.py got me back up and running. When I right-clicked on the toolbar again (I just couldn't help myself ), the dialog box was again frozen, but this time I could close it with Alt+F4. After I shut down and re-started Outlook, it looked to be working.
> Otherwise, for the time being, wrap an exception handler around: > > > File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 470, in _AddControl > > item = parent.Controls.Add(Type=control_type, Temporary=True) > > And just ignore it for now. I will try that if it fails again... Looking a little closer at addin.py, it looks like I was doing something you were trying to prevent (461-464): [...temporary Toolbars...] # Maybe we should consider making them permanent - this would then # allow the user to drag them around the toolbars and have them # stick. The downside is that should the user uninstall this addin # there is no clean way to remove the buttons. Do we even care? I would obviously not care . Also, at 395, the Toolbar is referenced explicitly by name: toolbar = bars.Item("Standard") whereas the second line of toolbar buttons is normally called "Advanced", and the user could obviously have created custom toolbars named anything they choose (not sure if this is relevant...) -- bjorn From noreply at sourceforge.net Thu Jan 23 14:02:32 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 17:04:45 2003 Subject: [Spambayes] [ spambayes-Bugs-673388 ] pop3proxy storage Message-ID: Bugs item #673388, was opened at 2003-01-23 23:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage Initial Comment: I had a look in the pop3proxy folders, and I found these strange files. They are missing headers and maybe part of the message.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702 From noreply at sourceforge.net Thu Jan 23 14:04:35 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 17:04:52 2003 Subject: [Spambayes] [ spambayes-Bugs-673390 ] pop3proxy storage 2nd file Message-ID: Bugs item #673390, was opened at 2003-01-23 23:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage 2nd file Initial Comment: Other file missing header ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 From francois.granger at free.fr Thu Jan 23 23:05:00 2003 From: francois.granger at free.fr (François Granger) Date: Thu Jan 23 17:06:27 2003 Subject: [Spambayes] Strange issue storage with pop3proxy Message-ID: Sorry, I submitted it through SourceForge, but the file upload did not work. I got partial messages in files, missing at least the headers and maybe part of the message. I enclose two messages here.
-- Recently using MacOSX....... -------------- next part -------------- Skipped content of type multipart/appledouble -------------- next part -------------- Skipped content of type multipart/appledouble From T.A.Meyer at massey.ac.nz Fri Jan 24 15:23:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 23 21:24:04 2003 Subject: [Spambayes] Outlook: new folder selector code Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D387@its-xchg4.massey.ac.nz> [Mark] > I have just checked in some significant changes to the > "folder selector" dialog > Please let me know if you have any problems. Sadly, I do :) There was more than one, but mostly they showed up when starting from a fresh install. The defaults provided were entryid only, not (storeid, entryid). There were also some 'None' entries, which caused problems. I'll email Mark with my fixed versions and leave it to him to check them in. Nice and fast though :p Cheers, Tony From T.A.Meyer at massey.ac.nz Fri Jan 24 15:35:37 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 23 21:36:16 2003 Subject: [Spambayes] I did something stupid... Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D389@its-xchg4.massey.ac.nz> > Looking a little closer at addin.py, it looks like I was > doing something you were trying to prevent (461-464): > > [...temporary Toolbars...] > # Maybe we should consider making them permanent - this would then > # allow the user to drag them around the toolbars and have them > # stick. The downside is that should the user uninstall this addin > # there is no clean way to remove the buttons. Do we even care? Mark, could the unregister code not delete any permanent buttons? Along the same lines, why does the addin not show up in the COM add-ins list in the Outlook prefs? (Tools->Options->Other->Advanced->COM Add-ins) - is this because it's a Python script and not an .exe or .dll? Is this just a packaging issue?
> Also, at 395, the name of the Toolbar is named explicitly: > > toolbar = bars.Item("Standard") > > whereas the second line of toolbar buttons is normally called > "Advanced", and the user could obviously have created custom toolbars > named anything they choose (not sure if this is relevant...) I *think*, from looking at addin.py that this would still be needed even if they became permanent - they would default to the "Standard" toolbar (they have to start somewhere!). Cheers, Tony From noreply at sourceforge.net Thu Jan 23 17:59:23 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 21:39:47 2003 Subject: [Spambayes] [ spambayes-Patches-673754 ] Outlook exception when starting not in the inbox Message-ID: Patches item #673754, was opened at 2003-01-24 14:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook exception when starting not in the inbox Initial Comment: If Outlook is started not in the inbox (any mail folder?) - in Outlook Today, for example - an exception is caused when you first switch to a mail folder. The exception doesn't seem to cause any errors, but better safe than sorry :) Here's the trace: pythoncom error: Python error invoking COM method. 
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_(self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 549, in OnFolderSwitch
    self.but_recover_as.Visible = show_recover_as
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 368, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr)
exceptions.AttributeError: '' object has no attribute 'but_recover_as'
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 From noreply at sourceforge.net Thu Jan 23 18:33:59 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 23 21:39:53 2003 Subject: [Spambayes] [ spambayes-Patches-673754 ] Outlook exception when starting not in the inbox Message-ID: Patches item #673754, was opened at 2003-01-24 14:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook exception when starting not in the inbox Initial Comment: If Outlook is started not in the inbox (any mail folder?) - in Outlook Today, for example - an exception is caused when you first switch to a mail folder.
The exception doesn't seem to cause any errors, but better safe than sorry :) Here's the trace:
pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_(self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 549, in OnFolderSwitch
    self.but_recover_as.Visible = show_recover_as
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 368, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr)
exceptions.AttributeError: '' object has no attribute 'but_recover_as'
---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-01-24 15:33 Message: Logged In: YES user_id=552329 Akk... Next time, more testing before submitting a patch :) That didn't work well at all (same problem, though). I think that this will, though. There's a comment in addin.py about an Outlook bug with OnNewExplorer, but the code doesn't seem to do what the comments say. The first OnNewExplorer call is skipped (via the do_activate bool), so it's safe to do the setup in onactivate and not onselection. Anyway, this works and fixes the problem on my system. I'll leave it to others to check theirs.
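The traceback shows `OnFolderSwitch` reading `self.but_recover_as` before the button exists, because the buttons are only created once the explorer is activated. Tony's patch moves the setup into activation; a complementary, defensive option is for the handler to tolerate a missing button. The sketch below is illustrative only: `ExplorerSketch` and the `SimpleNamespace` button are hypothetical stand-ins, not the real classes or COM objects from `addin.py`.

```python
from types import SimpleNamespace


class ExplorerSketch:
    """Hypothetical stand-in for the add-in's explorer event handler."""

    def OnActivate(self):
        # Create the button lazily, once. In the real add-in this would be
        # a COM command-bar button, not a SimpleNamespace.
        if getattr(self, "but_recover_as", None) is None:
            self.but_recover_as = SimpleNamespace(Visible=False)

    def OnFolderSwitch(self, show_recover_as):
        # A folder switch can arrive before OnActivate has run (e.g. when
        # Outlook starts in "Outlook Today"), so look the button up with a
        # default instead of assuming the attribute exists.
        button = getattr(self, "but_recover_as", None)
        if button is None:
            return False  # nothing to update yet
        button.Visible = show_recover_as
        return True
```

Calling `OnFolderSwitch` on a fresh instance returns `False` instead of raising `AttributeError`; after `OnActivate` it updates the button and returns `True`. The guard works because `getattr` with a default swallows exactly the `AttributeError` seen in the trace.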
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 From jpstbelt at jla.vsnl.net.in Fri Jan 24 13:54:20 2003 From: jpstbelt at jla.vsnl.net.in (Jai Pall) Date: Fri Jan 24 03:35:56 2003 Subject: [Spambayes] re.post our mail to your list Message-ID: <000c01c2c381$ff64f880$09d941db@jps> Dear Sir, Pl. post our mail to your list. Thanks Jai Pall From francois.granger at free.fr Fri Jan 24 09:39:21 2003 From: francois.granger at free.fr (François Granger) Date: Fri Jan 24 03:39:29 2003 Subject: [Spambayes] Another example Message-ID: Inside the attached archive, there are two versions of the same mail. All the mail with issues comes from the same mailing list. The list software is Sympa: . All of these mails come through the same POP server: pop.laposte.net. I'll try to get them through another POP server to see if it makes any difference. The current status of pop3proxy is as follows:
========================================
POP3 proxy running on 127.0.0.1:110, 127.0.0.2:110, 127.0.0.3:110, 127.0.0.4:110, proxying to pop.nerim.net:110, pop.free.fr:110, altern.org:110, pop.laposte.net:110.
Active POP3 conversations: 0.
POP3 conversations this session: 457.
Emails classified this session: 31 spam, 1187 ham, 32 unsure.
Total emails trained: Spam: 120 Ham: 77
========================================
The terminal displayed:
========================================
[...]
adding message 1043357721 to corpus
placing 1043357721 in corpus cache
adding 1043361256 to corpus
storing 1043361256
adding message 1043361256 to corpus
placing 1043361256 in corpus cache
adding 1043361257 to corpus
storing 1043361257
adding message 1043361257 to corpus
placing 1043361257 in corpus cache
adding 1043361262 to corpus
storing 1043361262
[...]
========================================
The "Eudora.txt" one is a copy and paste from the Eudora mbox file.
The "1043361257" is the file that pop3proxy stored in its folder before I reviewed it. There is really something wrong here. I guess that the email module has still some issues with the Microsoft XML format ? .... And file storage should be done with the raw data ? -- Recently using MacOSX....... -------------- next part -------------- Skipped content of type multipart/appledouble From mhammond at skippinet.com.au Sat Jan 25 00:59:12 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Jan 24 09:00:05 2003 Subject: [Spambayes] I did something stupid... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D389@its-xchg4.massey.ac.nz> Message-ID: <005901c2c3b0$c70529e0$530f8490@eden> [Tony] > Mark, could the unregister code not delete any permanent > buttons? It could, but it would need to start Outlook to do so. I guess that isn't so bad when I type it here (as opposed to just musing over it!) > Along the same lines, why does the addin not show > up in the COM add-ins list in the Outlook prefs? > (Tools->Options->Other->Advanced->COM Add-ins) - is this > because it's a Python script and not an .exe or .dll? Is > this just a packaging issue? Actually, NFI about this. The docs even say you can (so we have a bug!) > > Also, at 395, the name of the Toolbar is named explicitly: > > > > toolbar = bars.Item("Standard") > > > > whereas the second line of toolbar buttons is normally called > > "Advanced", and the user could obviously have created > > custom toolbars named anything they choose (not sure if this > > is relevant...) It may be, but the code tries to handle this. Note that _AddControl says: item = self.CommandBars.FindControl(Type=control_type, Tag=item_attrs['Tag']) So we actually search all command bars for the specified tag. The tag is assumed unique. If this command returns None, then the toolbar passed (i.e., "Standard") is where the items are to be added. At least this is the intent. I've added comments to this effect.
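Mark's description of `_AddControl` boils down to a lookup order: search every command bar for a control with the given tag, and only create the control on the named toolbar if nothing is found. A minimal sketch of that order, assuming a plain dictionary (toolbar name to list of controls) in place of Outlook's `CommandBars` collection; the function name, data layout and tag strings are hypothetical, and the real add-in goes through COM's `FindControl`.

```python
def add_control(command_bars, default_bar, tag, caption):
    """Return the control carrying `tag`, creating it on `default_bar`
    only if no toolbar already has one.

    command_bars: dict mapping toolbar name -> list of control dicts
    (a hypothetical stand-in for the CommandBars collection).
    """
    # Search every toolbar, not just the default one, so a button the
    # user has dragged elsewhere is found and reused. Tags are assumed
    # to be unique across all toolbars.
    for controls in command_bars.values():
        for control in controls:
            if control["Tag"] == tag:
                return control
    # Not found anywhere: fall back to the default toolbar ("Standard").
    control = {"Tag": tag, "Caption": caption}
    command_bars.setdefault(default_bar, []).append(control)
    return control
```

With this order, a button that has been moved to, say, the "Advanced" toolbar is reused in place rather than duplicated on "Standard", which is why naming "Standard" explicitly is harmless.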
> I *think*, from looking at addin.py that this would still be > needed even if they became permanent - they would default to > the "Standard" toolbar (they have to start somewhere!). Yeah, we do need to do something. I was kinda hoping that the worst that would happen is a couple of dead buttons should the user be so brain-dead they choose to uninstall our product. Add 2 bugs and assign them to me - one for the doc/code for the plugin, and the other that we leave dead buttons on uninstall. Mark. From noreply at sourceforge.net Fri Jan 24 01:20:51 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 24 09:51:44 2003 Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code Message-ID: Bugs item #673892, was opened at 2003-01-24 10:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: Missing compat with 22 code Initial Comment: MacOS X 10.2.3 built in python 2.2 In pop3proxy Web interface, I click on the Config link.
Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "/Volumes/OS99/spambayes/OptionConfig.py", line 219, in onConfig
    isFirstRow = True
NameError: global name 'True' is not defined
Adding line 44-48 of pop3proxy.py at line 33 of OptionConfig.py solve the problem.
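The report does not quote lines 44-48 of `pop3proxy.py`, so the block below is only an assumption about what such a compatibility shim looks like. Python 2.2.0, which the report says MacOS X 10.2.3 ships with, has no `True`/`False` builtins (they arrived in 2.2.1, and `bool` became a real type in 2.3), so code that must run there defines the names when they are missing. This sketch writes through `globals()` rather than the more common `True, False = 1, 0`, so that the file still compiles on modern Python, where assigning to `True` is a syntax error.

```python
# Compatibility shim: define True/False on Pythons that lack them.
try:
    True
except NameError:
    # Only reached on very old interpreters (Python 2.2.0 and earlier).
    # Writing through globals() keeps this file compilable on Python 3,
    # where the statement "True = 1" is rejected at compile time.
    globals()["True"] = 1
    globals()["False"] = 0

# The statement from OptionConfig.py that raised NameError now works.
isFirstRow = True
```

Plain integers are an acceptable fallback here because the code only uses the names in boolean tests, where `1` and `0` behave the same way.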
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 From noreply at sourceforge.net Fri Jan 24 09:13:07 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 24 12:16:23 2003 Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code Message-ID: Bugs item #673892, was opened at 2003-01-24 09:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) >Assigned to: Richie Hindle (richiehindle) Summary: Missing compat with 22 code Initial Comment: MacOS X 10.2.3 built in python 2.2 In pop3proxy Web interface, I click on the Config link.
Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "/Volumes/OS99/spambayes/OptionConfig.py", line 219, in onConfig
    isFirstRow = True
NameError: global name 'True' is not defined
Adding line 44-48 of pop3proxy.py at line 33 of OptionConfig.py solve the problem. ---------------------------------------------------------------------- >Comment By: Richie Hindle (richiehindle) Date: 2003-01-24 17:13 Message: Logged In: YES user_id=85414 I'll sort this out - thanks, François.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 From neale at woozle.org Fri Jan 24 10:37:34 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 24 13:38:32 2003 Subject: [Spambayes] re.post our mail to your list In-Reply-To: <000c01c2c381$ff64f880$09d941db@jps> ("Jai Pall"'s message of "Fri, 24 Jan 2003 13:54:20 +0530") References: <000c01c2c381$ff64f880$09d941db@jps> Message-ID: "Jai Pall" writes: > Dear Sir, > > Pl. post our mail to your list. Looks like it got posted :) Neale From db3l at fitlinxx.com Fri Jan 24 14:51:52 2003 From: db3l at fitlinxx.com (David Bolen) Date: Fri Jan 24 14:52:14 2003 Subject: [Spambayes] Re: Outlook: new folder selector code References: <1ED4ECF91CDED24C8D012BCF2B034F1318D387@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: > Sadly, I do :) There was more than one, but mostly it was when using > from a fresh install. The defaults provided were entryid only, not > (storeid, entryid). There were also some 'None' entries, which > caused problems. > > I'll email Mark with my fixed versions and leave it to him to check them in. I'm also having problems with an exchange server after updating to the latest from CVS - using my existing configuration it failed when trying to establish the filtering hooks. But even from a fresh install and empty database, any attempt to work with folders generates an exception - for me it's in NormalizeID in msgstore.py: "AssertionError: We expect fully qualified IDs" Given Mark's comment about being tied up currently, if you were amenable to sending me a copy of your patches, I could also give them a shot on my system. -- David From jh at web.de Fri Jan 24 21:39:18 2003 From: jh at web.de (Juergen Hermann) Date: Fri Jan 24 15:38:58 2003 Subject: [Spambayes] Training unsure msgs only Message-ID: Hi! 
When you have trained a certain number of msgs, it's enough to only train the unsure msgs. Thus, what do you think of a "review unsure only" link in the pop3proxy (defaulting the two other categories to Discard)? Another possibility would be to have 3 more buttons (train spam / ham / unsure only). Ciao, Jürgen From richie at entrian.com Fri Jan 24 20:56:55 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 24 15:57:36 2003 Subject: [Spambayes] Training unsure msgs only In-Reply-To: References: Message-ID: Hi Juergen, > When you have trained a certain number of msgs, it's enough to only > train the unsure msgs. Thus, what do you think of a "review unsure > only" link in the pop3proxy (defaulting the two other categories to > Discard)? > > Another possibility would be to have 3 more buttons (train spam / ham / > unsure only). You can click the 'Discard' headers above the ham and spam lists to set all those messages to Discard. It's two extra clicks, but IMHO that's better than an extra piece of user interface - I don't want to make it too busy. -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Jan 24 21:14:33 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 24 16:15:06 2003 Subject: [Spambayes] Training unsure msgs only In-Reply-To: References: Message-ID: > Maybe they're described in the manual I did not care to read. ;) They're in the one I did not care to write (yet?) -- Richie Hindle richie@entrian.com From richie at entrian.com Fri Jan 24 21:21:20 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 24 16:21:56 2003 Subject: [Spambayes] Re: Another example In-Reply-To: References: Message-ID: Hi François, > Inside the attached archive, there are two versions of the same mail. Bizarre. It's as though it's coming in halfway through a message, and deciding that the message body up to the first CRNLCRNL is the headers.
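Richie's diagnosis rests on how a mail parser finds the end of the headers: the header block ends at the first empty line, which on the wire (POP3 uses CRLF line endings) is the byte sequence `\r\n\r\n`. The following is a minimal sketch, not the proxy's own parsing code, showing why a message that is read starting partway through has part of its body taken for headers. The function name and sample message are made up for illustration.

```python
def split_message(raw):
    """Split a raw message (bytes with CRLF line endings, as sent over
    POP3) into (header_block, body) at the first empty line."""
    separator = b"\r\n\r\n"
    index = raw.find(separator)
    if index == -1:
        # No empty line at all: everything is treated as headers.
        return raw, b""
    return raw[:index], raw[index + len(separator):]


whole = b"Subject: hi\r\nFrom: a@example.com\r\n\r\nline one\r\n\r\nline two\r\n"
headers, body = split_message(whole)  # real headers, full body

# If reading starts mid-message, the first paragraph of the body is
# mistaken for the header block, which matches the symptom reported.
truncated = whole[whole.index(b"line one"):]
bogus_headers, rest = split_message(truncated)
```

Here `bogus_headers` is `b"line one"` and `rest` is `b"line two\r\n"`: the stored file would appear to be missing its real headers and part of the message.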
This: > adding message 1043357721 to corpus implies that you're running with "verbose: True", which must mean you have a _pop3proxy.log - next time you receive such a broken email, could you email me your _pop3proxy.log? (It has your password in it, so you might want to edit that out, but if you do, could you try to use a binary editor that won't change any line-ending characters?) Could you zip it up before sending, again to prevent anything messing with the line endings? > I guess that the email module > has still some issues with the Microsoft XML format ? .... > And file storage should be done with the raw data ? The proxy doesn't use the email module to add its headers, so that's not the problem. And the storage *is* done with the raw data. > According to the text editor I used, they are > all CR+LF files, whatever the mail server or the source mailer app. > Look strange that pop3proxy store them that way on MacOS X ? It stores the messages exactly as they come over the wire from the POP3 server, and POP3 uses CRLF as the line ending regardless of the platform. They're stored on the disk in binary mode, because you never know whether there are unencoded binary characters in there. -- Richie Hindle richie@entrian.com From noreply at sourceforge.net Fri Jan 24 12:10:28 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 24 16:23:51 2003 Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code Message-ID: Bugs item #673892, was opened at 2003-01-24 09:20 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Richie Hindle (richiehindle) Summary: Missing compat with 22 code Initial Comment: MacOS X 10.2.3 built in python 2.2 In pop3proxy Web interface, I click on the Config link.
Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "/Volumes/OS99/spambayes/OptionConfig.py", line 219, in onConfig
    isFirstRow = True
NameError: global name 'True' is not defined
Adding line 44-48 of pop3proxy.py at line 33 of OptionConfig.py solve the problem. ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-24 17:13 Message: Logged In: YES user_id=85414 I'll sort this out - thanks, François. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702 From noreply at sourceforge.net Fri Jan 24 13:02:48 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 24 16:23:57 2003 Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py Message-ID: Bugs item #672495, was opened at 2003-01-22 15:33 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jürgen Hermann (jhermann) >Assigned to: Richie Hindle (richiehindle) Summary: Files not installed by setup.py Initial Comment: Patch:
===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py 17 Jan 2003 06:45:36 -0000 1.13
+++ setup.py 22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
          'pop3proxy.py',
          'proxytrainer.py',
          'proxytee.py',
+         'OptionConfig.py',
          ],
-    packages = [ 'spambayes', ],
+    packages = [
+        'spambayes',
+        'spambayes.resources',
+        ],
     classifiers = [
         'Development Status :: 4 - Beta',
         'Environment :: Console',
---------------------------------------------------------------------- >Comment By: Richie Hindle
(richiehindle) Date: 2003-01-24 21:02 Message: Logged In: YES user_id=85414 spambayes.resources is now installed, and OptionConfig.py now lives in the spambayes package. Thanks, Jürgen. ---------------------------------------------------------------------- Comment By: Jürgen Hermann (jhermann) Date: 2003-01-22 20:22 Message: Logged In: YES user_id=39128 The current problem is the import in line 153 of pop3proxy: from OptionConfig import OptionsConfigurator Moving OptionConfig into the package is surely the best fix, including adapting the above import. ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2003-01-22 18:43 Message: Logged In: YES user_id=85414 You're dead right about spambayes.resources, but I'm not convinced we should be installing OptionConfig.py now that it's been folded into the main pop3proxy web interface. I asked on the list whether anyone thought we should leave it in with the other scripts and got no replies. I'm tempted to move it into the spambayes package, from where pop3proxy.py can import it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702 From jh at web.de Fri Jan 24 22:05:08 2003 From: jh at web.de (Juergen Hermann) Date: Fri Jan 24 16:27:32 2003 Subject: [Spambayes] Training unsure msgs only In-Reply-To: Message-ID: On Fri, 24 Jan 2003 20:56:55 +0000, Richie Hindle wrote: >You can click the 'Discard' headers above the ham and spam lists to set all >those messages to Discard. It's two extra clicks, but IMHO that's better >than an extra piece of user interface - I don't want to make it too busy. Ugh, I did not note them until now. Maybe they're described in the manual I did not care to read. 
;) Ciao, Jürgen From noreply at sourceforge.net Fri Jan 24 14:29:25 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Jan 24 17:50:08 2003 Subject: [Spambayes] [ spambayes-Patches-639122 ] hammie: ignore emails older than n days Message-ID: Patches item #639122, was opened at 2002-11-15 15:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 Category: None Group: None Status: Open Resolution: Later Priority: 5 Submitted By: Jason Hildebrand (jdhildeb) Assigned to: Neale Pickett (npickett) Summary: hammie: ignore emails older than n days Initial Comment: Since your documentation stresses the importance of training using only relatively recent emails, I thought a good way to do this would be to have hammie do it for me. So I added a new configuration option:
[Hammie]
# when training, hammie will ignore messages older than this number of days.
# i.e. set to 365 to ignore messages older than one year.
# Set to 0 to disable any filtering by date.
ignore_old_messages: 0
The patch also modifies Hammie to output the number of messages it read/ignored for each mail file it processes. This option might also prove useful for doing incremental training (i.e. set up cron to train once a week, and set ignore_old_messages to 7). ---------------------------------------------------------------------- >Comment By: Jason Hildebrand (jdhildeb) Date: 2003-01-24 16:29 Message: Logged In: YES user_id=173690 Unfortunately, I haven't had time to update to a more recent spambayes; I'm still using a version from last November. Since this version is working well for me, I'm not terribly interested in messing with it -- since I know things have changed considerably in CVS since then. So I'm in a poor position to judge whether the functionality mboxtrain.py offers is "good enough" -- I'll have to leave it up to others to comment on. ---------------------------------------------------------------------- Comment By: T.
Alexander Popiel (popiel) Date: 2003-01-23 12:42 Message: Logged In: YES user_id=632302 Parsing the topmost received header for the date is a very valuable tool for maintaining limited database size. It's a key feature of my bulkgraph.py script (over and above dealing with my non-standard everything vs. spam folders). Count this as another vote to include such filtering... even though my peculiar folder setup precludes me from using mboxtrain. ---------------------------------------------------------------------- Comment By: Neale Pickett (npickett) Date: 2003-01-22 23:01 Message: Logged In: YES user_id=619391 Jason, does the current mboxtrain.py script do enough of this functionality for you, or would you still like to see us work by the Received header? I suspect it might be good enough... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 From richie at entrian.com Sat Jan 25 00:02:30 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 24 19:03:04 2003 Subject: [Spambayes] Alpha 2 Release? Message-ID: I'd like to suggest we make another alpha release next week. Lots has happened since alpha 1 (some of which is important to the Linux Journal articles, which come out first thing in February): o Neale's work on plugging into Mutt and other mail clients. o Integration of Tim's web configuration page into pop3proxy.py, so you no longer need to know about bayescustomize.ini to use the POP3 proxy. o The ability to run multiple POP3 proxies on the same port. o The ability to limit connections to the web interface to localhost. o Various sundry improvements and bugfixes (including MacOS X support). Neale, do you think your Mutt edits will be ready by the middle of next week? I haven't tried them but it sounds like they're pretty much there?
Here's what I think needs doing before another release: o I need to get to the bottom of François' bizarre pop3proxy problems. o We need to document Neale's Mutt work (or just provide an example muttrc). o People need to test the up-to-date version! Skip, do you think the two of us can merge proxytrainer.py into pop3proxy.py before we make this release - hopefully the problems you were having with pop3proxy.py are cleared up now? It would be nice not to ship the duplicated code. I'm happy to do most of this if you're short of time - I've already incorporated a couple of your changes. Any other suggestions for what might need doing? Anthony, is there anything release-wise that needs to be done, or is it just a matter of running "setup.py sdist --formats zip,gztar" on a fresh checkout? release-early-and-release-often-ly yrs, -- Richie Hindle richie@entrian.com From jh at web.de Sat Jan 25 02:00:06 2003 From: jh at web.de (Juergen Hermann) Date: Fri Jan 24 19:59:48 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: Message-ID: On Sat, 25 Jan 2003 00:02:30 +0000, Richie Hindle wrote: >Any other suggestions for what might need doing? On the point of documentation, do we have any (beyond the README)? ;) Do we need more? I could certainly help with publishing technology (esp. wikis and DocBook), less with content itself (too much time needed). Ciao, Jürgen From skip at pobox.com Fri Jan 24 19:27:53 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 24 20:27:58 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: <15921.59417.459454.66175@montanaro.dyndns.org> Richie> Skip, do you think the two of us can merge proxytrainer.py into Richie> pop3proxy.py before we make this release - hopefully the Richie> problems you were having with pop3proxy.py are cleared up now? Yes, I believe we should be able to merge them. I'll give pop3proxy another whirl tonight or tomorrow and let you know what I encounter.
Skip From francois.granger at free.fr Sat Jan 25 12:03:40 2003 From: francois.granger at free.fr (François Granger) Date: Sat Jan 25 06:03:51 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: At 00:02 +0000 25/01/2003, in message [Spambayes] Alpha 2 Release?, Richie Hindle wrote: > o The ability to run multiple POP3 proxies on the same port. This works great. > o Various sundry improvements and bugfixes (including MacOS X support). Anything needed on this side for inclusion in documentation ? > o I need to get to the bottom of François' bizarre pop3proxy problems. Just sent my "_pop3proxy.log" in separate mail to you. But my current conclusion is that I am talking to a bad French POP server combined with Outlook "special features" ;-) But, there are at least two versions of Outlook involved. I would be happy to give you an account on this server for testing if you want. This is easily doable since I have a login there that I don't use much. -- Recently using MacOSX....... From francois.granger at free.fr Sat Jan 25 12:06:44 2003 From: francois.granger at free.fr (François Granger) Date: Sat Jan 25 06:06:49 2003 Subject: [Spambayes] Training unsure msgs only In-Reply-To: References: Message-ID: At 21:14 +0000 24/01/2003, in message Re: [Spambayes] Training unsure msgs only, Richie Hindle wrote: > > Maybe they're described in the manual I did not care to read. ;) > >They're in the one I did not care to write (yet?) But clearly written at the top of the screen: > "Click one of the Discard / Defer / Ham / Spam headers > to check all of the buttons in that section in one go." ;-) -- Recently using MacOSX.......
From richie at entrian.com Sat Jan 25 12:12:25 2003 From: richie at entrian.com (Richie Hindle) Date: Sat Jan 25 07:13:02 2003 Subject: [Spambayes] Training unsure msgs only In-Reply-To: References: Message-ID: > But clearly written at the top of the screen: > > > "Click one of the Discard / Defer / Ham / Spam headers > > to check all of the buttons in that section in one go." (Note to self: return Guido's time machine keys). -- Richie Hindle richie@entrian.com From richie at entrian.com Sat Jan 25 12:51:21 2003 From: richie at entrian.com (Richie Hindle) Date: Sat Jan 25 07:51:57 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: Hi Juergen, > On the point of documentation, do we have any (beyond the README)? ;) Only what's on the website, which is thin on practical details. I'm hoping to write an installation and setup guide for the POP3 proxy and web interface and add that to the website - I've already done this for my Linux Journal article, so it will just be an updated version of that. Unless anyone else is already doing this? -- Richie Hindle richie@entrian.com From richie at entrian.com Sat Jan 25 12:57:50 2003 From: richie at entrian.com (Richie Hindle) Date: Sat Jan 25 07:58:27 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: Hi François, > > o Various sundry improvements and bugfixes (including MacOS X support). > > Anything needed on this side for inclusion in documentation ? I don't think so, unless there's anything MacOS-X-specific that you've run across that you think people need to know? > Just sent my "_pop3proxy.log" in separate mail to you. Aha! Thanks for that - I think I've solved it. One of the extra POP3 features introduced by RFC 2449 (http://www.faqs.org/rfcs/rfc2449.html) is pipelining, whereby a client can send lots of requests at once without waiting for the responses. The POP3 proxy can't cope with that, but your POP3 server at pop.laposte.net is using it.
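Under RFC 2449 a server advertises optional features in its multi-line reply to the `CAPA` command, one capability per line, ending with a line containing only `.`. A proxy can therefore hide pipelining from the client by dropping the `PIPELINING` line before relaying that reply. The function below is a hypothetical sketch of such filtering, not the code Richie checked in; it assumes the complete CAPA reply has already been buffered as bytes.

```python
def strip_pipelining(capa_response):
    """Remove the PIPELINING capability from a buffered POP3 CAPA reply.

    capa_response: bytes holding the "+OK" status line, one capability
    per line, and the terminating "." line, all CRLF-terminated.
    """
    lines = capa_response.split(b"\r\n")
    kept = [
        line for line in lines
        # Capability names are case-insensitive and may be followed by
        # arguments, so compare only the first word, upper-cased.
        if line.split(b" ", 1)[0].upper() != b"PIPELINING"
    ]
    return b"\r\n".join(kept)


reply = b"+OK Capability list follows\r\nTOP\r\nUSER\r\nPIPELINING\r\nUIDL\r\n.\r\n"
filtered = strip_pipelining(reply)
```

`filtered` keeps every other capability and the terminating `.` line intact. A client that honours the advertised capabilities will then wait for each response before sending the next command; one configured to pipeline regardless will still need its own setting changed.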
I've 'fixed' the proxy so that when the client asks for the capabilities of the server, the proxy filters out the 'pipelining' capability - that should prevent the client from trying to use pipelining. You shouldn't see any significant difference in speed, except maybe when doing lots of quick operations together (e.g. deleting hundreds of emails in one go) over a high-latency connection. If you still have problems, it could be that your client is explicitly set up to use pipelining regardless of what the server says - in that case, look for a configuration option called something like "Use overlapped POP3 commands" and disable it. Hope that works... -- Richie Hindle richie@entrian.com From skip at pobox.com Sat Jan 25 10:18:11 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 25 11:18:17 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? Message-ID: <15922.47299.145550.835866@montanaro.dyndns.org> Other than our convenience, I don't see any reason the different tools should use different mechanisms to specify database or pickle files on the command line. Hammiefilter.py uses:
    -d DBFILE      use database in DBFILE
    -D PICKLEFILE  use pickle (instead of database) in PICKLEFILE
while pop3proxy.py uses:
    -p FILE : use the named database file
    -d      : the database is a DBM file rather than a pickle
I don't know if there are other ways the same information is spelled, but I think it would be nice if a pass were made over the existing command line arguments so that all command line tools use the same flags for the same purpose. (Proxytrainer.py is dead! Long live pop3proxy.py! Thanks Richie!) Skip From skip at pobox.com Sat Jan 25 10:25:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Sat Jan 25 11:25:18 2003 Subject: [Spambayes] Oh, one other thing... Message-ID: <15922.47722.760937.263327@montanaro.dyndns.org> I almost forgot...
I have this little blurb in my procmailrc file:

    :0 fw:hamlock
    | proxytee.py --prob=0.2

    :0 fw:hamlock
    | hammiefilter.py -d $HOME/hammie.db

This should probably be collapsed into just

    :0 fw:
    | hammiefilter.py -u

with hammiefilter.py both passing the message along to pop3proxy.py for training, and getting the score from pop3proxy.py (the -u meant to imply "don't score it yourself, 'u'pload the message to pop3proxy and use the score it returns"). Make sense? Skip From mhammond at skippinet.com.au Sat Jan 25 17:51:52 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Jan 26 01:28:16 2003 Subject: [Spambayes] Re: Outlook: new folder selector code In-Reply-To: Message-ID: <000c01c2c43e$3ea03b60$530f8490@eden> > I'm also having problems with an exchange server after updating to the > latest from CVS - using my existing configuration it failed when > trying to establish the filtering hooks. But even from a fresh > install and empty database, any attempt to work with folders generates > an exception - for me it's in NormalizeID in msgstore.py: > "AssertionError: We expect fully qualified IDs" I've checked in a fix for this. No idea if it will fix your exchange server error, but all the IDs should now be fully qualified after a fresh install. Mark. From anthony at interlink.com.au Tue Jan 28 01:24:47 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Jan 27 09:31:45 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: Message-ID: <200301271424.h0REOmm23697@localhost.localdomain> >>> Richie Hindle wrote > > I'd like to suggest we make another alpha release next week. Lots has > happened since alpha 1 (some of which is important to the Linux Journal > articles, which come out first thing in February): Sounds good to me. > o People need to test the up-to-date version! We need to provide an "upgrading guide", of sorts. This can just be a release note.
We need to find a way to make the install script remove the older 'scripts' that have already been installed (and which may be busted!) but which are no longer in the distro. > Anthony, is there anything release-wise that needs to be done, or is it > just a matter of running "setup.py sdist --formats zip,gztar" on a fresh > checkout? I tend to do the following: make a tarball and a zipfile unpack them on a totally different machine, install it, diff against what was already there beforehand, a bunch of other, similar, sanity checking. then it's SF release dance time... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim at fourstonesExpressions.com Mon Jan 27 08:34:15 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 09:34:54 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <200301271424.h0REOmm23697@localhost.localdomain> Message-ID: We need to begin examining release migration issues, particularly when the database won't migrate between releases. We should at least give instructions on how to retrain, but better than that would be automagic upgrade of the file. I'll take a look at this... - TimS 1/27/2003 8:24:47 AM, Anthony Baxter wrote: > >>>> Richie Hindle wrote >> >> I'd like to suggest we make another alpha release next week. Lots has >> happened since alpha 1 (some of which is important to the Linux Journal >> articles, which come out first thing in February): > >Sounds good to me. > >> o People need to test the up-to-date version! > >We need to provide an "upgrading guide", of sorts. This can just >be a release note. > >We need to find a way to make the install script remove the older >'scripts' that have already been installed (and which may be busted!) >but which are no longer in the distro. > >> Anthony, is there anything release-wise that needs to be done, or is it >> just a matter of running "setup.py sdist --formats zip,gztar" on a fresh >> checkout? 
> >I tend to do the following: > >make a tarball and a zipfile > >unpack them on a totally different machine, install it, diff against >what was already there beforehand, a bunch of other, similar, sanity >checking. > >then it's SF release dance time... > >Anthony >-- >Anthony Baxter >It's never too late to have a happy childhood. > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Mon Jan 27 06:18:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 09:43:24 2003 Subject: [Spambayes] [ spambayes-Patches-673754 ] Outlook exception when starting not in the inbox Message-ID: Patches item #673754, was opened at 2003-01-24 12:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 Category: Outlook Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Outlook exception when starting not in the inbox Initial Comment: If Outlook is started not in the inbox (any mail folder?) - in Outlook Today, for example - an exception is caused when you first switch to a mail folder. The exception doesn't seem to cause any errors, but better safe than sorry :) Here's the trace: pythoncom error: Python error invoking COM method. 
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_(self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 549, in OnFolderSwitch
    self.but_recover_as.Visible = show_recover_as
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 368, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr)
exceptions.AttributeError: '' object has no attribute 'but_recover_as'
---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-01-28 01:18 Message: Logged In: YES user_id=14198 Thanks! Fixed: Checking in addin.py; /cvsroot/spambayes/spambayes/Outlook2000/addin.py,v <-- addin.py new revision: 1.46; previous revision: 1.45 ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-01-24 13:33 Message: Logged In: YES user_id=552329 Akk...Next time more testing before submitting a patch :) That didn't work well at all (same problem, though). I think that this will, though. There's a comment in addin.py about an Outlook bug with OnNewExplorer, but the code doesn't seem to do what the comments say. The first OnNewExplorer call is skipped (via the do_activate bool), so it's safe to have setup in onactivate and not onselection. Anyway, this works and fixes the problem on my system. I'll leave it to others to check theirs.
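The trace above is an ordering problem: OnFolderSwitch touches self.but_recover_as before the code that creates the toolbar buttons has run. Below is a minimal, COM-free sketch of the defensive pattern Tony describes (do the setup on activation, and tolerate events that arrive before it). The Button and ExplorerEventsSketch classes and the create_buttons helper are hypothetical stand-ins; this is not the code checked in as addin.py revision 1.46.

```python
class Button:
    """Stand-in for an Outlook toolbar button (hypothetical)."""

    def __init__(self):
        self.Visible = False


class ExplorerEventsSketch:
    """Hypothetical event sink whose buttons are created on activation."""

    def __init__(self):
        # Define the attribute up front so a lookup never raises
        # AttributeError, even before setup has happened.
        self.but_recover_as = None

    def create_buttons(self):
        self.but_recover_as = Button()

    def OnActivate(self):
        # One-time setup happens here rather than in a selection handler.
        if self.but_recover_as is None:
            self.create_buttons()

    def OnFolderSwitch(self, show_recover_as):
        # A folder switch can arrive before activation, e.g. when Outlook
        # starts in "Outlook Today"; ignore it until the buttons exist.
        if self.but_recover_as is None:
            return
        self.but_recover_as.Visible = show_recover_as
```

With this arrangement an early OnFolderSwitch is a no-op instead of an exception, and later switches update the button normally.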
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702 From noreply at sourceforge.net Mon Jan 27 06:18:49 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 09:43:42 2003 Subject: [Spambayes] [ spambayes-Patches-639312 ] fix for outlook CompareEntryIDs bug Message-ID: Patches item #639312, was opened at 2002-11-16 23:35 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639312&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Out of Date Priority: 5 Submitted By: Piers Haken (piersh) Assigned to: Mark Hammond (mhammond) Summary: fix for outlook CompareEntryIDs bug Initial Comment: This patch reenables the CompareEntryIDs for comparing folder IDs. It passes both the MAPI Session and the Outlook Session into the dialog, one for retrieving the exchange-compatible IDs and the other for comparing them. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-01-28 01:18 Message: Logged In: YES user_id=14198 The code has moved on - we are back to a MAPI and CompareEntryIds implementation.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639312&group_id=61702 From noreply at sourceforge.net Mon Jan 27 06:20:04 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 09:44:00 2003 Subject: [Spambayes] [ spambayes-Patches-648271 ] Code to remove the New Mail icon Message-ID: Patches item #648271, was opened at 2002-12-04 19:59 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=648271&group_id=61702 Category: Outlook Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Peter Arnold (lardladpa) Assigned to: Nobody/Anonymous (nobody) Summary: Code to remove the New Mail icon Initial Comment: It would be great if, having processed the newly arrived e-mail and discovered that it was all spam, the addin could remove the New Message icon from the system tray. I know there's no programmatic interface to do this but I found some VB code at http://www.slipstick.com/dev/code/clearenvicon.htm I've converted the 3 pages of VB to this small bit of python:

    import win32gui

    # Locate the outlook window owning the tray icon
    hWnd = win32gui.FindWindow("rctrl_renwnd32", "")
    if hWnd != 0:
        # Send a NIM_DELETE to remove the icon
        nid = (hWnd, 0)
        win32gui.Shell_NotifyIcon(2, nid)
        # Send a WUM_RESETNOTIFICATION to the owning window
        win32gui.SendMessage(hWnd, 1031, 0, 0)

It would be super if this patch could be integrated into the outlook plugin although I'm not quite sure where in the code it would go. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-01-28 01:20 Message: Logged In: YES user_id=14198 Closing this. If a better proposal for the icon is put forward, we can review it again.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=648271&group_id=61702 From db3l at fitlinxx.com Mon Jan 27 11:34:51 2003 From: db3l at fitlinxx.com (David Bolen) Date: Mon Jan 27 11:35:03 2003 Subject: [Spambayes] Re: Outlook: new folder selector code References: <000c01c2c43e$3ea03b60$530f8490@eden> Message-ID: "Mark Hammond" writes: > I've checked in a fix for this. No idea if it will fix your exchange server > error, but all the IDs should now be fully qualified after a fresh install. Yes, the latest CVS seems to be working fine now against my exchange server. -- David From richie at entrian.com Mon Jan 27 17:50:24 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 27 12:51:15 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? In-Reply-To: <15922.47299.145550.835866@montanaro.dyndns.org> References: <15922.47299.145550.835866@montanaro.dyndns.org> Message-ID: [Skip] > Other than our convenience, I don't see any reason the different tools > should use different mechanisms to specify database or pickle files on the > command line. Hammiefilter.py uses: > > -d DBFILE > use database in DBFILE > -D PICKLEFILE > use pickle (instead of database) in PICKLEFILE > > while pop3proxy.py uses: > > -p FILE : use the named database file > -d : the database is a DBM file rather than a pickle > > I don't know if there are other ways the same information is spelled, but I > think it would be nice if a pass was made over the existing command line > arguments so that all command line tools use the same flags for the same > purpose. I'm not attached to either version, so by all means change one of them. I'd guess that more people are using command-line switches with hammie than with pop3proxy, so it should probably be pop3proxy that changes (but I don't have the time myself). 
I don't know of any other tools that have similar switches, but I haven't looked. -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Mon Jan 27 11:54:54 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 12:55:29 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? In-Reply-To: Message-ID: 1/27/2003 11:50:24 AM, Richie Hindle wrote: > >[Skip] >> Other than our convenience, I don't see any reason the different tools >> should use different mechanisms to specify database or pickle files on the >> command line. Hammiefilter.py uses: >> >> -d DBFILE >> use database in DBFILE >> -D PICKLEFILE >> use pickle (instead of database) in PICKLEFILE >> >> while pop3proxy.py uses: >> >> -p FILE : use the named database file >> -d : the database is a DBM file rather than a pickle >> >> I don't know if there are other ways the same information is spelled, but I >> think it would be nice if a pass was made over the existing command line >> arguments so that all command line tools use the same flags for the same >> purpose. > >I'm not attached to either version, so by all means change one of them. >I'd guess that more people are using command-line switches with hammie than >with pop3proxy, so it should probably be pop3proxy that changes (but I >don't have the time myself). I don't know of any other tools that have >similar switches, but I haven't looked. I'll change pop3proxy so that -d/-D work. I don't think I can keep -p/-d alive in the process... 
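A shared parser is one way to make every tool spell these options the same. Below is a minimal sketch using the standard getopt module and the hammiefilter.py convention quoted above (-d DBFILE for a DBM database, -D PICKLEFILE for a pickle). The function name parse_db_options, the "hammie.db" default and the (filename, use_dbm, args) return value are hypothetical, not existing SpamBayes code.

```python
import getopt


def parse_db_options(argv, default="hammie.db"):
    """Return (filename, use_dbm, remaining_args) for a command line.

    -d DBFILE      use a DBM database stored in DBFILE
    -D PICKLEFILE  use a pickle (instead of a database) in PICKLEFILE

    If both are given, the last one on the command line wins.
    (Hypothetical helper; the default filename is an assumption.)
    """
    opts, args = getopt.getopt(argv, "d:D:")
    filename, use_dbm = default, True
    for opt, value in opts:
        if opt == "-d":
            filename, use_dbm = value, True
        elif opt == "-D":
            filename, use_dbm = value, False
    return filename, use_dbm, args
```

A tool would call it as parse_db_options(sys.argv[1:]) and then open either a DBM-backed or a pickle-backed store depending on use_dbm.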
-TimS > >-- >Richie Hindle >richie@entrian.com > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From richie at entrian.com Mon Jan 27 17:59:20 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 27 13:00:10 2003 Subject: [Spambayes] Oh, one other thing... In-Reply-To: <15922.47722.760937.263327@montanaro.dyndns.org> References: <15922.47722.760937.263327@montanaro.dyndns.org> Message-ID: <9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com> [Skip] > This should probably be collapsed into just > > :0 fw: > | hammiefilter.py -u > > with hammiefilter.py both passing the message along to pop3proxy.py for > training, and getting the score from pop3proxy.py (the -u meant to imply > "don't score it yourself, 'u'pload the message to pop3proxy and use the > score it returns"). > > Make sense? Neale has already implemented a similar idea over XMLRPC, in hammiesrv and hammiecli. But making pop3proxy.py the server, rather than having another process, would integrate well with the web interface and mean we had no worries about database locking. And folding proxytee into hammiefilter sounds like a good plan too - they're essentially doing similar jobs ("Do stuff with this message!"). There are advantages to using XMLRPC, eg. cross-language compatibility. But Skip's upload system uses HTTP, which is pretty cross-language too. I don't know whether it would be easy to incorporate Neale's code into pop3proxy - it would mean making Dibbler.py (the underlying HTTP layer) understand XMLRPC. Probably hard but I don't know. I like the way Skip's going, letting hammie users use the web interface for training. Alongside Mark's Outlook plugin and Neale's Mutt (etc.)
scripts, it's looking like we're covering all the bases nicely - regardless of which client you use, and whether you use hammie or pop3proxy to classify your mail, you get a nice training interface either within your email client or in your browser. -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Mon Jan 27 12:19:45 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 13:20:21 2003 Subject: [Spambayes] To our friends down under... (off topic) Message-ID: Happy Australia Day. :) c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From jon at bergenstreetsoftware.com Mon Jan 27 14:22:18 2003 From: jon at bergenstreetsoftware.com (Jonathan Baumgartner) Date: Mon Jan 27 14:22:28 2003 Subject: [Spambayes] seg faults? Message-ID: I just installed pop3proxy on my OS X machine. It works beautifully. It does seem to have one issue, though. Any time I use the web application to train the incoming messages, the program will die with a segmentation fault. It doesn't give me any more information than that. It looks like this:

[Groundskeeper-Willie:~/spambayes-1.0a1] jon% sudo python pop3proxy.py
Password:
Loading database... Done.
Listener on port 110 is proxying mail.bergenstreetsoftware.com:110
User interface url is http://localhost:8880
Segmentation fault

So the procedure for me to train a message goes something like this: 1. Go to http://localhost:8880. Click on "Review messages."
Instead of going to the review message page, I get a page that looks like this: body { font: 90% arial, swiss, helvetica; margin: 0 } table { font: 90% arial, swiss, helvetica } form { margin: 0 } .banner { background: #c0e0ff; padding=5; padding-left: 15; border-top: 1px solid black; border-bottom: 1px solid black } .header { font-size: 133% } .content { margin: 15 } .messagetable td { padding-left: 1ex; padding-right: 1ex } .sectiontable { border: 1px solid #808080; width: 95% } .sectionheading { background: fffae0; padding-left: 1ex; border-bottom: 1px solid #808080; font-weight: bold } .sectionbody { padding: 1em } .reviewheaders a { color: #000000 } .stripe_on td { background: # 2. Switch out to Terminal and restart pop3proxy, which has died. 3. Go back to the browser and reload the page. 4. Hit "Train." Get the same page of HTML tags as in (1). 5. Restart pop3proxy again. 6. Reload the page in the browser again. Get a message that the message has been trained. Couple of questions: * Do I need to run this as superuser as I've been doing? When I tried it without sudo, I got errors about permissions. * Is anyone else trying this on OS X? I suspect I misconfigured something. thanks! jon From skip at pobox.com Mon Jan 27 13:28:11 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 27 14:28:21 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: Message-ID: <15925.34891.379413.710608@montanaro.dyndns.org> Jon> Couple of questions: Jon> * Do I need to run this as superuser as I've been doing? When I Jon> tried it without sudo, I got errors about permissions. If you want it to listen on port 110. Pick a high-numbered port, then teach your MUA to connect to localhost on that port. Jon> * Is anyone else trying this on OS X? I suspect I misconfigured Jon> something. Yes, I am experimenting with it, though I don't actually use the POP proxying features.
I still use fetchmail over ssh to grab mail from the remote host and procmail with a couple recipes which invoke hammiefilter and/or proxytee to classify/direct the message. Skip From jh at web.de Mon Jan 27 20:29:03 2003 From: jh at web.de (Juergen Hermann) Date: Mon Jan 27 14:29:43 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: Message-ID: On Mon, 27 Jan 2003 08:34:15 -0600, Tim Stone - Four Stones Expressions wrote: >We need to begin examining release migration issues, particularly when the >database won't migrate between releases. We should at least give instructions >on how to retrain, but better than that would be automagic upgrade of the >file. I think the best thing would be an ex-/import tool, with the additional benefit of being able to do that not just for upgrading. Ciao, Jürgen From richie at entrian.com Mon Jan 27 19:49:11 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 27 14:50:00 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: Message-ID: Hi Jonathan, > I just installed pop3proxy on my OS X machine. It works beautifully. > It does seem to have one issue, though. Any time I use the web > application to train the incoming messages, the program will die with > a segmentation fault. I don't use OS X myself, but others have reported that increasing the stack size fixes this: [Tony Lownds] > tcsh: ulimit stacksize 2048 > > sh: ulimit -s 2048 > > Mac OS X's default is 512, I picked 2048 at random. Hope that helps, -- Richie Hindle richie@entrian.com From jon at bergenstreetsoftware.com Mon Jan 27 14:53:21 2003 From: jon at bergenstreetsoftware.com (Jonathan Baumgartner) Date: Mon Jan 27 14:53:30 2003 Subject: [Spambayes] seg faults?
In-Reply-To: References: Message-ID: At 7:49 PM +0000 1/27/03, Richie Hindle wrote: >I don't use OS X myself, but others have reported that increasing the stack >size fixes this: > >[Tony Lownds] >> tcsh: ulimit stacksize 2048 >> >> sh: ulimit -s 2048 >> >> Mac OS X's default is 512, I picked 2048 at random. > >Hope that helps, Thanks Richie. ulimit doesn't appear to exist on my machine, though. At least, which and man have never heard of it. jon From python-spambayes at discworld.dyndns.org Mon Jan 27 13:59:19 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Mon Jan 27 14:56:44 2003 Subject: [Spambayes] seg faults? In-Reply-To: ; from jon@bergenstreetsoftware.com on Mon, Jan 27, 2003 at 02:53:21PM -0500 References: Message-ID: <20030127135919.A15195@discworld.dyndns.org> Jonathan Baumgartner wrote: > > Thanks Richie. ulimit doesn't appear to exist on my machine, though. > At least, which and man have never heard of it. It's frequently a shell builtin. Try the documentation for your shell. Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From richie at entrian.com Mon Jan 27 20:04:50 2003 From: richie at entrian.com (Richie Hindle) Date: Mon Jan 27 15:05:51 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? In-Reply-To: References: Message-ID: <344b3v06bact2avtdnmo4a9e18cdajn0s3@4ax.com> [Tim Stone] > I'll change pop3proxy so that -d/-D work. I don't think I can keep -p/-d > alive in the process... -TimS Great! You shouldn't try to keep the old options as well - it would only muddy the waters. -- Richie Hindle richie@entrian.com From tony-bayes at lownds.com Mon Jan 27 12:08:27 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Mon Jan 27 15:08:42 2003 Subject: [Spambayes] seg faults? 
In-Reply-To: <20030127135919.A15195@discworld.dyndns.org> References: <20030127135919.A15195@discworld.dyndns.org> Message-ID: At 1:59 PM -0600 1/27/03, Charles Cazabon wrote: >Jonathan Baumgartner wrote: >> >> Thanks Richie. ulimit doesn't appear to exist on my machine, though. >> At least, which and man have never heard of it. > >It's frequently a shell builtin. Try the documentation for your shell. I got the command wrong :( tcsh: limit stacksize 2048 sh: ulimit -s 2048 On tcsh it's limit, not ulimit. Would it be desirable to have pop3proxy.py take care of this? Jonathan, If you get past the segfault issue and still have issues, let us know - I use pop3proxy on OS X quite successfully. -Tony From jon at bergenstreetsoftware.com Mon Jan 27 15:10:54 2003 From: jon at bergenstreetsoftware.com (Jonathan Baumgartner) Date: Mon Jan 27 15:11:01 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: <20030127135919.A15195@discworld.dyndns.org> Message-ID: At 12:08 PM -0800 1/27/03, Tony Lownds wrote: >I got the command wrong :( > > tcsh: limit stacksize 2048 > > sh: ulimit -s 2048 > >On tcsh it's limit, not ulimit. > >Would it be desirable to have pop3proxy.py take care of this? > >Jonathan, > >If you get past the segfault issue and still have issues, let us >know - I use pop3proxy on OS X quite successfully. Yippee! Thanks, Tony. That did it. Working perfectly now. jon From tim at fourstonesExpressions.com Mon Jan 27 14:33:14 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 15:34:06 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: Message-ID: 1/27/2003 1:29:03 PM, "Juergen Hermann" wrote: >On Mon, 27 Jan 2003 08:34:15 -0600, Tim Stone - Four Stones Expressions wrote: > >>We need to begin examining release migration issues, particularly when the >>database won't migrate between releases. We should at least give instructions >>on how to retrain, but better than that would be automagic upgrade of the >>file. 
My thoughts exactly. - TimS > >I think the best thing would be an ex-/import tool, with the additional benefit of >being able to do that not just for upgrading. > > >Ciao, Jürgen > > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Mon Jan 27 12:39:29 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 27 15:39:37 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? In-Reply-To: (Richie Hindle's message of "Mon, 27 Jan 2003 17:50:24 +0000") References: <15922.47299.145550.835866@montanaro.dyndns.org> Message-ID: Richie Hindle writes: > I'm not attached to either version, so by all means change one of > them. I'd guess that more people are using command-line switches with > hammie than with pop3proxy, so it should probably be pop3proxy that > changes (but I don't have the time myself). I don't know of any other > tools that have similar switches, but I haven't looked. Hey guys, I didn't mean to force any changes on anyone (okay well maybe just a little)--it just seemed like -p was unnecessary. I'd like to unify the location and type of the word database. Does anything still use the pickle? Neale From neale at woozle.org Mon Jan 27 12:43:06 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 27 15:43:10 2003 Subject: [Spambayes] Oh, one other thing... In-Reply-To: <9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com> (Richie Hindle's message of "Mon, 27 Jan 2003 17:59:20 +0000") References: <15922.47722.760937.263327@montanaro.dyndns.org> <9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com> Message-ID: Richie Hindle writes: > Neale has already implemented a similar idea over XMLRPC, in hammiesrv > and hammiecli. Just so everyone knows, I have absolutely no attachment to hammiecli and hammiesrv. In fact, they should probably be moved to contrib. They were just a lesson in making a good interface to the Hammie class.
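The XML-RPC approach mentioned here comes down to a long-running process that owns the word database and answers scoring requests from short-lived clients. Below is a minimal sketch using Python's standard xmlrpc modules. The score function is a toy placeholder, not Hammie's classifier, and the "score" method name and the ephemeral-port setup are assumptions, not what hammiesrv and hammiecli actually expose.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def score(message_text):
    """Toy stand-in for a real classifier: fraction of 'spammy' words."""
    spammy = {"viagra", "mortgage", "free"}
    words = message_text.lower().split()
    if not words:
        return 0.5  # no evidence either way
    return sum(w in spammy for w in words) / len(words)


def start_server():
    # Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(score, "score")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
    print(proxy.score("Get a FREE mortgage quote"))
    server.shutdown()
```

Because only the server process touches the database, clients never contend for file locks, which is the same benefit Richie points out for making pop3proxy.py the server.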
Neale From tim at fourstonesExpressions.com Mon Jan 27 14:54:32 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 15:55:08 2003 Subject: [Spambayes] uniform command line treatment of database/pickle files? In-Reply-To: Message-ID: 1/27/2003 2:39:29 PM, Neale Pickett wrote: >Richie Hindle writes: > >> I'm not attached to either version, so by all means change one of >> them. I'd guess that more people are using command-line switches with >> hammie than with pop3proxy, so it should probably be pop3proxy that >> changes (but I don't have the time myself). I don't know of any other >> tools that have similar switches, but I haven't looked. > >Hey guys, I didn't mean to force any changes on anyone (okay well maybe >just a little)--it just seemed like -p was unnecessary. Agreed. Done. Question is: does hammiebulk need the -p option, or can we use -d: dbmfilename and -D: picklefilename? > >I'd like to unify the location and type of the word database. Does >anything still use the pickle? Well, I still use pickle, because I trust it a bit more on windoze than the dbm that I get with 2.2. - TimS > >Neale > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Mon Jan 27 14:56:02 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Jan 27 15:56:38 2003 Subject: [Spambayes] Another spam algorithm... Message-ID: This appeared on kuro5hin recently... It's a spam filtering 'algorithm' based on using gzip to measure compressibility of an email message... hmmm...
http://www.kuro5hin.org/story/2003/1/25/224415/367 c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Mon Jan 27 13:14:23 2003 From: neale at woozle.org (Neale Pickett) Date: Mon Jan 27 16:14:26 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: (Richie Hindle's message of "Sat, 25 Jan 2003 00:02:30 +0000") References: Message-ID: Richie Hindle writes: > Neale, do you think your Mutt edits will be ready by the middle of > next week? I haven't tried them but it sounds like they're pretty > much there? Absolutely. Sorry for the delay, I just checked them in. Thanks a ton for putting a release together, Richie. Neale From skip at pobox.com Mon Jan 27 15:23:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 27 16:23:53 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: <15925.41823.746577.327335@montanaro.dyndns.org> Juergen> I think the best thing would be an ex-/import tool, with the Juergen> additional benefit of being able to do that not just for Juergen> upgrading. Might I suggest a simple csv export? You could then use one of the csv modules to import. I'm working with Dave Cole, Kevin Altis and Cliff Wells separately to try and settle on a single csv module for incorporation into the distribution. Skip From skip at pobox.com Mon Jan 27 15:49:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Jan 27 16:49:53 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <15925.41823.746577.327335@montanaro.dyndns.org> References: <15925.41823.746577.327335@montanaro.dyndns.org> Message-ID: <15925.43383.517511.394625@montanaro.dyndns.org> Skip> I'm working with Dave Cole, Kevin Altis and Cliff Wells separately Skip> to try and settle on a single csv module for incorporation into Skip> the distribution. "the distribution" is "the Python distribution" not "the Spambayes distribution", just so there's no confusion.
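Skip's CSV suggestion is easy to picture. Below is a minimal sketch using Python's standard csv module. It assumes, hypothetically, that the word database can be viewed as a mapping from each token to a (spam count, ham count) pair; the real SpamBayes pickle and DBM layouts are not shown here, so export_counts and import_counts are illustrative only.

```python
import csv


def export_counts(counts, fileobj):
    """Write {word: (spamcount, hamcount)} as CSV rows: word,spam,ham."""
    writer = csv.writer(fileobj)
    writer.writerow(["word", "spamcount", "hamcount"])
    for word in sorted(counts):
        spam, ham = counts[word]
        writer.writerow([word, spam, ham])


def import_counts(fileobj):
    """Read rows written by export_counts back into a dict."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row
    return {word: (int(spam), int(ham)) for word, spam, ham in reader}
```

Because the CSV file does not depend on the pickle or DBM layout, one release could write it and the next could read it, which covers Juergen's upgrade case. When using real files rather than in-memory buffers, open them with newline="" as the csv module documentation recommends.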
Skip From mhammond at skippinet.com.au Tue Jan 28 09:00:59 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Jan 27 17:01:54 2003 Subject: [Spambayes] To our friends down under... (off topic) In-Reply-To: Message-ID: <008701c2c64f$93ec5410$530f8490@eden> Cheers (hic!) If it weren't for the fires and heat, it would be a good one! Melbourne hit 43.4 last week! Australian's-all-let-us-rejoice ly, Mark. > Happy Australia Day. :) > > c'est moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org From gary at inauspicious.org Mon Jan 27 22:38:22 2003 From: gary at inauspicious.org (Gary Benson) Date: Mon Jan 27 18:21:19 2003 Subject: [Spambayes] Details in headers Message-ID: <20030127223812.GA17165@inauspicious.org> Hi, Just been playing around with Spambayes -- nice work guys! I have one question: how do you get the X-Spambayes-Classification header to contain individual word scores as mentioned in INTEGRATION.txt? Mine just say spam/ham/unsure and an overall score. Cheers, Gary [ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] From T.A.Meyer at massey.ac.nz Tue Jan 28 12:23:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Jan 27 18:30:00 2003 Subject: [Spambayes] Re: Outlook: new folder selector code Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D392@its-xchg4.massey.ac.nz> [David] > Given Mark's comment about being tied up currently, if you were > amenable to sending me a copy of your patches, I could also give them > a shot on my system. [Mark] > I've checked in a fix for this. No idea if it will fix your > exchange server > error, but all the IDs should now be fully qualified after a > fresh install. Sorry - long weekend here and so I didn't get to any email for a few days, otherwise I would have sent you the patch.
Mark's done it all now, anyway (and somewhat better, as usual, than my attempt). :) =Tony Meyer From noreply at sourceforge.net Mon Jan 27 15:37:14 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 18:40:35 2003 Subject: [Spambayes] [ spambayes-Bugs-675811 ] Dead buttons left on uninstall Message-ID: Bugs item #675811, was opened at 2003-01-28 12:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Dead buttons left on uninstall Initial Comment: The toolbar buttons are temporary, which causes problems if they are moved. If they are permanent, then we are left with dead buttons if we uninstall the plugin (why would we do this? ;p ). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702 From noreply at sourceforge.net Mon Jan 27 15:37:34 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 18:40:42 2003 Subject: [Spambayes] [ spambayes-Bugs-675811 ] Dead buttons left on uninstall Message-ID: Bugs item #675811, was opened at 2003-01-28 12:37 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Dead buttons left on uninstall Initial Comment: The toolbar buttons are temporary, which causes problems if they are moved. If they are permanent, then we are left with dead buttons if we uninstall the plugin (why would we do this? ;p ). 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702 From noreply at sourceforge.net Mon Jan 27 15:40:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 18:40:51 2003 Subject: [Spambayes] [ spambayes-Bugs-675812 ] Outlook registration/doc issues Message-ID: Bugs item #675812, was opened at 2003-01-28 12:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook registration/doc issues Initial Comment: The plugin should be listed in Outlook's COM plug-ins list. In fact, the doc says that this is so! This is not the case (here at least). This would allow nice removal (and addition??) rather than running addin.py --unregister and so on. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 From noreply at sourceforge.net Mon Jan 27 15:41:50 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Jan 27 18:40:57 2003 Subject: [Spambayes] [ spambayes-Bugs-675812 ] Outlook registration/doc issues Message-ID: Bugs item #675812, was opened at 2003-01-28 12:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Outlook registration/doc issues Initial Comment: The plugin should be listed in Outlook's COM plug-ins list. In fact, the doc says that this is so! This is not the case (here at least). This would allow nice removal (and addition??) 
rather than running addin.py --unregister and so on. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 From jm at jmason.org Tue Jan 28 01:09:46 2003 From: jm at jmason.org (Justin Mason) Date: Mon Jan 27 20:09:01 2003 Subject: [Spambayes] To our friends down under... (off topic) In-Reply-To: Message from "Mark Hammond" <008701c2c64f$93ec5410$530f8490@eden> Message-ID: <20030128010952.34F4216F16@jmason.org> Mark Hammond said: > Cheers (hic!) > > If it weren't for the fires and heat, it would be a good one! Melbourne hit > 43.4 last week! 43.4!! Didn't bloody do that when I was living there last year ;) cheers, --j. From anthony at interlink.com.au Tue Jan 28 16:59:49 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 28 01:07:37 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <200301271424.h0REOmm23697@localhost.localdomain> Message-ID: <200301280559.h0S5xnn31945@localhost.localdomain> Another thing to consider might be man pages for the major command line tools - I'm happy to have a shot at these... -- Anthony Baxter It's never too late to have a happy childhood. From tony-bayes at lownds.com Mon Jan 27 23:01:53 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Tue Jan 28 02:02:26 2003 Subject: [Spambayes] Doh! 
Message-ID: pop3proxy.py has some syntax errors :) % cvs diff Index: pop3proxy.py =================================================================== RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v retrieving revision 1.45 diff -u -d -b -w -r1.45 pop3proxy.py --- pop3proxy.py 27 Jan 2003 18:07:11 -0000 1.45 +++ pop3proxy.py 28 Jan 2003 06:47:08 -0000 @@ -1515,13 +1515,13 @@ state.runTestServer = True elif opt == '-b': state.launchUI = True - elif opt == '-d': // dbm file + elif opt == '-d': # dbm file state.useDB = True options.pop3proxy_persistent_storage_file = arg - elif opt == '-D': // pickle file + elif opt == '-D': # pickle file state.useDB = False options.pop3proxy_persistent_storage_file = arg - elif opt == '-p': // dead option + elif opt == '-p': # dead option print >>sys.stderr, "-p option is no longer supported, use -D\n" print >>sys.stderr, __doc__ sys.exit() From tim at fourstonesExpressions.com Tue Jan 28 06:32:21 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 28 07:32:58 2003 Subject: [Spambayes] Doh! In-Reply-To: Message-ID: One of the main problems with writing in a half dozen languages at the same time... 
argh - TimS 1/28/2003 1:01:53 AM, Tony Lownds wrote: >pop3proxy.py has some syntax errors :) > >% cvs diff >Index: pop3proxy.py >=================================================================== >RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v >retrieving revision 1.45 >diff -u -d -b -w -r1.45 pop3proxy.py >--- pop3proxy.py 27 Jan 2003 18:07:11 -0000 1.45 >+++ pop3proxy.py 28 Jan 2003 06:47:08 -0000 >@@ -1515,13 +1515,13 @@ > state.runTestServer = True > elif opt == '-b': > state.launchUI = True >- elif opt == '-d': // dbm file >+ elif opt == '-d': # dbm file > state.useDB = True > options.pop3proxy_persistent_storage_file = arg >- elif opt == '-D': // pickle file >+ elif opt == '-D': # pickle file > state.useDB = False > options.pop3proxy_persistent_storage_file = arg >- elif opt == '-p': // dead option >+ elif opt == '-p': # dead option > print >>sys.stderr, "-p option is no longer supported, use -D\n" > print >>sys.stderr, __doc__ > sys.exit() > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mwh at python.net Tue Jan 28 12:36:44 2003 From: mwh at python.net (Michael Hudson) Date: Tue Jan 28 07:36:56 2003 Subject: [Spambayes] Re: seg faults? References: Message-ID: <2m1y2xo3hf.fsf@starship.python.net> Richie Hindle writes: > [Tony Lownds] >> tcsh: limit stacksize 2048 >> >> sh: ulimit -s 2048 >> >> Mac OS X's default is 512, I picked 2048 at random. I think 2048 is the largest you can set it to, too. Could be wrong, and can't check just now... Cheers, M. From skip at pobox.com Tue Jan 28 07:11:45 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 28 08:11:51 2003 Subject: [Spambayes] Re: seg faults?
In-Reply-To: <2m1y2xo3hf.fsf@starship.python.net> References: <2m1y2xo3hf.fsf@starship.python.net> Message-ID: <15926.33169.344631.385423@montanaro.dyndns.org> >>> Mac OS X's default is 512, I picked 2048 at random. mh> I think 2048 is the largest you can set it to, too. Could be wrong, mh> and can't check just now... Nah, I set it on mine to 8192 with no problems... % uname -a Darwin montanaro.dyndns.org 6.3 Darwin Kernel Version 6.3: Sat Dec 14 03:11:25 PST 2002; root:xnu/xnu-344.23.obj~4/RELEASE_PPC Power Macintosh powerpc % ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) 6144 file size (blocks, -f) unlimited max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 256 pipe size (512 bytes, -p) 1 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 100 virtual memory (kbytes, -v) 14336 Skip From tim at fourstonesExpressions.com Tue Jan 28 07:53:53 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 28 08:54:30 2003 Subject: [Spambayes] Great article by David Berlind on the spam conference Message-ID: <7AJIMHYX3YB7XW95RP4ZNLB83105X.3e368b71@myst> http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2909482,00.html c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Tue Jan 28 12:08:04 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 28 15:08:11 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <15925.41823.746577.327335@montanaro.dyndns.org> (Skip Montanaro's message of "Mon, 27 Jan 2003 15:23:43 -0600") References: <15925.41823.746577.327335@montanaro.dyndns.org> Message-ID: Skip Montanaro writes: > Juergen> I think the best thing would be an ex-/import tool, with the > Juergen> additional benefit of being able to do that not just for > Juergen> upgrading. > > Might I suggest a simple csv export? You could then use one of the csv > modules to import. 
I'm working with Dave Cole, Kevin Altis and Cliff Wells > separately to try and settle on a single csv module for incorporation into > the distribution. Sounds like a winner to me. Import/Export would be incredibly easy, too. Just iterate through the dict. From tim at fourstonesExpressions.com Tue Jan 28 14:10:28 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 28 15:11:06 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: Message-ID: 1/28/2003 2:08:04 PM, Neale Pickett wrote: >Skip Montanaro writes: > >> Juergen> I think the best thing would be an ex-/import tool, with the >> Juergen> additional benefit of being able to do that not just for >> Juergen> upgrading. >> >> Might I suggest a simple csv export? You could then use one of the csv >> modules to import. I'm working with Dave Cole, Kevin Altis and Cliff Wells >> separately to try and settle on a single csv module for incorporation into >> the distribution. > >Sounds like a winner to me. Import/Export would be incredibly easy, >too. Just iterate through the dict. It has the added benefit of being able to change from dbm to pickle to ... implementation without retraining... - TimS > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Tue Jan 28 12:13:00 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 28 15:13:04 2003 Subject: [Spambayes] Details in headers In-Reply-To: <20030127223812.GA17165@inauspicious.org> (Gary Benson's message of "Mon, 27 Jan 2003 22:38:22 +0000") References: <20030127223812.GA17165@inauspicious.org> Message-ID: Gary Benson writes: > Hi, Hi, Gary :) > how do you get the X-Spambayes-Classification header to contain > individual word scores as mentioned in INTEGRATION.txt?
$ cat <<EOD >~/.spambayesrc [Hammie] hammie_debug_header: True EOD For those in the windows crowd, please tell me a sensible place to look for a .ini file, and then I'll tell you how to do this on your platform :) Neale From tim at fourstonesExpressions.com Tue Jan 28 14:17:14 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Jan 28 15:17:56 2003 Subject: [Spambayes] Details in headers In-Reply-To: Message-ID: 1/28/2003 2:13:00 PM, Neale Pickett wrote: >Gary Benson writes: > >> Hi, > >Hi, Gary :) > >> how do you get the X-Spambayes-Classification header to contain >> individual word scores as mentioned in INTEGRATION.txt? > >$ cat <<EOD >~/.spambayesrc >[Hammie] >hammie_debug_header: True >EOD > > >For those in the windows crowd, please tell me a sensible place to look >for a .ini file, and then I'll tell you how to do this on your platform >:) The right way to do it in any case is to use the Option Configurator. I think it's now invoked from the pop3proxy? Richie knows this stuff now. I haven't gotten it all going since the reorg yet. - TimS > >Neale > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Wed Jan 29 09:22:26 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 28 15:23:05 2003 Subject: [Spambayes] Details in headers Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD32@its-xchg4.massey.ac.nz> > For those in the windows crowd, please tell me a sensible > place to look > for a .ini file This was asked before I think, so I guess it needs an answer :) Based on a search through my system, the overwhelming vote from developers seems to be in the directory of the application itself (so spambayes/).
The leading second choice is in "[main drive]:\Documents and Settings\[username]\Application Data\[Application Name]\". I suspect the second one, although less common, is more correct. Perhaps that's just the way I'm leaning ;). This is from a Win2k system; IIRC NT and XP have the same sort of structure, but Win9* would not. =Tony Meyer From piersh at friskit.com Tue Jan 28 13:18:41 2003 From: piersh at friskit.com (Piers Haken) Date: Tue Jan 28 16:01:54 2003 Subject: [Spambayes] Details in headers Message-ID: <9891913C5BFE87429D71E37F08210CB9297554@zeus.sfhq.friskit.com> The recommended way to get this path is to call: SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE); Remember, the user may not have write access to the spambayes installation directory. Piers. > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Tuesday, January 28, 2003 12:22 PM > To: spambayes@python.org > Subject: RE: [Spambayes] Details in headers > > > > For those in the windows crowd, please tell me a sensible > > place to look > > for a .ini file > > This was asked before I think, so I guess it needs an answer :) > > Based on a search through my system, the overwhelming vote > from developers seems to be in the directory of the > application itself (so spambayes/). The leading second > choice is in "[main drive]:\Documents and > Settings\[username]\Application Data\[Application Name]\". > > I suspect the second one, although less common, is more > correct. Perhaps that's just the way I'm leaning ;). > > This is from a Win2k system; IIRC NT and XP have the same > sort of structure, but Win9* would not.
> > =Tony Meyer > From mhammond at skippinet.com.au Wed Jan 29 08:20:30 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Jan 28 16:21:34 2003 Subject: [Spambayes] Details in headers In-Reply-To: <9891913C5BFE87429D71E37F08210CB9297554@zeus.sfhq.friskit.com> Message-ID: <012201c2c713$167bbfb0$530f8490@eden> > The recommended way to get this path is to call: > > SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE); > > Remember, the user may not have write access to the spambayes > installation directory. MS are starting to come up with good reasons for doing this too. Apart from the "Portable profiles" (or whatever they called it where your full profile followed you wherever you went on the corporate LAN), Windows XP now has a "wizard" that lets you migrate all of your documents and settings to another computer. A friend of mine tried this, and it worked well - except it did require that apps stored their data in these correct places. The Outlook addin is almost certainly going to use this API. FYI, it is used from Python thusly: >>> from win32com.shell import shell, shellcon >>> shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA) u'E:\\Documents and Settings\\skip\\Application Data' >>> Mark.
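For code that must also run where win32com is not installed, one possible approach is sketched below. It is an assumption, not what the Outlook addin does. On Windows it reads the APPDATA environment variable, which on Windows 2000 and XP normally names the same per-user Application Data folder that SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA) returns. On other platforms it falls back to a ~/.spambayesrc dotfile. The function name and the SpamBayes\spambayes.ini layout are hypothetical.

```python
import os
import sys


def default_config_path(app_name="SpamBayes", environ=None, platform=None):
    """Return a per-user location for the options file (a sketch)."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        # Assumption: %APPDATA% points at the same per-user folder that
        # SHGetSpecialFolderPath(0, CSIDL_APPDATA) would return; using it
        # avoids a hard dependency on win32com.
        appdata = environ.get("APPDATA")
        if appdata:
            return os.path.join(appdata, app_name, app_name.lower() + ".ini")
    # Unix-style fallback: a dotfile in the home directory (~/.spambayesrc).
    home = environ.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".spambayesrc")
```

Passing environ and platform explicitly makes the function easy to exercise on any machine. In normal use, both default to the values of the running process.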
From T.A.Meyer at massey.ac.nz Wed Jan 29 10:23:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 28 16:23:46 2003 Subject: [Spambayes] Details in headers Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3A7@its-xchg4.massey.ac.nz> [Piers] > The recommended way to get this path is to call: > SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE); > Remember, the user may not have write access to the spambayes installation directory. Hmm...yes I should have said that. Although: from win32com.shell import shell, shellcon path = shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA) is somewhat less c and somewhat more python. (I think this defaults to the third (create directory) parameter being false). =Tony Meyer From richie at entrian.com Tue Jan 28 21:34:58 2003 From: richie at entrian.com (Richie Hindle) Date: Tue Jan 28 16:35:48 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: <20030127135919.A15195@discworld.dyndns.org> Message-ID: [Tony] > sh: ulimit -s 2048 > Would it be desirable to have pop3proxy.py take care of this? Is that possible? Can a process increase its own stack size? Or would we need a shellscript wrapper? Any Mac OS X users fancy taking on the job? Questions, questions... 8-) -- Richie Hindle richie@entrian.com From tony-bayes at lownds.com Tue Jan 28 13:58:42 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Tue Jan 28 16:58:54 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: <20030127135919.A15195@discworld.dyndns.org> Message-ID: At 9:34 PM +0000 1/28/03, Richie Hindle wrote: >[Tony] >> sh: ulimit -s 2048 >> Would it be desirable to have pop3proxy.py take care of this? > >Is that possible? Can a process increase its own stack size? Yep!
STACK_NEED = 4<<20 import resource soft, hard = resource.getrlimit (resource.RLIMIT_STACK) if soft < STACK_NEED: resource.setrlimit (resource.RLIMIT_STACK, (STACK_NEED, hard)) > Or would we >need a shellscript wrapper? Any Mac OS X users fancy taking on the job? Sure - it's a matter of machinery really. >Questions, questions... 8-) > Where would I put this? My suggestion is spambayes/platform.py That file would contain code like: if sys.platform == 'win32': from platform_win import * elif sys.platform == 'darwin': from platform_darwin import * else: # set any defaults pass Then, other parts of spambayes could get attributes from spambayes.platform, like, say, where to store database files by default. A little machinery for platform-specific stuff seems way better to me than sprinkling "if sys.platform..." checks all over the place. -Tony From skip at pobox.com Tue Jan 28 16:08:09 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Jan 28 17:08:27 2003 Subject: [Spambayes] seg faults? In-Reply-To: References: <20030127135919.A15195@discworld.dyndns.org> Message-ID: <15926.65353.991418.385713@montanaro.dyndns.org> Richie> [Tony] >> sh: ulimit -s 2048 >> Would it be desirable to have pop3proxy.py take care of this? Richie> Is that possible? Can a process increase its own stack size? Richie> Or would we need a shellscript wrapper? Any Mac OS X users Richie> fancy taking on the job? This topic came up on python-dev. The conclusion there was that the regression test script should take care of this for its own needs, but not to do this in general. Here's the relevant code from Lib/test/regrtest.py: # MacOSX (a.k.a. Darwin) has a default stack size that is too small # for deeply recursive regular expressions. We see this as crashes in # the Python test suite when running test_re.py and test_sre.py. The # fix is to set the stack limit to 2048. # This approach may also be useful for other Unixy platforms that # suffer from small default stack limits.
if sys.platform == 'darwin': try: import resource except ImportError: pass else: soft, hard = resource.getrlimit(resource.RLIMIT_STACK) newsoft = min(hard, max(soft, 1024*2048)) resource.setrlimit(resource.RLIMIT_STACK, (newsoft, hard)) Skip From noreply at sourceforge.net Tue Jan 28 14:19:30 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Jan 28 17:44:32 2003 Subject: [Spambayes] [ spambayes-Feature Requests-676401 ] Outlook: Storage in default user directory Message-ID: Feature Requests item #676401, was opened at 2003-01-29 11:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook: Storage in default user directory Initial Comment: Follows from comments in spambayes list from Piers Haken and Mark Hammond. It would be nice if the plugin stored the pck and ini files in a more appropriate folder than the outlook root folder - as Piers commented, the user might not have write access there. The folder SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA) would probably be the best place. The pck's are created by the plugin and so are easy; how the default .ini file gets there is another issue. 
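One way to get the default .ini file into place is to create it on first run. The sketch below does that with Python's configparser module. It assumes a hypothetical DEFAULT_INI template and reads only the option from Neale's ~/.spambayesrc example, so it is not the SpamBayes Options code. Note that this sketch looks up the flag under [Hammie] only, so a value placed in any other section is ignored.

```python
import configparser
import os

# Hypothetical default template, modelled on Neale's ~/.spambayesrc example.
DEFAULT_INI = """\
[Hammie]
hammie_debug_header: False
"""


def ensure_config(path, default_text=DEFAULT_INI):
    """Create the options file from the template if it does not exist yet."""
    if not os.path.exists(path):
        directory = os.path.dirname(path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(path, "w") as f:
            f.write(default_text)
    return path


def debug_header_enabled(path):
    """Return the [Hammie] hammie_debug_header flag, defaulting to False."""
    parser = configparser.ConfigParser()
    parser.read(path)
    # fallback covers both a missing section and a missing option.
    return parser.getboolean("Hammie", "hammie_debug_header", fallback=False)
```

An existing file is never overwritten, so a user's edits survive later runs.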
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702 From gary at inauspicious.org Tue Jan 28 23:58:00 2003 From: gary at inauspicious.org (Gary Benson) Date: Tue Jan 28 18:58:07 2003 Subject: [Spambayes] Details in headers In-Reply-To: References: <20030127223812.GA17165@inauspicious.org> Message-ID: <20030128235800.GK19499@inauspicious.org> Neale Pickett wrote: > Gary Benson writes: > > > Hi, > > Hi, Gary :) Well hello Neale :) > > how do you get the X-Spambayes-Classification header to contain > > individual word scores as mentioned in INTEGRATION.txt? > > $ cat <~/.spambayesrc > [Hammie] > hammie_debug_header: True > EOD Thank you. I was using hammiefilter and was trying: | [hammiefilter] | hammie_debug_header: True (but that doesn't work) Cheers, Gary [ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] From neale at woozle.org Tue Jan 28 16:58:13 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Jan 28 19:58:40 2003 Subject: [Spambayes] Details in headers In-Reply-To: <20030128235800.GK19499@inauspicious.org> (Gary Benson's message of "Tue, 28 Jan 2003 23:58:00 +0000") References: <20030127223812.GA17165@inauspicious.org> <20030128235800.GK19499@inauspicious.org> Message-ID: Gary Benson writes: > Thank you. I was using hammiefilter and was trying: > > | [hammiefilter] > | hammie_debug_header: True Yeah. One thing we should put on the roadmap is getting rid of the hammie_ prefix on config items under the [hammie] .ini section. Right now we're ignoring the section names and creating namespaces based on property name. I think that's confusing, to say nothing of the needless extra typing. Neale From anthony at interlink.com.au Wed Jan 29 14:18:55 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Jan 28 22:21:04 2003 Subject: [Spambayes] vague standardisation of whitespace in code? 
Message-ID: <200301290318.h0T3IwD28422@localhost.localdomain> I'm just about to do a mungo-commit to clean up the whitespace issues over the whole codebase (using reindent.py) to try and keep things neat :) Is it worth putting something in the CVS commit scripts to either fix whitespace, or else to whine if it's not in the usual 4 space indents? From T.A.Meyer at massey.ac.nz Wed Jan 29 16:23:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Jan 28 22:24:33 2003 Subject: [Spambayes] Outlook Plugin: Resetting messages as unread Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3CF@its-xchg4.massey.ac.nz> I've noticed recently that when a message is scored it gets reset back to 'unread' (which it normally would be, but when my machine is working hard I can read a message before it manages to get scored). Should/can this be fixed? =Tony Meyer From tim_one at email.msn.com Tue Jan 28 22:28:51 2003 From: tim_one at email.msn.com (Tim Peters) Date: Tue Jan 28 22:29:06 2003 Subject: [Spambayes] vague standardisation of whitespace in code? In-Reply-To: <200301290318.h0T3IwD28422@localhost.localdomain> Message-ID: [Anthony Baxter] > I'm just about to do a mungo-commit to clean up the whitespace > issues over the whole codebase (using reindent.py) to try and > keep things neat :) reindent -v -r . from the root should do it all. > Is it worth putting something in the CVS commit scripts to > either fix whitespace, or else to whine if it's not in the > usual 4 space indents? Nope -- you can't get people to care (enough), different editors leave different kinds of slop behind, and running reindent every now & again is painless. The std checkin comment for this is "Whitespace normalization" -- every Python and Zope developer instinctively ignores such checkins, so it's also a good comment to make on a controversial change you don't want anyone to notice.
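Back in the "seg faults?" thread, Tony's and Skip's stack-size snippets can be combined into one best-effort helper. This is a sketch, not code from pop3proxy.py or regrtest.py, and the helper names are hypothetical. STACK_NEED reuses Tony's 4 MB figure. The helper handles resource.RLIM_INFINITY explicitly, so an already unlimited stack is never lowered. Whether raising the soft limit inside a running process actually helps depends on the operating system, so setting it with ulimit before starting the proxy may be the more dependable route.

```python
try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None

STACK_NEED = 4 << 20  # 4 MB, Tony's figure; regrtest.py uses 2 MB


def desired_soft_limit(soft, hard, need=STACK_NEED, infinity=None):
    """Pick a new soft stack limit: at least `need`, never above `hard`."""
    if infinity is not None and soft == infinity:
        return soft  # already unlimited, leave it alone
    target = max(soft, need)
    if infinity is not None and hard == infinity:
        return target  # no hard ceiling to respect
    return min(hard, target)


def raise_stack_limit(need=STACK_NEED):
    """Best-effort increase of this process's soft stack limit."""
    if resource is None:
        return None
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    new_soft = desired_soft_limit(soft, hard, need, resource.RLIM_INFINITY)
    if new_soft != soft:
        try:
            resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
        except (ValueError, OSError):
            return soft  # the OS refused; keep the old limit
    return new_soft
```

The arithmetic in desired_soft_limit is kept apart from the system calls, so the policy can be tested without changing process limits. A platform module of the kind Tony proposes could then call raise_stack_limit on Darwin only.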
From Paul.Moore at atosorigin.com Wed Jan 29 09:14:44 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Jan 29 04:16:51 2003 Subject: [Spambayes] Details in headers Message-ID: <16E1010E4581B049ABC51D4975CEDB886199BC@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Based on a search through my system, the overwhelming vote from > developers seems to be in the directory of the application itself (so > spambayes/). The leading second choice is in "[main drive]:\Documents > and Settings\[username]\Application Data\[Application Name]\". One big problem with the second option is that the "Application Data" directory in the middle of that is hidden, and on Windows this makes it *very* hard to get at. Explorer doesn't display it unless you change a global setting, and command line completion and the like ignore it. You basically have to type it in exactly as written, with no help at all from the system. And don't forget the quotes you need because of the space! I agree the second option is by far the most correct. But putting it there makes it 99.9% inaccessible for all but the most determined of users. So you'd better not be expecting the user to edit it manually. And your install/uninstall process had better create and delete the directory with no user intervention (for all users, not just the one doing the (un)install!!!) As usual, MS had a sensible idea, and then broke it totally in the name of "user friendliness" :-( Paul. From piersh at friskit.com Wed Jan 29 02:59:30 2003 From: piersh at friskit.com (Piers Haken) Date: Wed Jan 29 05:42:36 2003 Subject: [Spambayes] Details in headers Message-ID: <9891913C5BFE87429D71E37F08210CB9297557@zeus.sfhq.friskit.com> I think that's a bit harsh.
The directory is called "Application Data", not "My Documents": it's designed to be used by well-behaved applications only and it's generally a bad idea for users to go mucking about with stuff in there (we're not talking Python developers here, folks). Also, it's generally considered bad form to delete the users' data when the application is uninstalled. The idea is that the user can pick up where they left off if they reinstall the program. Would you like the Office uninstall program to go through your hard drive deleting all your Word documents? I think not... Piers. > -----Original Message----- > From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] > Sent: Wednesday, January 29, 2003 1:15 AM > To: Meyer, Tony; spambayes@python.org > Subject: RE: [Spambayes] Details in headers > > > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > > Based on a search through my system, the overwhelming vote from > > developers seems to be in the directory of the application > itself (so > > spambayes/). The leading second choice is in "[main > drive]:\Documents > > and Settings\[username]\Application Data\[Application Name]\". > > One big problem with the second option is that the > "Application Data" directory in the middle of that is hidden, > and on Windows this makes it > *very* hard to get at. Explorer doesn't display it unless you > change a global setting, and command line completion and the like > ignore it. You basically have to type it in exactly as > written, with no help at all from the system. And don't > forget the quotes you need because of the space! > > I agree the second option is by far the most correct. But > putting it there makes it 99.9% inaccessible for all but the > most determined of users. So you'd better not be expecting > the user to edit it manually. And your install/uninstall > process had better create and delete the directory with no > user intervention (for all users, not just the one doing the > (un)install!!!)
> > As usual, MS had a sensible idea, and then broke it totally > in the name of "user friendliness" :-( > > Paul. > From Paul.Moore at atosorigin.com Wed Jan 29 12:35:10 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Jan 29 07:37:06 2003 Subject: [Spambayes] Details in headers Message-ID: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com> From: Piers Haken [mailto:piersh@friskit.com] > I think that's a bit harsh. The directory is called > "Application Data", not "My Documents": it's designed > to be used by well-behaved applications only and it's > generally a bad idea for users to go mucking about with > stuff in there Sorry - I thought we were talking about the location of the INI file, which (at the moment, at least) is intended to be user editable. I've no problem with this location for purely application maintained configuration data. But I still think that there should at least be an option for the application directory to get deleted on uninstall - otherwise you get the same problem as with the registry of configuration data for uninstalled applications just getting left around and forgotten. Paul. From neale at woozle.org Wed Jan 29 08:07:54 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 29 11:08:42 2003 Subject: [Spambayes] Details in headers In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com> ("Moore, Paul"'s message of "Wed, 29 Jan 2003 12:35:10 -0000") References: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com> Message-ID: "Moore, Paul" writes: > Sorry - I thought we were talking about the location of the INI file, > which (at the moment, at least) is intended to be user editable. Yes. I was, at least.
> But I still think that there should at least be an option for the > application directory to get deleted on uninstall - otherwise you get > the same problem as with the registry of configuration data for > uninstalled applications just getting left around and forgotten. So I think I'm going to do what I did the last time I brought this up, which was nothing. I'm not in a good position to tell windows users where their files should live, so I'm going to punt. Hopefully someone who is in a good position to make a decision about this will check something in, and then we'll all just use that. We could always put it in C:\spam.ini ;) Neale From neale at woozle.org Wed Jan 29 08:21:04 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 29 11:21:09 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: Hokay. I gave a talk on SpamBayes at work the week before the spam conference, and now all these people are hopping up and down wanting to run it. One of our more tenacious tech writers installed the bugger and hit me with a list of suggestions, which I said I'd pass on to you fine folks. So here you are. Please excuse me if any of these are already solved--she pulled down the released copy on our web page AIUI. * In her words, "When you filter to an online folder, SpamBayes automatically disables filtering when you connect offline. What I would like is that when I reconnect, SpamBayes should automatically reenable filtering and run it against those folders. Now I have to do this manually." * She says that the plugin is definitely not filtering public folders. * Apparently Outlook comes with a "Junk Email" folder. Instead of telling folks to create a "Spam Certain" folder, just have the plugin default to sending spam into the Junk folder where folks are used to filtering their spam already. * She feels end-users need more education about what "spam-possible" means. * The sliders in the configuration window should have tick marks. 
* In the anti-spam dialog box: o Enable filtering checkbox should be below filters, since you have to enable filtering before you can mess with the filters. o The filters box needs a scrollbar, for those with a ton of folders to filter so you can see the text. * Add a "spam column" in the anti-spam pulldown, so it's easy to add a new "spam %" column in the current folder view. * She suggested deleting from public folders should go into a public spam folder. And finally the one I find most intriguing: * All outbound mail should be trained as ham I really like this last one. I don't know if anyone's ever thought of training on outbound mail before. Anyhow, that's all. For the time being, feel free to use me as a go-between. I may demand she join this list if I have to relay too much, though :) Oh, and by the way, she really digs the Outlook plugin. Like, her diggatude is off the charts, that's how much she digs it. Neale From tim at fourstonesExpressions.com Wed Jan 29 10:36:43 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Jan 29 11:37:25 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: Message-ID: <2UO09B0NIAMKTFAICLK4XH4YGFMJ.3e38031b@myst> 1/29/2003 10:21:04 AM, Neale Pickett wrote: >Hokay. I gave a talk on SpamBayes at work the week before the spam >conference, and now all these people are hopping up and down wanting to >run it. One of our more tenacious tech writers installed the bugger and >hit me with a list of suggestions, which I said I'd pass on to you fine >folks. So here you are. Please excuse me if any of these are already >solved--she pulled down the released copy on our web page AIUI. > >* In her words, "When you filter to an online folder, SpamBayes > automatically disables filtering when you connect offline. What I > would like is that when I reconnect, SpamBayes should automatically > reenable filtering and run it against those folders. Now I have to do > this manually." 
> >* She says that the plugin is definitely not filtering public folders. > >* Apparently Outlook comes with a "Junk Email" folder. Instead of > telling folks to create a "Spam Certain" folder, just have the plugin > default to sending spam into the Junk folder where folks are used to > filtering their spam already. > >* She feels end-users need more education about what "spam-possible" > means. > >* The sliders in the configuration window should have tick marks. > >* In the anti-spam dialog box: > > o Enable filtering checkbox should be below filters, since you have > to enable filtering before you can mess with the filters. Leave it to a tech writer.... > > o The filters box needs a scrollbar, for those with a ton of folders > to filter so you can see the text. > >* Add a "spam column" in the anti-spam pulldown, so it's easy to add a > new "spam %" column in the current folder view. > >* She suggested deleting from public folders should go into a public > spam folder. > >And finally the one I find most intriguing: > >* All outbound mail should be trained as ham > >I really like this last one. I don't know if anyone's ever thought of >training on outbound mail before. I made this suggestion a long time ago, and the powers-that-be decided it was decidedly useless. Don't quite remember why... - TimS > >Anyhow, that's all. For the time being, feel free to use me as a >go-between. I may demand she join this list if I have to relay too >much, though :) > >Oh, and by the way, she really digs the Outlook plugin. Like, her >diggatude is off the charts, that's how much she digs it. 
> >Neale > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Jan 29 11:04:46 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Jan 29 12:10:16 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: References: Message-ID: <15928.2478.537965.516443@montanaro.dyndns.org> Neale> One of our more tenacious tech writers ... You know when a company has good tech writers because their documentation is head and shoulders above the competition's. I like to think of them as librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-) Good tech writers also make extremely good testers because they want the documentation and the application to match exactly. Skip From richie at entrian.com Tue Jan 28 17:50:47 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Jan 29 12:51:38 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: Message-ID: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> [Neale] > Thanks a ton for putting a release together, Richie. No problem. I'm hoping to do this on Friday evening UK time, if that's OK with everyone else? I'll fix the Mac OS X stack size problem for pop3proxy before the release - I may not have time to do it "properly" by introducing a new platform-dependent module, but we can munge things around afterwards. It's more important to get a release out before the Linux Journal articles are published, and it only seems to be the pop3proxy that has the problem. François, I haven't forgotten about your pop3proxy problem - if I have time for a deeper investigation I'll do one.
-- Richie From tim at fourstonesExpressions.com Wed Jan 29 12:21:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Jan 29 13:21:56 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: <15928.2478.537965.516443@montanaro.dyndns.org> Message-ID: 1/29/2003 11:04:46 AM, Skip Montanaro wrote: > > Neale> One of our more tenacious tech writers ... > >You know when a company has good tech writers because their documentation is >head and shoulders above the competitions. I like to think of them as >librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-) > >Good tech writers also make extremely good testers because they want the >documentation and the application to match exactly. Think we could get her to write our doc? - TimS > >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From francois.granger at free.fr Wed Jan 29 20:02:18 2003 From: francois.granger at free.fr (Francois Granger) Date: Wed Jan 29 14:02:29 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: At 17:50 +0000 28/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Richie Hindle wrote: >[Neale] >> Thanks a ton for putting a release together, Richie. My thanks as well. >François, I haven't forgotten about you pop3proxy problem - if I have time >for a deeper investigation I'll do one. No problem. I have not seen the classification problem since the other day. It seems that it is solved.
I got a new fresh traceback tonight when I asked for review: Traceback (most recent call last): File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "/Volumes/OS99/spambayes/pop3proxy.py", line 932, in onReview self._appendMessages(page.table, messages, label) File "/Volumes/OS99/spambayes/pop3proxy.py", line 823, in _appendMessages table += row File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 787, in __iadd__ nodes = self._nodeListFromSource(other) File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 640, in _nodeListFromSource tree = _generateTree(""+value+"") File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 574, in _generateTree g.feed(source) File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 499, in feed self._parser.Parse(data) File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 529, in StartElementHandler newAttributes[str(name)] = self._unmungeEntities(str(value)) UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in position 86: ordinal not in range(128) -- Recently using MacOSX....... From neale at woozle.org Wed Jan 29 12:38:21 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Jan 29 15:38:31 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: (Francois Granger's message of "Wed, 29 Jan 2003 20:02:18 +0100") References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: Francois Granger writes: > UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in > position 86: ordinal not in range(128) Yeah, my wife's been getting those too. I'll look into her traceback. Yikes! I just swallowed the tine of a plastic fork! Neale From richie at entrian.com Tue Jan 28 21:41:33 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Jan 29 16:46:34 2003 Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: [François] > UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in > position 86: ordinal not in range(128) This is bizarre. This is expat complaining that you can't have high-bit characters in ASCII XML, which is quite right, but I replace all those characters with charrefs on the way in: >>> def replaceHighCharacters(match): ... return "&#%d;" % ord(match.group(1)) ... >>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d") u'a b &#233; c d' So what's going on...? > Yikes! I just swallowed the tine of a plastic fork! That'll teach you to try to get out of doing the washing up. 8-) -- Richie Hindle richie@entrian.com From francois.granger at free.fr Wed Jan 29 22:40:03 2003 From: francois.granger at free.fr (Francois Granger) Date: Wed Jan 29 16:57:50 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Neale Pickett wrote: >Francois Granger writes: > >> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >> position 86: ordinal not in range(128) > >Yeah, my wife's been getting those too. I'll look into her traceback. Does your wife speak some foreign language? ;-) >Yikes! I just swallowed the tine of a plastic fork! My apologies for this, it is not _that_ important ;-) Thanks for all. -- Recently using MacOSX....... From vanhorn at whidbey.com Wed Jan 29 14:11:55 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Wed Jan 29 17:12:00 2003 Subject: [Spambayes] Details in headers References: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com> Message-ID: <3E3851AB.18018F9E@whidbey.com> Speaking as one who provides tech support for a hundred or so Windows users, I find it perverse to put any file where Windows may change it or obscure it.
Against the flow from Redmond, I want my users to put their data in folders they specifically control (normally on a file server, never in "My Documents"). I want applications to put everything possible in their respective directories, not in the registry, not in the current equivalent of /windows/system. (And I always want to see all file extensions!) I imagine that I'll end up putting some form of Spambayes on at least a couple of dozen systems, so I'll get used to whatever is done, but I'd strongly prefer that the file locations be easily understood and easily learned so others don't have to spend so much time when a user messes an installation up. Van "Moore, Paul" wrote: > From: Piers Haken [mailto:piersh@friskit.com] > > I think that's a bit harsh. The directory is called > > "Application Data", not "My Documents": it's designed > > to be used by well-behaved applications only and it's > > generally a bad idea for users to go mucking about with > > stuff in there > > Sorry - I thought we were talking about the location of > the INI file, which (at the moment, at least) is intended > to be user editable. > > I've no problem with this location for purely application > maintained configuration data. > > But I still think that there should at least be an option > for the application directory to get deleted on uninstall - > otherwise you get the same problem as with the registry of > configuration data for uninstalled applications just getting > left around and forgotten. > > Paul. > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Thu Jan 30 11:32:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Jan 29 17:32:40 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD37@its-xchg4.massey.ac.nz> > * In her words, "When you filter to an online folder, SpamBayes > automatically disables filtering when you connect offline. What I > would like is that when I reconnect, SpamBayes should automatically > reenable filtering and run it against those folders. Now I > have to do this manually." I don't use any offline features, so I can't comment on this. > * Apparently Outlook comes with a "Junk Email" folder. Hmm...does anyone else's Outlook have a "Junk Email" folder? Mine (2000 SR1) certainly didn't come with one. > * She feels end-users need more education about what "spam-possible" means. That would be a documentation issue, right? And wasn't she a writer? .... > * The sliders in the configuration window should have tick marks. I guess I would agree with that. I don't know who would even use the sliders when there's a text box just there, but ... > * In the anti-spam dialog box: > o Enable filtering checkbox should be below filters, since you have > to enable filtering before you can mess with the filters. Agreed. > o The filters box needs a scrollbar, for those with a ton of folders > to filter so you can see the text. Or some other way of showing them all. A scrollbar would make it ugly, wouldn't it? > * Add a "spam column" in the anti-spam pulldown, so it's easy to add a > new "spam %" column in the current folder view. Is it possible to customise the current view via code? 
I wondered about doing this myself, since I seem to be constantly adding the column to new folders, but couldn't find any information about doing so (I must admit I didn't look that hard). > * She says that the plugin is definitely not filtering public folders. This might be an issue that has been resolved (I'm not sure what version the release had). I'm checking this out on my system, but I don't have a lot of (mail) public folders (they're mostly calendars). I'll have to wait until one of them gets mail. > * She suggested deleting from public folders should go into a public > spam folder. Perhaps there could be an option to have mail from each folder you filter: (a) go to the same uncertain/spam folders [as now] (b) go to individual uncertain/spam folders [one set per filtered folder] This would be quite a big interface change, though. Do people think it's worth it? > * All outbound mail should be trained as ham > I really like this last one. I don't know if anyone's ever thought of > training on outbound mail before. Tim's post on this is in the November 2002 archive - "Bayes Training". The arguments against were: * Because some spam is 'from' yourself, this degrades the usefulness of the From header. * It's easy to find enough ham; training on much more of it worsens the ham:spam ratio. If a user saves their outgoing mail (in "Sent Items"), for example, then it's easy to train on that folder. I do this. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Jan 30 11:36:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Jan 29 17:37:19 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz> > > * Apparently Outlook comes with a "Junk Email" folder. > Hmm...does anyone else's Outlook have a "Junk Email" folder? > Mine (2000 SR1) certainly didn't come with one. I take this back.
If you use the Adult Mail/Junk Mail rules that Outlook offers, then if you chose the 'move mail' option and the 'junk mail' folder, then you are prompted to create a new "Junk Mail" folder. However, not every user will have one of these. I would suggest that the best option would be to change the documentation to suggest that _if there is one_ then to use the Junk Mail folder. I guess the plugin could check to see if there was an existing 'junk mail' folder and default to it, but then it could check for a 'spam' folder and default to that, too. Depends on what people are likely to have. =Tony Meyer From ducky at webfoot.com Wed Jan 29 14:53:06 2003 From: ducky at webfoot.com (Kaitlin Duck Sherwood) Date: Wed Jan 29 17:49:34 2003 Subject: [Spambayes] egregious patents on anti-spam techniques Message-ID: Gang -- I've recently become aware of two egregious patent applications related to spam fighting. The first one looks like it might conceivably cover Bayesian filtering. It would be good if someone more familiar with Bayesian/classifier/machine learning theory could check it out and perhaps challenge ("protest") the application. The second is on using whitelists, blacklists, challenge-response, and digital signatures to combat spam. I plan to protest that one myself. I have killer prior art for whitelists, blacklists, and challenge-response (see p.82 of _Stopping Spam_ by Schwartz & Garfinkel, 1998). I do not know of prior art for using digital signatures in the service of stopping spam. If you know of prior art for that, you might want to issue a protest and/or send me the info. (If you send me prior art on digital signatures/spam, please + read the patent claims first + put PRIOR ART in the subject line.) I'm going to Japan for ten days, leaving Friday morning, and will not have email connectivity then. To protest a patent, you need to file prior art (within 60 days!) with the patent office. 
See: http://www.uspto.gov/web/offices/pac/mpep/documents/1900.htm and http://www.uspto.gov/web/offices/pac/mpep/documents/0600_610.htm#sect610 Patent application on adaptive spam filtering: Patent application on whitelists, blacklists, challenge-response, and digital signatures used in spam-fighting: From francois.granger at free.fr Thu Jan 30 08:21:40 2003 From: francois.granger at free.fr (Francois Granger) Date: Thu Jan 30 02:21:44 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz> Message-ID: At 11:36 +1300 30/01/2003, in message RE: [Spambayes] Outlook plugin notes, Meyer, Tony wrote: >I guess the plugin could check to see if there was an existing 'junk >mail' folder and default to it In this case, beware of localization issues. The folder name may be translated in localized versions. -- Recently using MacOSX....... From francois.granger at free.fr Thu Jan 30 09:42:33 2003 From: francois.granger at free.fr (Francois Granger) Date: Thu Jan 30 03:42:38 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: At 22:40 +0100 29/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Francois Granger wrote: >At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 >Release?, Neale Pickett wrote: >>Francois Granger writes: >> >>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >>> position 86: ordinal not in range(128) Some more info: This error has shown up since I installed 2.3a1. It does not happen with my normal setup with Python 2.2. I removed all mail encoded with accented chars and kept only English mails with no accented chars: no error. I can pack some mails for you if anybody wants. Side remark: On MacOS X, upgrading from 2.2 to 2.3 changes the default database format.
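The side remark above (moving from Python 2.2 to 2.3 changes the default database format) amounts to the "best available" dbm backend changing underneath an existing file. Here is a minimal sketch of one way a program could avoid this, written for modern Python 3 rather than the 2.x code under discussion. It asks the standard `dbm.whichdb` which format an existing file already uses and falls back to a configured default only when there is nothing to open. `choose_dbm_type` is a hypothetical helper, not part of SpamBayes, and the `"best"` default only mirrors a value of the `dbm_type` option discussed in this thread.

```python
import dbm
import dbm.dumb
import os
import tempfile


def choose_dbm_type(path, default="best"):
    """Return the dbm backend name that already owns `path`, else `default`.

    Hypothetical helper. dbm.whichdb() returns None when no database file
    can be opened at `path`, "" when a file exists but its format is
    unrecognised, and a module name such as "dbm.gnu", "dbm.ndbm" or
    "dbm.dumb" otherwise.
    """
    kind = dbm.whichdb(path)
    if kind is None:
        return default  # nothing on disk yet: the configured type decides
    if kind == "":
        raise ValueError("%s exists but its dbm format is unknown" % path)
    return kind  # reuse whatever backend created the existing file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hammie.db")
        print(choose_dbm_type(path))  # best (no database yet)

        db = dbm.dumb.open(path, "c")  # simulate an existing database
        db[b"token"] = b"1"
        db.close()
        print(choose_dbm_type(path))  # dbm.dumb
```

For existing files, Python 3's own `dbm.open(path, "c")` already runs this `whichdb` check. An explicit helper like this is only useful when the backend is picked by name before the file is opened, as a `dbm_type` setting does.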
The first time I started pop3proxy with 2.3a1, it created a new database even though the old one was available. After playing with it a little, I changed the line "dbm_type = best" to "dbm_type = dbhash" and it found my old database. Can we add to the docs that the values for this option are: "best", "db3hash", "dbhash", "gdbm", "dumbdbm" -- Recently using MacOSX....... From Paul.Moore at atosorigin.com Thu Jan 30 09:43:50 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Thu Jan 30 04:45:12 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D89E@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > > * Apparently Outlook comes with a "Junk Email" folder. > Hmm...does anyone else's Outlook have a "Junk Email" > folder? Mine (2000 SR1) certainly didn't come with one. If you use Outlook's built-in junk filtering (which, IMHO, is pretty useless...) it creates a "Junk Email" folder when you set it up. But it's not there by default. I think it's just a normal folder, though, so you could check for a folder called "Junk Email" and use it if it exists, otherwise work as at present. I'm not sure if that's worth the effort, though. Maybe just change the docs to refer to a "Junk Email" folder rather than a "Spam" folder. Paul. From mhammond at skippinet.com.au Thu Jan 30 22:04:38 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Jan 30 06:05:33 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: Message-ID: <005301c2c84f$61ff0630$530f8490@eden> Just so you know I am not ignoring this thread, I tend to agree with many of the points. My intention is to reply in detail as I fix them! Also-helping-a-friend-lay-a-concrete-slab-and-am-buggeredly, Mark. From mwh at python.net Thu Jan 30 11:15:00 2003 From: mwh at python.net (Michael Hudson) Date: Thu Jan 30 06:15:07 2003 Subject: [Spambayes] Re: Alpha 2 Release?
References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: <2m3cnanb2j.fsf@starship.python.net> Richie Hindle writes: > [François] >> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >> position 86: ordinal not in range(128) > > This is bizarre. This is expat complaining that you can't have high-bit > characters in ASCII XML, which is quite right, but I replace all those > characters with charrefs on the way in: > >>>> def replaceHighCharacters(match): > ... return "&#%d;" % ord(match.group(1)) > ... >>>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d") > u'a b &#233; c d' > > So what's going on...? Umm, that regexp isn't going to match, e.g. u"\N{EURO SIGN}": >>> ord(u"\N{EURO SIGN}") 8364 Could that be what's happening? Cheers, M. -- > Or can I sweep that can of worms under the rug? Please shove them under the garage. -- Greg Ward and Guido van Rossum mix their metaphors on python-dev From richie at entrian.com Wed Jan 29 16:50:13 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Jan 30 11:51:09 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: <1c1g3v8bglqes83r9le7rhod1mdeotsu19@4ax.com> [François] > I can pack some mails for you if anybody want. Yes please, that would be very useful. I'd love to get this one fixed before the release. > On MacOS X, upgrading from 2.2 to 2.3 changes the default database format. This is scary - any Mac OS X people know what's going on here? -- Richie Hindle richie@entrian.com From richie at entrian.com Wed Jan 29 16:50:15 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Jan 30 11:51:19 2003 Subject: [Spambayes] Re: Alpha 2 Release? In-Reply-To: <2m3cnanb2j.fsf@starship.python.net> References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> <2m3cnanb2j.fsf@starship.python.net> Message-ID: [Michael] > Umm, that regexp isn't going to match, e.g.
u"\N{EURO SIGN}": I don't think that's the problem - I believe the input is plain ASCII with high characters embedded. I'll know more when François (or anyone?) forwards a troublesome example email to me. -- Richie Hindle richie@entrian.com From grobinson at transpose.com Thu Jan 30 12:11:39 2003 From: grobinson at transpose.com (Gary Robinson) Date: Thu Jan 30 12:11:43 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: Message-ID: > Patent application on adaptive spam filtering: > ahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/em > ail&RS=TTL/email> I looked at this last night. I am not a lawyer, so don't go to the bank on what I say. And I didn't spend a huge amount of time on it. But I do have some experience with patents, and I do understand the spambayes approach and the gist of their approach. It is my impression that the patent does not have a scope that encompasses Graham-derived filters, because they do not calculate "first" and "second" "semantic anchors" as the term is used in Claim 1. They seem to be trying to make a straightforward adaptation of technology that works well for classifying documents according to subject area, latent semantic analysis, into the spam realm. It would be very, very interesting to code and test their algorithm's performance against that of spambayes. One aspect of using latent semantic analysis is that it treats synonyms of known spammy words much as it does the spammy words themselves. It's sophisticated technology. But I'm not sure that its advantages matter much for spam detection with the kind of data we have available. It would be very interesting to know.
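Back to the charref question in the Alpha 2 thread. Michael Hudson's point is that the character class `[\x80-\xff]` covers only code points U+0080 to U+00FF. A Unicode string containing, for example, the euro sign (U+20AC, ord 8364) therefore passes through `replaceHighCharacters` untouched and can still trip an ASCII encode later. Here is a minimal sketch of that gap, written in modern Python 3 rather than the 2.x code in the thread. The function names are hypothetical and do not come from PyMeldLite. The sketch compares the narrow pattern with one that matches every non-ASCII character, and with the standard `xmlcharrefreplace` codec error handler, which produces the same decimal charrefs. It does not establish what caused François's traceback; Richie believes that input was plain ASCII with high characters embedded.

```python
import re

# Richie's original character class: only U+0080..U+00FF.
NARROW = re.compile('([\x80-\xff])')
# Wider class: any character outside 7-bit ASCII.
NON_ASCII = re.compile(r'[^\x00-\x7f]')


def narrow_charrefs(text):
    """Mimics replaceHighCharacters: escapes only U+0080..U+00FF."""
    return NARROW.sub(lambda m: "&#%d;" % ord(m.group(1)), text)


def all_charrefs(text):
    """Escapes every non-ASCII character as a decimal charref."""
    return NON_ASCII.sub(lambda m: "&#%d;" % ord(m.group(0)), text)


def all_charrefs_builtin(text):
    """Same result via the standard 'xmlcharrefreplace' error handler."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "a b \xe9 c \u20ac d"
    # The narrow class leaves the euro sign in place, so the result
    # is still not pure ASCII.
    print(narrow_charrefs(sample).isascii())  # False
    print(all_charrefs(sample))               # a b &#233; c &#8364; d
    print(all_charrefs_builtin(sample))       # a b &#233; c &#8364; d
```

If the only goal is ASCII-safe output, the codec error handler is the shorter route. The explicit regex stays closer to the original code and is easier to adapt if only some characters should be escaped.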
--Gary -- [http://ThisURLEnablesEmailToGetThroughOverzealousSpamFilters.org] Gary Robinson CEO Transpose, LLC grobinson@transpose.com 207-942-3463 http://www.transpose.com http://radio.weblogs.com/0101454 > From: spambayes-request@python.org > Reply-To: spambayes@python.org > Date: Wed, 29 Jan 2003 17:50:54 -0500 > To: spambayes@python.org > Subject: Spambayes Digest, Vol 53, Issue 55 > > Send Spambayes mailing list submissions to > spambayes@python.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.python.org/mailman/listinfo/spambayes > or, via email, send a message with subject or body 'help' to > spambayes-request@python.org > > You can reach the person managing the list at > spambayes-owner@python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Spambayes digest..." > > > Today's Topics: > > 1. Re: Outlook plugin notes (Skip Montanaro) > 2. Re: Alpha 2 Release? (Richie Hindle) > 3. Re: Outlook plugin notes (Tim Stone - Four Stones Expressions) > 4. Re: Alpha 2 Release? (Francois Granger) > 5. Re: Alpha 2 Release? (Neale Pickett) > 6. Re: Alpha 2 Release? (Richie Hindle) > 7. Re: Alpha 2 Release? (Francois Granger) > 8. Re: Details in headers (G. Armour Van Horn) > 9. RE: Outlook plugin notes (Meyer, Tony) > 10. RE: Outlook plugin notes (Meyer, Tony) > 11. egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) > > > ---------------------------------------------------------------------- > > Date: Wed, 29 Jan 2003 11:04:46 -0600 > From: Skip Montanaro > To: Neale Pickett > Cc: spambayes@python.org > Subject: Re: [Spambayes] Outlook plugin notes > Message-ID: <15928.2478.537965.516443@montanaro.dyndns.org> > In-Reply-To: > References: > Content-Type: text/plain; charset=us-ascii > MIME-Version: 1.0 > Content-Transfer-Encoding: 7bit > Precedence: list > Reply-To: skip@pobox.com > Message: 1 > > > Neale> One of our more tenacious tech writers ... 
> > You know when a company has good tech writers because their documentation is > head and shoulders above the competitions. I like to think of them as > librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-) > > Good tech writers also make extremely good testers because they want the > documentation and the application to match exactly. > > Skip > > ------------------------------ > > Date: Tue, 28 Jan 2003 17:50:47 +0000 > From: Richie Hindle > To: spambayes@python.org > Subject: Re: [Spambayes] Alpha 2 Release? > Message-ID: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > In-Reply-To: > References: > > Content-Type: text/plain; charset=ISO-8859-1 > MIME-Version: 1.0 > Content-Transfer-Encoding: 8bit > Precedence: list > Reply-To: richie@entrian.com > Message: 2 > > > [Neale] >> Thanks a ton for putting a release together, Richie. > > No problem. I'm hoping to do this on Friday evening UK time, if that's OK > with everyone else? > > I'll fix the Mac OS X stack size problem for pop3proxy before the release - > I may not have time to do it "properly" by introducing a new > platform-dependent module, but we can munge things around afterwards. It's > more important to get a release out before the Linux Journal articles are > published, and it only seems to be the pop3proxy that has the problem. > > François, I haven't forgotten about you pop3proxy problem - if I have time > for a deeper investigation I'll do one.
> > -- > Richie > > ------------------------------ > > Date: Wed, 29 Jan 2003 12:21:06 -0600 > From: Tim Stone - Four Stones Expressions > To: Neale Pickett , skip@pobox.com > Cc: spambayes@python.org > Subject: Re: [Spambayes] Outlook plugin notes > Message-ID: > In-Reply-To: <15928.2478.537965.516443@montanaro.dyndns.org> > Content-Type: text/plain; charset="us-ascii" > MIME-Version: 1.0 > Precedence: list > Reply-To: tim@fourstonesExpressions.com > Message: 3 > > 1/29/2003 11:04:46 AM, Skip Montanaro wrote: > >> >> Neale> One of our more tenacious tech writers ... >> >> You know when a company has good tech writers because their documentation is >> head and shoulders above the competitions. I like to think of them as >> librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-) >> >> Good tech writers also make extremely good testers because they want the >> documentation and the application to match exactly. > Think we could get her to write our doc? - TimS >> >> Skip >> >> _______________________________________________ >> Spambayes mailing list >> Spambayes@python.org >> http://mail.python.org/mailman/listinfo/spambayes >> >> > > > c'est moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org > > > > ------------------------------ > > Date: Wed, 29 Jan 2003 20:02:18 +0100 > From: Francois Granger > To: richie@entrian.com > Cc: spambayes@python.org > Subject: Re: [Spambayes] Alpha 2 Release? > Message-ID: > In-Reply-To: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > References: > <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > Content-Type: text/plain; charset="iso-8859-1" ; format="flowed" > MIME-Version: 1.0 > Content-Transfer-Encoding: 8bit > Precedence: list > Message: 4 > > At 17:50 +0000 28/01/2003, in message Re: [Spambayes] Alpha 2 > Release?, Richie Hindle wrote: >> [Neale] >>> Thanks a ton for putting a release together, Richie. > > My thanks as well. 
> >> François, I haven't forgotten about you pop3proxy problem - if I have time >> for a deeper investigation I'll do one. > > No problem. > I did not saw the classification problem since the other day. It > seems that it is solved. > > I got a new fresh traceback tonight when I asked for review: > > Traceback (most recent call last): > > File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in > found_terminator > getattr(plugin, name)(**params) > > File "/Volumes/OS99/spambayes/pop3proxy.py", line 932, in onReview > self._appendMessages(page.table, messages, label) > > File "/Volumes/OS99/spambayes/pop3proxy.py", line 823, in _appendMessages > table += row > > File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 787, in __iadd__ > nodes = self._nodeListFromSource(other) > > File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 640, > in _nodeListFromSource > tree = _generateTree(""+value+"") > > File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 574, > in _generateTree > g.feed(source) > > File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 499, in feed > self._parser.Parse(data) > > File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 529, > in StartElementHandler > newAttributes[str(name)] = self._unmungeEntities(str(value)) > > UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in > position 86: ordinal not in range(128) > > > > > -- > Recently using MacOSX....... > > ------------------------------ > > Date: Wed, 29 Jan 2003 12:38:21 -0800 > From: Neale Pickett > To: Francois Granger > Cc: spambayes@python.org > Subject: Re: [Spambayes] Alpha 2 Release?
> Message-ID: > In-Reply-To: (Francois Granger's > message of "Wed, 29 Jan 2003 20:02:18 +0100") > References: > > <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > > Content-Type: text/plain; charset=us-ascii > MIME-Version: 1.0 > Precedence: list > Message: 5 > > Francois Granger writes: > >> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >> position 86: ordinal not in range(128) > > Yeah, my wife's been getting those too. I'll look into her traceback. > > Yikes! I just swallowed the tine of a plastic fork! > > Neale > > ------------------------------ > > Date: Tue, 28 Jan 2003 21:41:33 +0000 > From: Richie Hindle > To: spambayes@python.org > Subject: Re: [Spambayes] Alpha 2 Release? > Message-ID: > In-Reply-To: > References: > <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > > Content-Type: text/plain; charset=ISO-8859-1 > MIME-Version: 1.0 > Content-Transfer-Encoding: 8bit > Precedence: list > Reply-To: richie@entrian.com > Message: 6 > > > [François] >> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >> position 86: ordinal not in range(128) > > This is bizarre. This is expat complaining that you can't have high-bit > characters in ASCII XML, which is quite right, but I replace all those > characters with charrefs on the way in: > >>>> def replaceHighCharacters(match): > ... return "&#%d;" % ord(match.group(1)) > ... >>>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d") > u'a b &#233; c d' > > So what's going on...? > > >> Yikes! I just swallowed the tine of a plastic fork! > > That'll teach you to try to get out of doing the washing up. 8-) > > -- > Richie Hindle > richie@entrian.com > > > ------------------------------ > > Date: Wed, 29 Jan 2003 22:40:03 +0100 > From: Francois Granger > To: spambayes@python.org > Subject: Re: [Spambayes] Alpha 2 Release?
> Message-ID: > In-Reply-To: > References: > <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> > > Content-Type: text/plain; charset="us-ascii" ; format="flowed" > MIME-Version: 1.0 > Precedence: list > Message: 7 > > At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 > Release?, Neale Pickett wrote: >> Francois Granger writes: >> >>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >>> position 86: ordinal not in range(128) >> >> Yeah, my wife's been getting those too. I'll look into her traceback. > > Your wife speaks some foreign language ? ;-) > >> Yikes! I just swallowed the tine of a plastic fork! > > My apologies for this, it is not _that_ important ;-) > > Thanks for all. > > -- > Recently using MacOSX....... > > ------------------------------ > > Date: Wed, 29 Jan 2003 14:11:55 -0800 > From: "G. Armour Van Horn" > Cc: spambayes@python.org > Subject: Re: [Spambayes] Details in headers > Message-ID: <3E3851AB.18018F9E@whidbey.com> > References: > <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com> > Content-Type: text/plain; charset=us-ascii > MIME-Version: 1.0 > Content-Transfer-Encoding: 7bit > Precedence: list > Reply-To: vanhorn@whidbey.com > Message: 8 > > Speaking as one who provides tech support for a hundred or so Windows > users, I find it perverse to put any file where Windows may change it or > obscure it. Against the flow from Redmond, I want my users to put their > data in folders they specifically control (normally on a file server, > never in "My Documents"). I want applications to put everything possible > in their respective directories, not in the registry, not in the current > equivalent of /windows/system. (And I always want to see all file > extensions!) 
> > I imagine that I'll end up putting some form of Spambayes on at least a > couple of dozen systems, so I'll get used to whatever is done, but I'd > strongly prefer that the file locations be easily understood and easily > learned so others don't have to spend so much time when a user messes an > installation up. > > Van > > "Moore, Paul" wrote: > >> From: Piers Haken [mailto:piersh@friskit.com] >>> I think that's a bit harsh. The directory is called >>> "Application Data", not "My Documents": it's designed >>> to be used by well-behaved applications only and it's >>> generally a bad idea for users to go mucking about with >>> stuff in there >> >> Sorry - I thought we were talking about the location of >> the INI file, which (at the moment, at least) is intended >> to be user editable. >> >> I've no problem with this location for purely application >> maintained configuration data. >> >> But I still think that there should at least be an option >> for the application directory to get deleted on uninstall - >> otherwise you get the same problem as with the registry of >> configuration data for uninstalled applications just getting >> left around and forgotten. >> >> Paul. >> >> _______________________________________________ >> Spambayes mailing list >> Spambayes@python.org >> http://mail.python.org/mailman/listinfo/spambayes > > -- > ---------------------------------------------------------- > Sign up now for Quotes of the Day, a handful of quotations > on a theme delivered every morning. > Enlightenment! Daily, for free! 
> mailto:twisted@whidbey.com?subject=Subscribe_QOTD > > For web hosting and maintenance, > visit Van's home page: http://www.domainvanhorn.com/van/ > ---------------------------------------------------------- > > > > ------------------------------ > > Date: Thu, 30 Jan 2003 11:32:01 +1300 > From: "Meyer, Tony" > To: > Subject: RE: [Spambayes] Outlook plugin notes > Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD37@its-xchg4.massey.ac.nz> > Content-Type: text/plain; > charset="iso-8859-1" > MIME-Version: 1.0 > Content-Transfer-Encoding: quoted-printable > Precedence: list > Message: 9 > >> * In her words, "When you filter to an online folder, SpamBayes >> automatically disables filtering when you connect offline. What I >> would like is that when I reconnect, SpamBayes should automatically >> reenable filtering and run it against those folders. Now I >> have to do this manually." > I don't use any offline features, so I can't comment on this. > >> * Apparently Outlook comes with a "Junk Email" folder. > Hmm...does anyone else's Outlook have a "Junk Email" folder? Mine (2000 SR1) certainly didn't come with one. > >> * She feels end-users need more education about what "spam-possible" means. > That would be a documentation issue, right? And wasn't she a writer? .... > >> * The sliders in the configuration window should have tick marks. > I guess I would agree with that. I don't know who would even use the sliders when there's a text box just there, but ... > >> * In the anti-spam dialog box: >> o Enable filtering checkbox should be below filters, since you have >> to enable filtering before you can mess with the filters. > Agreed. > >> o The filters box needs a scrollbar, for those with a ton of folders >> to filter so you can see the text. > Or some other way of showing them all. A scrollbar would make it ugly, wouldn't it?
> >> * Add a "spam column" in the anti-spam pulldown, so it's easy to add a >> new "spam %" column in the current folder view. > Is it possible to customise the current view via code? I wondered about doing this myself, since I seem to be constantly adding the column to new folders, but couldn't find any information about doing so (I must admit I didn't look that hard). > >> * She says that the plugin is definitely not filtering public folders. > This might be an issue that has been resolved (I'm not sure what version the release had). I'm checking this out on my system, but I don't have a lot of (mail) public folders (they're mostly calendars). I'll have to wait until one of them gets mail. > >> * She suggested deleting from public folders should go into a public >> spam folder. > Perhaps there could be an option to have mail from each folder you filter: > (a) go to the same uncertain/spam folders [as now] > (b) go to individual uncertain/spam folders [one set per filtered folder] > This would be quite a big interface change, though. Do people think it's worth it? > >> * All outbound mail should be trained as ham >> I really like this last one. I don't know if anyone's ever thought of >> training on outbound mail before. > Tim's post on this is in the November 2002 archive - "Bayes Training". The arguments against were: > * Because some spam is 'from' yourself, this degrades the usefulness of the from header. > * It's easy to find enough ham; much more of it skews the ratio. > > If a user saves their outgoing mail (in "Sent Items"), for example, then it's easy to train on that folder. I do this.
> > =Tony Meyer > > ------------------------------ > > Date: Thu, 30 Jan 2003 11:36:44 +1300 > From: "Meyer, Tony" > To: > Subject: RE: [Spambayes] Outlook plugin notes > Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz> > Content-Type: text/plain; > charset="iso-8859-1" > MIME-Version: 1.0 > Content-Transfer-Encoding: quoted-printable > Precedence: list > Message: 10 > >>> * Apparently Outlook comes with a "Junk Email" folder. >> Hmm...does anyone else's Outlook have a "Junk Email" folder? >> Mine (2000 SR1) certainly didn't come with one. > > I take this back. If you use the Adult Mail/Junk Mail rules that Outlook offers and choose the 'move mail' option and the 'junk mail' folder, you are prompted to create a new "Junk Mail" folder. > > However, not every user will have one of these. I would suggest that the best option would be to change the documentation to suggest using the Junk Mail folder _if there is one_. I guess the plugin could check to see if there was an existing 'junk mail' folder and default to it, but then it could check for a 'spam' folder and default to that, too. Depends on what people are likely to have. > > =Tony Meyer > > ------------------------------ > > Date: Wed, 29 Jan 2003 14:53:06 -0800 > From: Kaitlin Duck Sherwood > To: spambayes@python.org > Subject: [Spambayes] egregious patents on anti-spam techniques > Message-ID: > Content-Type: text/plain; charset=us-ascii; format=flowed > MIME-Version: 1.0 > Content-Transfer-Encoding: 7BIT > Precedence: list > Message: 11 > > Gang -- > > I've recently become aware of two egregious patent applications > related to spam fighting. The first one looks like it might > conceivably cover Bayesian filtering. It would be good if someone > more familiar with Bayesian/classifier/machine learning theory could > check it out and perhaps challenge ("protest") the application.
> > The second is on using whitelists, blacklists, challenge-response, > and digital signatures to combat spam. I plan to protest that one > myself. I have killer prior art for whitelists, blacklists, and > challenge-response (see p.82 of _Stopping Spam_ by Schwartz & > Garfinkel, 1998). I do not know of prior art for using digital > signatures in the service of stopping spam. If you know of prior art > for that, you might want to issue a protest and/or send me the info. > > (If you send me prior art on digital signatures/spam, please > + read the patent claims first > + put PRIOR ART in the subject line.) > > I'm going to Japan for ten days, leaving Friday morning, and will not > have email connectivity then. > > To protest a patent, you need to file prior art (within 60 days!) > with the patent office. See: > http://www.uspto.gov/web/offices/pac/mpep/documents/1900.htm > and > http://www.uspto.gov/web/offices/pac/mpep/documents/0600_610.htm#sect610 > > Patent application on adaptive spam filtering: > ahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/em > ail&RS=TTL/email> > > > Patent application on whitelists, blacklists, challenge-response, and > digital signatures used in spam-fighting: > =1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1='20030009698'.PGNR.&OS=DN/20 > 030009698&RS=DN/20030009698> > > > > ------------------------------ > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > > > End of Spambayes Digest, Vol 53, Issue 55 > ***************************************** > From neale at woozle.org Thu Jan 30 09:40:06 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Jan 30 12:40:14 2003 Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: (Francois Granger's message of "Thu, 30 Jan 2003 09:42:33 +0100") References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: Francois Granger writes: > Side remark: > On MacOS X, upgrading from 2.2 to 2.3 changes the default database > format. IIRC, 2.3 includes db3. So "best" would change for you. Sadly, AAUI, there's no magic way to tell a dbhash from a db3hash. :/ > Can we add in the doc that the values for this option are: > "best", "db3hash", "dbhash", "gdbm", "dumbdbm" Maybe we need some document consolidation. :( Neale From jm at jmason.org Thu Jan 30 17:57:40 2003 From: jm at jmason.org (Justin Mason) Date: Thu Jan 30 12:56:42 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: Message from Gary Robinson Message-ID: <20030130175745.B530B16F16@jmason.org> Gary Robinson said: > But I do have some experience with patents, and I do understand the > spambayes approach and the gist of their approach. It is my impression that > the patent does not have a scope that encompasses Graham-derived filters, > because they do not calculate "first" and "second" "symantic anchors" as the > term is used in Claim 1. > > They seem to be trying to make a straightforward adaptation of technology > that works well for classifying documents according to subject area, latent > semantic analysis, into the spam realm. That was my impression, too, which is good news (to a degree). The other one is much broader, and I've forwarded it onto the TMDA users list, since they are *totally* prior art. --j. From francois.granger at free.fr Thu Jan 30 18:57:49 2003 From: francois.granger at free.fr (Francois Granger) Date: Thu Jan 30 12:57:56 2003 Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: At 09:40 -0800 30/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Neale Pickett wrote: >Francois Granger writes: > >> Side remark: >> On MacOS X, upgrading from 2.2 to 2.3 changes the default database >> format. > >IIRC, 2.3 includes db3. So "best" would change for you. And then I can't access my existing one ;-) And I got from Robin Dunn that the current 2.3 build I use from wxPython sourceforge area does not have dbhash yet. So, I am stuck with 2.2 for my "production version" of Spambayes ;-) -- Recently using MacOSX....... From neale at woozle.org Thu Jan 30 10:03:29 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Jan 30 13:03:42 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: (Francois Granger's message of "Thu, 30 Jan 2003 18:57:49 +0100") References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: Francois Granger writes: > At 09:40 -0800 30/01/2003, in message Re: [Spambayes] Alpha 2 Release?, > Neale Pickett wrote: >>IIRC, 2.3 includes db3. So "best" would change for you. > > And then I can't access my existing one ;-) Yeah, that's what I was intoning. I was answering the question Richie's asked later, I guess I should have actually answered his question instead of responding to yours ;) > And I got from Robin Dunn that the current 2.3 build I use from > wxPython sourceforge area does not have dbhash yet. So, I am stuck > with 2.2 for my "production version" of Spambayes ;-) Gah. I guess this should be in the release notes. So, has someone officially volunteered to be the documentation coordinator? It sounds like we need someone for the job... 
Neale From tim.one at comcast.net Thu Jan 30 15:29:41 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 30 15:30:15 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: Message-ID: [Meyer, Tony] > I guess the plugin could check to see if there was an existing 'junk > mail' folder and default to it [Francois Granger] > In this case, beware of localization issues. It may be translated in > localized versions. And it gets worse: any folder whatsoever can be the target of the "junk email" wizard, including Deleted Items. In addition, there's a distinct "adult content" rule, which may also target any folder. These work so poorly it's hard to believe anyone uses them for more than a few days. If someone is using them, I'd rather that Mark's plugin use a different directory, so we don't get blamed for the builtin filters' poor performance! From skip at pobox.com Thu Jan 30 14:41:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 30 15:41:47 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: <15929.36348.24505.247194@montanaro.dyndns.org> >>> Side remark: >>> On MacOS X, upgrading from 2.2 to 2.3 changes the default database >>> format. >> >> IIRC, 2.3 includes db3. So "best" would change for you. Francois> And then I can't access my existing one ;-) I'm skeptical that the 2.2 -> 2.3 change is what nailed your database. I've been using the HEAD branch of Python CVS as my daily Python interpreter since well before bsddb3 replaced the old bsddb module. I had no problems with database files at the transition point. More likely, what happened sometime between when you started using 2.2 and when you started using 2.3 is that the underlying Berkeley DB library got updated. Unfortunately, the file(1) command on Mac OS X doesn't grok that file format, so you have to be a bit sneaky to figure out what happened. On my system, /usr/bin/python is python2.2.
Its bsddb.so file is in /usr/lib/python2.2/lib-dynload. "otool -L bsddb.so" tells me:

    bsddb.so:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 60.0.0)

This suggests that the Berkeley DB library is bundled in /usr/lib/libSystem. However, the bsddb.so for Python 2.3 is:

    bsddb.so:
        /sw/lib/libdb-3.3.dylib (compatibility version 3.3.0, current version 3.3.11)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 60.2.0)

If you find the appropriate versions of bsddb.so for your two Python versions, do they disagree in this fashion about Berkeley DB?

Sleepycat provides a fairly simple way out of the woods here. They provide db_dump and db_load commands with each version of their library. These tools transfer databases out of and back into the binary format (respectively) using a platform-neutral plain text format. You'd run the db_dump compatible with your old library, then the db_load compatible with your new library. Unfortunately, it appears Apple saw fit not to deliver them with Mac OS X. (I got them with the fink distribution, so I only have libdb3.3 versions at the moment.)

All is still not lost, however. Assuming you have both 2.2 and 2.3 available, you can try something like the following (untested!) code. Run this with Python 2.2:

    #!/usr/bin/python
    import bsddb
    db = bsddb.hashopen("hammie.db")
    f = open("hammie.txt", "w")
    for key in db.keys():
        # one repr()'d (key, value) tuple per line, so eval() can read it back
        f.write("%r\n" % ((key, db[key]),))
    db.close()
    f.close()

Run this with Python 2.3:

    #!/usr/local/bin/python2.3
    import bsddb
    db = bsddb.hashopen("hammie.db.new", "c")
    for line in open("hammie.txt"):
        key, val = eval(line)
        db[key] = val
    db.close()

Now, replace the hammie.db file with hammie.db.new:

    mv hammie.db hammie.db.save
    chmod 444 hammie.db.save
    mv hammie.db.new hammie.db

and see if Spambayes works for you using 2.3.
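For anyone repeating this migration on a modern Python, here is a minimal sketch of the same dump-and-reload idea. It assumes Python 3, where the old bsddb module is gone and the dbm package is the standard front end, and it assumes you have a dbm backend that can still read the old file. dump_db, load_db and the file names are illustrative only. Records are read back with ast.literal_eval() rather than eval(), so a damaged dump file cannot run arbitrary code.

```python
import ast


def dump_db(db, text_path):
    """Write every (key, value) pair of a dict-like database, one repr() per line."""
    with open(text_path, "w", encoding="utf-8") as out:
        for key in db.keys():
            # repr() escapes newlines and high bytes, so each record stays on one line
            out.write(repr((key, db[key])) + "\n")


def load_db(text_path, db):
    """Read lines written by dump_db and store each pair into another database."""
    with open(text_path, encoding="utf-8") as src:
        for line in src:
            # literal_eval only accepts literals, unlike eval()
            key, value = ast.literal_eval(line)
            db[key] = value
```

Run dump_db under the old library (for example on dbm.open("hammie.db", "r")) and load_db under the new one (on dbm.open("hammie.db.new", "c")), then swap the files as described above.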
Skip From francois.granger at free.fr Thu Jan 30 23:13:58 2003 From: francois.granger at free.fr (Francois Granger) Date: Thu Jan 30 17:14:04 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: <15929.36348.24505.247194@montanaro.dyndns.org> References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> <15929.36348.24505.247194@montanaro.dyndns.org> Message-ID: At 14:41 -0600 30/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Skip Montanaro wrote: > >>> Side remark: > >>> On MacOS X, upgrading from 2.2 to 2.3 changes the default database > >>> format. > >> > >> IIRC, 2.3 includes db3. So "best" would change for you. > > Francois> And then I can't access my existing one ;-) > >I'm skeptical that the 2.2 -> 2.3 change is what nailed your database. I've >been using the HEAD branch of Python CVS as my daily Python interpreter >since well before bsddb3 replaced the old bsddb module. I had no problems >with database files at the transition point. Thanks for the time and all the details. Today I installed Python 2.3a1 from the wxPython sf page: http://sf.net/project/showfiles.php?group_id=10718 Previously, I had spambayes running with the stock Apple Python 2.2. I had confirmation from Robin Dunn, who built the package, that there was something wrong in it. >the fink distribution I downloaded very few things from Fink. My only motivation to get it was to try X. Now that there is an Apple version, I don't use the fink one anymore. >All is still not lost, however. Assuming you have both 2.2 and 2.3 >available, you can try something like the following (untested!) code. I'll give it a try tomorrow.
There is still another issue, however, with 2.3 and pop3proxy that I sent in another email: At 22:40 +0100 29/01/2003, in message Re: [Spambayes] Alpha 2 Release?, Francois Granger wrote: >At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 >Release?, Neale Pickett wrote: >>Francois Granger writes: >> >>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >>> position 86: ordinal not in range(128) Some more info: This error has shown up since I installed 2.3a1. It does not happen with my normal setup with Python 2.2. When I removed all mail containing accented characters and kept only English mail without accented characters, there was no error. -- Recently using MacOSX....... From T.A.Meyer at massey.ac.nz Fri Jan 31 11:21:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 30 17:22:18 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz> [Neale] > > * She says that the plugin is definitely not filtering > > public folders. [Tony] > This might be an issue that has been resolved (I'm not sure > what version the release had). I'm checking this out on my > system, but I don't have a lot of (mail) public folders > (they're mostly calendars). I'll have to wait until one of > them gets mail. What happens on my system is that I get an error because I do not have access to create a user-property in the public folder (the trace is below). So with no score recorded, no action can be taken. I suspect this might be what is happening for her as well, but I can't check what happens in a public folder where access is permitted, because I don't have access to a folder like that - does anyone else? I'm not sure what could be done about this. Without access, the score can't be written into the message - in fact the message probably can't be moved either, since access wouldn't allow that.
I suspect this is as it should be - only those with appropriate access to the folder should be spam filtering it. Neale, can you check with her and see what access she has to the public folder she was trying to filter? =Tony Meyer From skip at pobox.com Thu Jan 30 16:23:22 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 30 17:23:35 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> <15929.36348.24505.247194@montanaro.dyndns.org> Message-ID: <15929.42458.977826.228522@montanaro.dyndns.org> Francois> There is still another issue however with 2.3 and pop3proxy Francois> that I sent in another email: ... >>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in >>> position 86: ordinal not in range(128) Alas, Unicode I can't help you with... :-( Skip From neale at woozle.org Thu Jan 30 14:33:07 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Jan 30 17:33:12 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz> ("Meyer, Tony"'s message of "Fri, 31 Jan 2003 11:21:32 +1300") References: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: > I'm not sure what could be done about this. Without access then the > score can't be written into the message - in fact the message probably > can't be moved either, since access wouldn't allow that. I suspect > this is as it should be - only those with appropriate access to the > folder should be spam filtering it. > > Neale, can you check with her and see what access she has to the > public folder she was trying to filter? I'll ask her, and post her answer to the list when I get it. 
Thanks for the interest :) Neale From rod at borderware.com Thu Jan 30 18:29:13 2003 From: rod at borderware.com (Rod Gilchrist) Date: Thu Jan 30 18:30:50 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: References: Message-ID: <3E39B549.5020702@borderware.com> Gary Robinson wrote: >>Patent application on adaptive spam filtering: >>>ahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/em >>ail&RS=TTL/email> >> >> > >I looked at this last night. > >I am not a lawyer, so don't go to the bank on what I say. And I didn't spend >a huge amount of time on it. > >But I do have some experience with patents, and I do understand the >spambayes approach and the gist of their approach. It is my impression that >the patent does not have a scope that encompasses Graham-derived filters, >because they do not calculate "first" and "second" "symantic anchors" as the >term is used in Claim 1. > > Here's a quote from the background section of the application: "Latent semantic analysis (LSA) is a method that automatically uncovers the salient semantic relationships between words and documents in a given corpus. Discrete words are mapped onto a continuous semantic vector space, in which clustering techniques may be applied." Graham-derived filters do map words into a 'continuous semantic vector space', namely the one-dimensional vector space of the range [0.0, 1.0] of real numbers, and then 'clustering techniques' are applied. Normally clusters are defined by hyperplanes in N-Space, but in one dimension they would be threshold values. The two 'symantic anchors' are arguably cluster centers located at 0.0 and 1.0 (also known as ham and spam in Graham-derived filters).
In fact it is quite reasonable to describe a Graham-derived filter as having a 'ham anchor' that can be described as a location in N-Space in which each token string describes a dimension and the 'clue' value for that string is the location of the anchor in that dimension. Connecting the 'ham anchor' in N-Space with the 'spam anchor' in N'-Space with a normalized vector of unit length and positioning a hyperplane at some position along the vector and perpendicular to it (i.e. a threshold) is dead normal practice in 'clustering techniques'. I'd like to write this patent off too, but to me it looks like it likely would apply to Graham-derived filters. I'm not an expert in patents either, but I have a few issued ones of my own. The good news is the filing date is June 14, 2001. I'd like to suggest that it would be good to file a protest as Kaitlin suggested. There was certainly work done in this area before June 14, 2001. Does anyone have pointers they can pass along. - Rod Kaitlin Duck Sherwood wrote: > To protest a patent, you need to file prior art (within 60 days!) with the patent office. See: > http://www.uspto.gov/web/offices/pac/mpep/documents/1900.htm > and > http://www.uspto.gov/web/offices/pac/mpep/documents/0600_610.htm#sect610 > Patent application on adaptive spam filtering: > > Patent application on whitelists, blacklists, challenge-response, and digital signatures used in spam-fighting: >=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1='20030009698'.PGNR.&OS=DN/20 > 030009698&RS=DN/20030009698> From richard at jowsey.com Fri Jan 31 10:29:55 2003 From: richard at jowsey.com (Richard Jowsey) Date: Thu Jan 30 18:31:03 2003 Subject: [Spambayes] Chi-square scoring In-Reply-To: References: <3E2EBE7C.638.52087A5@localhost> Message-ID: <3E3A5023.5670.59D1F8C@localhost> Hi again Gary, I've implemented your prob-combining technik and a chi-squared function in Java, and have run some very revealing tests. 
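Richard's description is brief, so here is a rough Python sketch of chi-squared combining in essentially the form Spambayes adopted from Gary Robinson's suggestion of Fisher's method: the per-token probabilities are combined once as evidence for spam and once as evidence for ham, and the two indicators are folded into one score. The function names are illustrative, not the Spambayes API. The sketch assumes every probability lies strictly between 0 and 1 and omits the underflow guards a production version needs for messages with many clues.

```python
import math


def chi2q(x2, v):
    """Return P(chi-square with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series for even degrees of freedom
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Near 1.0 means spam, near 0.0 means ham, and around 0.5 means unsure
    (either little evidence, or strong evidence pointing both ways).
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Fisher: -2 * sum(ln p) is chi-square with 2n dof if the p's are uniform noise
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0
```

Because a message with strong clues on both sides drives both indicators toward 1, its score lands near 0.5, which is what makes the middle of the chi-square range a natural "unsure" zone.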
The first observation I'd make is that *any* measure of "spamminess" is only as good as the good/junk word databases. So I've done a fair amount of experimentation on ways to fine-tune my training corpus, especially wrt the careful quarantining of messages which are incorrectly classified, or are decidedly "unsure" and will probably remain so forever... Now, with a high-Q database, the probability distributions (pSpam) for the training corpus very closely approximate two binomial/normal distributions, with means around 0.25 and 0.75, and standard deviations of approx 1/12 (0.083), which is exactly what we'd expect from first principles, n'est-ce pas? In theory then, the 95%-confidence boundaries of an "unsure" zone (centered around pSpam=0.5) can be defined as pSpam falling between the 2-sigma points of the training distributions: Unsure lower limit: 0.25 + (2 * 1/12) = 0.417 Unsure upper limit: 0.75 - (2 * 1/12) = 0.583 In repeated testing, this simple approach provides reliable classification of randomly-selected streams of incoming email, viz. ~zero false positives and extremely accurate "uncertains". For comparison, I've also run the same streams through your chi- squared test, with (as you suggested) the null hypothesis being some normal distribution around 0.5, i.e. "I'm absolutely uncertain about anything". The outcomes are remarkably similar to my 2-sigma approach, but now the unsure zone is "stretched" logarithmically between chi-2 scores of ~0.15 and ~0.85. And yes, the same bunch of messages drop into the spam/unsure/ham regions, whichever scoring method is used. :-) Conclusions? After 1st-pass training, the good/junk word databases should definitely be re-tuned against the corpus. A low-Q database will simply "muddy" the classifier, irrespective of statistical technique. In such a poor signal/noise scenario, with lots of "unsures" in the corpus and/or in the sample stream, chi-2 scoring is a definite plus! 
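The 2-sigma cutoffs in Richard's message are plain arithmetic on the two fitted training distributions. Here is a minimal sketch, taking his means (0.25 and 0.75) and standard deviation (1/12) as defaults; unsure_band and classify are illustrative names, not Spambayes functions.

```python
def unsure_band(ham_mean=0.25, spam_mean=0.75, sigma=1.0 / 12, k=2.0):
    """Return (lower, upper) cutoffs lying k standard deviations inside each mean."""
    return ham_mean + k * sigma, spam_mean - k * sigma


def classify(score, band=None):
    """Label a combined score as 'ham', 'unsure' or 'spam'."""
    lower, upper = band if band is not None else unsure_band()
    if score < lower:
        return "ham"
    if score > upper:
        return "spam"
    # Scores on or between the cutoffs are treated as unsure (an arbitrary choice)
    return "unsure"
```

With the defaults the band comes out at 5/12 and 7/12 (about 0.417 and 0.583), matching his limits; the same classify() can take the wider (0.15, 0.85) band he reports for chi-2 scores.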
However, this test is fairly expensive computationally, so in practice we might only need to perform chi-2 when a message's raw pSpam falls between, say, 0.25 and 0.75 (which approach gives exactly the same outcomes, but is considerably faster when proxying). I can post you testing logs depicting these various results if you're interested... Cheers, Richard From skip at pobox.com Thu Jan 30 18:21:03 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Jan 30 19:21:17 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: <3E39B549.5020702@borderware.com> References: <3E39B549.5020702@borderware.com> Message-ID: <15929.49519.664913.836891@montanaro.dyndns.org> Rod> The good news is the filing date is June 14, 2001. Rod> I'd like to suggest that it would be good to file a protest as Rod> Kaitlin suggested. There was certainly work done in this area Rod> before June 14, 2001. Does anyone have pointers they can pass Rod> along. Check the list archives. There has been academic research in this area, though I don't know the reference off the top of my head. It's come up in one of these four places: * this list * at the recent spam workshop * on Gary Robinson's website * on Paul Graham's website It would probably be a good idea to collect a bibliography on the spambayes website, though I'm short on time at the moment. This is one of those things where a Wiki would be marvelous. Anthony, can MoinMoin be run on SF? If not, I'd be happy to create a new MoinMoin instance on manatee.mojam.com. 
Skip From noreply at sourceforge.net Thu Jan 30 14:21:15 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 30 19:47:04 2003 Subject: [Spambayes] [ spambayes-Bugs-677804 ] Untouched fitler command error Message-ID: Bugs item #677804, was opened at 2003-01-31 11:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677804&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Untouched fitler command error Initial Comment: When filtering is set to leave uncertain/spam messages untouched (rather than copy/move), I get an error:
Failed filtering message!
Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000\filter.py", line 43, in filter_message
    raise RuntimeError, "Eeek - bad action '%r'" % (action,)
RuntimeError: Eeek - bad action ''untouched''
Line 34 of filter.py seems to expect the action to start with 'no' ('none', perhaps?). Everything still works, but the traceback is a bit ugly. Changing that one line to expect 'un' should fix the problem.
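A minimal sketch of the suggested fix, written in Python 3 syntax. The helper name normalize_action and the set of action strings ("untouched", "copy", "move") are assumptions for illustration, since the real filter.py is not shown here. The sketch also drops the extra quotes around %r: %r already quotes the value, which is why the traceback shows a doubled ''untouched''.

```python
def normalize_action(action):
    """Map a configured filter action to a canonical name (hypothetical helper)."""
    act = action.lower()
    if act.startswith("un"):   # "untouched": leave the message where it is
        return "untouched"
    if act.startswith("co"):   # "copy" to the target folder
        return "copy"
    if act.startswith("mo"):   # "move" to the target folder
        return "move"
    # %r already adds quotes, so don't wrap it in another pair.
    raise RuntimeError("Eeek - bad action %r" % (action,))
```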
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677804&group_id=61702 From noreply at sourceforge.net Thu Jan 30 15:21:30 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Jan 30 19:47:12 2003 Subject: [Spambayes] [ spambayes-Bugs-677842 ] COM error on access denied Message-ID: Bugs item #677842, was opened at 2003-01-31 12:21 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677842&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: COM error on access denied Initial Comment: Some folders (public ones in particular) may not allow the user access to create the spam field. This also seems to cause an 'access denied' com error later on. An example traceback is below. Warning: failed to create the Outlook user-property in folder 'MCN Newsletter' (-2147352567, 'Exception occurred.', (4096, 'Microsoft Outlook', "You don't have appropriate permission to perform this operation.", None, 0, -2147024891), None) This is probably because the code has recently been changed, but it will have no effect on the filtering or scoring. AntiSpam: Watching for new messages in folder MCN Newsletter AntiSpam: Watching for new messages in folder Inbox AntiSpam: Watching for new messages in folder Spam Error processing missed messages! 
Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 610, in OnConnection
    self.ProcessMissedMessages()
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 884, in ProcessMissedMessages
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 129, in ProcessMessage
    if msgstore_message.GetField(manager.config.field_score_name) is not None:
  File "D:\CVS Modules\spambayes\Outlook2000\msgstore.py", line 651, in GetField
    prop = self.mapi_object.GetIDsFromNames(props, 0)[0]
com_error: (-2147024891, 'Access is denied.', None, None)
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677842&group_id=61702 From T.A.Meyer at massey.ac.nz Fri Jan 31 13:49:31 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Jan 30 19:50:08 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3FE@its-xchg4.massey.ac.nz> [Mark] > Just so you know I am not ignoring this thread, I tend to > agree with many of > the points. My intention is to reply in detail as I fix them! Make sure you let us know if we can help. =Tony Meyer From tim at fourstonesExpressions.com Thu Jan 30 18:49:39 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 30 19:50:18 2003 Subject: [Spambayes] Skip's Installation.txt Message-ID: It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated into the website... is this on anybody's radar? If someone has started doing this, then I don't want to reinvent it...
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From jh at web.de Fri Jan 31 02:11:30 2003 From: jh at web.de (Juergen Hermann) Date: Thu Jan 30 20:12:14 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: <15929.49519.664913.836891@montanaro.dyndns.org> Message-ID: On Thu, 30 Jan 2003 18:21:03 -0600, Skip Montanaro wrote: >It would probably be a good idea to collect a bibliography on the spambayes >website, though I'm short on time at the moment. This is one of those >things where a Wiki would be marvelous. Anthony, can MoinMoin be run on SF? >If not, I'd be happy to create a new MoinMoin instance on manatee.mojam.com. Not too safely, because sf's web setup requires you to have things world-writeable. Anyway, either the python.org wiki or the #python wiki is a good place to use, unless you expect Spambayes-related pages in the hundreds. Look at what Mike Rovner did for boost.python at PythonInfo: http://www.python.org/cgi-bin/moinmoin/boost_2epython Ciao, Jürgen From richard at jowsey.com Fri Jan 31 12:28:05 2003 From: richard at jowsey.com (Richard Jowsey) Date: Thu Jan 30 20:28:35 2003 Subject: [Spambayes] Bayesian virus detection? Message-ID: <3E3A6BD5.31975.6094EA5@localhost> I've "accidently" captured a handful of VB-script-type virii lately, since my spam training corpus apparently (luckily!) contained a few of these nasties. Got me thinking... I'd like to try training my classifier with a corpus of viral material, plus add a "virus" classification category into the mix, see what happens. However, I haven't a clue as to how to go about deliberately collecting such crud (and apparently, neither does Google)... Bloody weird question, but does anyone happen to have a collection of quarantined emails containing viral attachments, trojans, etc...? Which they could ZIP and send me for research purposes? Promise not to forward 'em to all my friends, ha ha...
TIA, Richard PS: script kiddies and assorted black-hat hackers are officially invited to mail their nasty little efforts to richard@jowsey.com From tim at fourstonesExpressions.com Thu Jan 30 19:36:07 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Jan 30 20:36:44 2003 Subject: [Spambayes] Bayesian virus detection? In-Reply-To: <3E3A6BD5.31975.6094EA5@localhost> Message-ID: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst> I wonder if Neale's honeypot prog could be used for that... - TimS 1/30/2003 7:28:05 PM, "Richard Jowsey" wrote: >I've "accidently" captured a handful of VB-script-type virii >lately, since my spam training corpus apparently (luckily!) >contained a few of these nasties. Got me thinking... I'd like to >try training my classifier with a corpus of viral material, plus >add a "virus" classification category into the mix, see what >happens. > >However, I haven't a clue as to how to go about deliberately >collecting such crud (and apparently, neither does Google)... > >Bloody weird question, but does anyone happen to have a >collection of quarantined emails containing viral attachments, >trojans, etc...? Which they could ZIP and send me for research >purposes? > >Promise not to forward 'em to all my friends, ha ha... 
> >TIA, >Richard > >PS: script kiddies and assorted black-hat hackers are officially >invited to mail their nasty little efforts to richard@jowsey.com > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From grobinson at transpose.com Thu Jan 30 22:30:15 2003 From: grobinson at transpose.com (Gary Robinson) Date: Thu Jan 30 22:30:46 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques In-Reply-To: Message-ID: > > Graham derived filters do map words into a 'continuous semantic vector > space', namely the one dimensional vector > space of the range of [0.0, 1.0] of real numbers, and then 'clustering > techniques' are applied. Normally clusters are > defined by hyperplanes in N-Space, but in one dimesion they would be > threshold values. The two 'symantic anchors' are arguably cluster > centers located at 0.0 and 1.0 (also known as ham and spam in > Graham-derived filters). > I completely understand what you're saying, but having spent a lot of money and time researching the doctrine of equivalents and the various court rulings about it, I really, really think that's too much of a stretch. To be equivalent, the method has to perform "substantially the same function in substantially the same way" as the one in the patent. (Graver Tank & Mfg. Co. v. Linde Air Products, 339 U.S. 605 (1950)) The latter requirement is just not met here, IMO. Moreover, even if it were, many things can happen during the prosecution of a patent that make the doctrine of equivalents unavailable to the patent holder. In order to get a grasp on that in this specific case, we would have to look at the "file wrapper" which contains the history of the interaction with the patent office examiner. The wind in the courts has been blowing against the doctrine of equivalents for years.
And without the DoE, you have to infringe exactly, and that is not the case here. > In fact it is quite reasonable to describe a Graham-derived filter as > having a 'ham anchor' that can > be described as a location in N-Space in which each token string > describes a dimension and the > 'clue' value for that string is the location of the anchor in that > dimension. Again, that is just way too much of a stretch to invoke the DoE IMO. There is no point in n-space that represents spam or ham in Graham. There is no location. We are reducing everything to one dimension before we ever try to determine ham or spamminess, and being near 0 or 1 is not "substantially the same" as measuring a distance to a point in n-space. One thing I don't think we should do, though, is get into a nit-picking argument about this. It isn't worth it. :) The main thing is that I agree: prior art should be found if possible to lessen the chance that anybody is hassled with this. And of course there is always a random chance element when things actually get to trial. And getting things right can involve an expensive appeal process. I've read about very stupid decisions in the lower courts that were corrected on appeal. But the open-source community wouldn't easily be able to pay for such a long and expensive process. So it's certainly better to be safe than sorry. > I'd like to suggest that it would be good to file a protest as Kaitlin > suggested. There was certainly > work done in this area before June 14, 2001. Does anyone have pointers > they can pass along. This might help and couldn't hurt. As I read the claims and specification, the claims are specific to a particular methodology that is different from Graham's, so I doubt that Graham-like prior art will overturn the claims. However, the prior art may go into the prosecution history in such a way that it could greatly reduce any chance that anyone could try to use the patent to attack Graham-based filters.
So it would be worth doing if anyone can find some prior art. Gary From tim.one at comcast.net Thu Jan 30 23:19:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Jan 30 23:19:55 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: <15929.49519.664913.836891@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > It would probably be a good idea to collect a bibliography on the > spambayes website, though I'm short on time at the moment. This is one > of those things where a Wiki would be marvelous. Anthony, can MoinMoin > be run on SF? I'm not Anthony, but I doubt it. > If not, I'd be happy to create a new MoinMoin instance on > manatee.mojam.com. Note that Gary Robinson set up a Spam Wiki last year: http://wecanstopspam.org/jsp/Wiki?StartingPoints It's well done and well organized, but doesn't seem to have attracted a community yet. If I had something to say, I'd say it there . From anthony at interlink.com.au Fri Jan 31 15:33:30 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Jan 30 23:35:44 2003 Subject: [Spambayes] Skip's Installation.txt In-Reply-To: Message-ID: <200301310433.h0V4XUR26793@localhost.localdomain> >>> Tim Stone - Four Stones Expressions wrote > It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated > into the website... is this on anybody's radar? If someone has started doing > this, then I don't want to reinvent it... (I assume you mean "INTEGRATION.txt") I plan to do another reorganisation - this time, of the documentation - at the moment, this includes: Website "background.html" page. README.txt INTEGRATION.txt HAMMIE.txt Anyone have other docs that they want to throw into the documentation salad? It's "in progress" at the moment - I've not checked any of it into the website CVS repository, as it's got bits everywhere. 
I might make a branch or something, and figure out a way to put this on the website, in a way that's not the "default" website. >>> Neale Pickett wrote > So, has someone officially volunteered to be the documentation > coordinator? It sounds like we need someone for the job... I've been doing a lot of this, and I'm happy to continue doing so. Anthony From tim.one at comcast.net Fri Jan 31 00:10:39 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Jan 31 00:11:25 2003 Subject: [Spambayes] Outlook plugin notes In-Reply-To: Message-ID: > * All outbound mail should be trained as ham In connection with the Outlook client specifically, this one is especially dubious: Outlook doesn't use "normal" internet formats internally, and the headers on outgoing mail *as saved in* Sent Items are especially sparse. For example, this is the complete collection of headers on the last msg I sent to Python-Dev, as it exists in my Sent Items folder: """ X-Exchange-Message: true Subject: RE: [Python-Dev] Re: native code compiler? (or, OCaml vs. Python) To: python-dev@python.org """ That's it. No sender, date, from, errors-to, etc etc. The *lack* of such headers generates tokens in the default Outlook classifier, and if you don't train on outgoing msgs they become good spam clues. OTOH, if sent msgs were special-cased so that just the body got tokenized, it might be helpful for people who get very few msgs. But I expect that anyone using email enough to feel burdened by spam probably also gets more ham than they have time to deal with. A better trick may be to make a point of training-as-ham on received msgs that are *replied* to. 
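To see why the sparse Sent Items headers matter, here is a toy tokenizer. It is a hypothetical sketch, not the spambayes tokenizer: it emits an assumed "header:Name:count" token (modelled on clue names such as header:Received:2) for a fixed, illustrative list of headers, so an absent header shows up as a count of 0. The body_only flag corresponds to the special-casing Tim suggests for sent messages, where only the body is tokenized.

```python
from email import message_from_string, policy

# Headers whose presence/absence we count; an assumed, illustrative list.
COUNTED_HEADERS = ("From", "Date", "Message-ID", "Received")

def tokenize(raw, body_only=False):
    msg = message_from_string(raw, policy=policy.default)
    tokens = []
    if not body_only:
        for name in COUNTED_HEADERS:
            # A count of 0 is exactly what a bare Sent Items copy would yield.
            tokens.append("header:%s:%d" % (name, len(msg.get_all(name, []))))
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    tokens.extend(word.lower() for word in text.split())
    return tokens

sent = "Subject: RE: native code compiler?\nTo: python-dev@python.org\n\nSounds good to me.\n"
print(tokenize(sent)[:2])              # ['header:From:0', 'header:Date:0']
print(tokenize(sent, body_only=True))  # ['sounds', 'good', 'to', 'me.']
```

Trained only on received mail, tokens like header:From:0 never appear in ham, which is how they can drift toward being spam clues.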
From Paul.Moore at atosorigin.com Fri Jan 31 09:16:14 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Jan 31 04:16:48 2003 Subject: [Spambayes] Outlook plugin notes Message-ID: <16E1010E4581B049ABC51D4975CEDB886199C6@UKDCX001.uk.int.atosorigin.com> From: Tim Peters [mailto:tim.one@comcast.net] > But I expect that anyone using email enough to feel burdened > by spam probably also gets more ham than they have time to > deal with. I'm definitely a counterexample to that. At my home account, spam outweighs ham by factors of hundreds. But the address is published in a few places, enough that I don't want to abandon it. So spambayes routinely dumps 50-100 spam into my spam folder every day, and once every week or two leaves a ham message in my inbox. This is vital to me, as doing it manually pretty much guaranteed I'd miss something. My favourite example of that was the email from my sister, reminding me I owed her some money - "Subject: Debt!!!". Manually, I'd never even look at the body (and the sender address didn't register with me at first, either...) > A better trick may be to make a point of training-as-ham on > received msgs that are *replied* to. That would be interesting... Paul. From mhammond at skippinet.com.au Fri Jan 31 22:12:31 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Jan 31 06:13:27 2003 Subject: [Spambayes] Bayesian virus detection? In-Reply-To: <3E3A6BD5.31975.6094EA5@localhost> Message-ID: <001301c2c919$a6decd60$530f8490@eden> > I've "accidently" captured a handful of VB-script-type virii > lately, since my spam training corpus apparently (luckily!) > contained a few of these nasties. Got me thinking... I'd like to > try training my classifier with a corpus of viral material, plus > add a "virus" classification category into the mix, see what > happens. > > However, I haven't a clue as to how to go about deliberately > collecting such crud (and apparently, neither does Google)... 
I have a Python script that collects lots of "klez" and other "iframe vulnerability" variants. IIRC, these have their payload in some illegal HTML inside an iframe tag. My Outlook (i.e., late/patched versions) discards the illegal HTML, so the payload is lost. Lots of other useful stuff is also lost just due to the fact we are talking Outlook. In general, though, these mails tend to have very standard or empty bodies, making our spambayes Outlook filter tend to put them in the "unsure" category. If I train enough of them, the filter does eventually get it correct, but I found that a fairly trivial, stand-alone Python-based filter works fine. I collect around 150 of these a day, though, if you want them. Mark. From skip at pobox.com Fri Jan 31 06:12:13 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 31 07:12:20 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: References: <15929.49519.664913.836891@montanaro.dyndns.org> Message-ID: <15930.26653.795305.227639@montanaro.dyndns.org> Tim> Note that Gary Robinson set up a Spam Wiki last year: Tim> http://wecanstopspam.org/jsp/Wiki?StartingPoints Tim> It's well done and well organized, but doesn't seem to have Tim> attracted a community yet. If I had something to say, I'd say it Tim> there . Good enough for me. Skip From skip at pobox.com Fri Jan 31 06:18:42 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 31 07:18:49 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: References: <15929.49519.664913.836891@montanaro.dyndns.org> Message-ID: <15930.27042.863961.390657@montanaro.dyndns.org> Tim> Note that Gary Robinson set up a Spam Wiki last year: Tim> http://wecanstopspam.org/jsp/Wiki?StartingPoints Added to the links on the related page. I checked it into CVS but can't "make install". Once Anthony refreshes the website it will be visible.
Skip From papaDoc at videotron.ca Fri Jan 31 08:08:41 2003 From: papaDoc at videotron.ca (papaDoc) Date: Fri Jan 31 08:08:42 2003 Subject: [Spambayes] Skip's Installation.txt In-Reply-To: References: Message-ID: <3E3A7559.7000007@videotron.ca> Hi, >It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated >into the website... is this on anybody's radar? If someone has started doing >this, then I don't want to reinvent it... > I have an HTML file describing how to use pop3proxy with Mozilla, and some notes on how to use it with ??? on Mac OS X from Francois Granger. But I need to update this for the new pop3proxy (i.e. add a part for when a mail client cannot specify different ports). Remi Ricard From tim at fourstonesExpressions.com Fri Jan 31 07:12:08 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 31 08:12:47 2003 Subject: [Spambayes] Skip's Installation.txt In-Reply-To: <200301310433.h0V4XUR26793@localhost.localdomain> Message-ID: 1/30/2003 10:33:30 PM, Anthony Baxter wrote: > >>>> Tim Stone - Four Stones Expressions wrote >> It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated >> into the website... is this on anybody's radar? If someone has started doing >> this, then I don't want to reinvent it... > >(I assume you mean "INTEGRATION.txt") Yeah. winduhs has dulled my pattern recognition circuits > >I plan to do another reorganisation - this time, of the documentation - >at the moment, this includes: > >Website "background.html" page. >README.txt >INTEGRATION.txt >HAMMIE.txt > >Anyone have other docs that they want to throw into the documentation >salad? > >It's "in progress" at the moment - I've not checked any of it into the >website CVS repository, as it's got bits everywhere. I might make a >branch or something, and figure out a way to put this on the website, >in a way that's not the "default" website.
Let me know if there's anything I can do to help - TimS > >>>> Neale Pickett wrote >> So, has someone officially volunteered to be the documentation >> coordinator? It sounds like we need someone for the job... > >I've been doing a lot of this, and I'm happy to continue doing so. > >Anthony > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Fri Jan 31 07:23:37 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Jan 31 08:23:48 2003 Subject: [Spambayes] Bayesian virus detection? In-Reply-To: <001301c2c919$a6decd60$530f8490@eden> References: <3E3A6BD5.31975.6094EA5@localhost> <001301c2c919$a6decd60$530f8490@eden> Message-ID: <15930.30937.343093.173755@montanaro.dyndns.org> Mark> In general though, these mails tend to have very standard or empty Mark> bodies, making our spambayes Outlook filter tend to put them in Mark> the "unsure" category. Maybe *your* spambayes Outlook filter can't tell, but *my* spambayes hammie filter does just fine with them, thank you very much. ;-) At any rate, after having a chance to ponder a few such messages, spambayes seems to just vacuum them right up. Skip From mourad at aquazul.com Fri Jan 31 08:26:12 2003 From: mourad at aquazul.com (Mourad De Clerck) Date: Fri Jan 31 10:13:11 2003 Subject: [Spambayes] use as generic classifier? (not just spam/ham) Message-ID: <3E3A2514.4000003@aquazul.com> (please cc: any replies - I'm not subscribed. thx.) Hi, This usage is probably rather obvious, but I haven't seen any discussion about it, so I'm asking... I was wondering if spambayes could be used in a different way. Currently I have a couple of users on a mailserver (imap+maildir). 
I'd like to make it possible to have them create different subfolders where they move mail to (from a mailing list, about a specific subject, from a specific person - whatever) through their standard IMAP client. Now, when a new mail arrives on the mailserver, I'd like that mail to be "scored" against all these folders and subfolders (maildirs), and for it to be delivered to the folder that corresponds to the system's "best guess". If the scores for all folders are within a certain (smallish) range (percentage), the system's best guess is not good enough and it should just be delivered to the standard inbox. For the user, I believe this would be extremely convenient, instead of having to do rule-based filtering on the client side, or using something like procmail. Just by organising your mail like you normally would in any case, your new incoming mail would be matched automatically to the right folder. Is this possible with spambayes? -- Mourad From neale at woozle.org Fri Jan 31 08:44:26 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 31 11:44:38 2003 Subject: [Spambayes] Bayesian virus detection? In-Reply-To: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst> (Tim Stone - Four Stones Expressions's message of "Thu, 30 Jan 2003 19:36:07 -0600") References: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst> Message-ID: Tim Stone - Four Stones Expressions writes: > [about collecting worms] > I wonder if Neale's honeypot prog could be used for that... - TimS Not yet. All the worms I know about scan your address book. It would take one that tries to send mail to random IPs for my honeypot to catch it. Speaking of which, just as Matt Sergeant predicted, the honeypot isn't getting hit as much as I thought it would: http://woozle.org/~spam/stats.cgi Neale PS: Klez and its ilk are properly called worms, not viruses. Not that anyone would fail to understand what you meant if you said virus, I guess. PPS: The plural of "virus" is "viruses". "viri" means "men".
"virii" doesn't mean anything at all. There are other Latin -us words which are not pluralized by going to -i (notably, corpus). From msergeant at startechgroup.co.uk Fri Jan 31 17:07:34 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Fri Jan 31 12:07:38 2003 Subject: [Spambayes] Bayesian virus detection? In-Reply-To: Message-ID: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk> On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote: > PS: Klez and its ilk are properly called worms, not viruses. Not that > anyone would fail to understand what you meant if you said virus, I > guess. Worms are viruses. As are Trojans and a whole bunch of other things. Virus is the generic term for a program that spreads itself by one means or another. The more specific term (e.g. worm) indicates the means with which it spreads. From neale at woozle.org Fri Jan 31 09:33:05 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 31 12:33:13 2003 Subject: [Spambayes] OT: pedantism (was: Bayesian virus detection?) In-Reply-To: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk> (Matt Sergeant's message of "Fri, 31 Jan 2003 17:07:34 +0000") References: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk> Message-ID: Matt Sergeant writes: > On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote: > >> PS: Klez and its ilk are properly called worms, not viruses. Not that >> anyone would fail to understand what you meant if you said virus, I >> guess. > > Worms are viruses. As are Trojans and a whole bunch of other > things. Virus is the generic term for a program that spreads itself by > one means or another. The more specific term (e.g. worm) indicates the > means with which it spreads. Hmm, not according to my dictionary: http://catb.org/esr/jargon/html/entry/virus.html Although it does appear that I was incorrect in characterising Klez as a worm--it would be, according to esr, a virus.
Every report I've seen on Klez calls it a "worm". So maybe the larger point is that there's no longer a clear consensus on what is a worm and what is a virus, and I should go back to merging mboxtrain and hammiebulk. ;) Neale From neale at woozle.org Fri Jan 31 09:42:54 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 31 12:42:58 2003 Subject: [Spambayes] use as generic classifier? (not just spam/ham) In-Reply-To: <3E3A2514.4000003@aquazul.com> (Mourad De Clerck's message of "Fri, 31 Jan 2003 08:26:12 +0100") References: <3E3A2514.4000003@aquazul.com> Message-ID: Mourad De Clerck writes: > when a new mail arrives on the mailserver, I'd like that mail to be > "scored" against all these folders and subfolders (maildirs), and for it > to be delivered to the folder that corresponds to the system's "best > guess". If the scores for all folders are within a certain (smallish) > range (percentage), the system's best guess is not good enough and it > should just be delivered to the standard inbox. While I haven't used it myself, I think you just described ifile: http://www.nongnu.org/ifile/ I don't think there's anything stopping you from hacking spambayes to do n-way classification, although I don't completely understand all the math yet, so I couldn't say for sure. If you can get spambayes to do this trick, I at least would like to try folding this into our sources. But I think it would be difficult, since currently everything about spambayes is geared to having two categories. Neale From acapnotic at users.sourceforge.net Fri Jan 31 09:43:44 2003 From: acapnotic at users.sourceforge.net (Kevin Turner) Date: Fri Jan 31 12:43:50 2003 Subject: [Spambayes] Ximian Evolution Message-ID: <1044035023.21121.93.camel@troglodyte.funhouse> [apologies if you see this twice; sent it from the wrong address the first time and it stuck in the moderator queue.] A search of the list doesn't turn anything up, so I guess this is a new thread.
Has anyone given any thought to how to use spambayes with the Evolution MUA without procmail or proxy? I just came up with something that sounds plausible; I haven't coded it yet. Current versions of Evolution 1.2 can filter messages based on the exit code of a process you pipe them to. Or you may be able to chain filters, first piping it through an action to add the spambayes header, then filtering on that.[1] Either way, filtering shouldn't be hard. For training, Evolution seems to lack a way to script it, but you could copy messages to folders[2] and run hammie on them at your leisure or via a cron job. (Does hammie move messages out of this mailbox or flag them as trained in some way so it doesn't double-train them?) It would be nice to avoid process-per-message overhead. Maybe bonobo holds some wonderful answer to that; I don't know. For now, I'm probably willing to not worry about it. Has anyone set this up? Can we add a section about it in the integration docs? That'd be super. Thanks, - Kevin [1] The filter-on-header method has the advantage of being easier to trace and debug; I only hesitate because I haven't convinced myself I know what order Evolution applies filters in and the conditions in which it chains them. [2] These could be in ~/evolution/local, or configured as a separate account of protocol "mbox" or "Maildir"; whichever option rankles evolution less, considering we may modify the contents of the mailbox without its knowledge. I'm guessing the "separate account" option is probably better for that. -- The moon is new, 1.2% illuminated, 28.5 days old. From grobinson at transpose.com Fri Jan 31 12:45:04 2003 From: grobinson at transpose.com (Gary Robinson) Date: Fri Jan 31 12:45:03 2003 Subject: [Spambayes] Re: use as generic classifier? (not just spam/ham) Message-ID: I've been thinking about that myself. I wonder if anyone else on this list has been? There are traditional classification techniques for classifying documents.
Each document becomes a vector in n-space, and space is partitioned according to subject area. But that technology is typically complex and expensive. It would be interesting to take the whole spambayes spam/ham approach and run it separately for every folder. Just do the same exact thing, but instead of the target being a junk mail folder, it would be a folder of some subject matter. So for each folder, instead of spam vs. nonspam, vs. would be calculated. Then the folder whose binary choice has the greatest certainty would be picked, and the email allocated to it. That is, I wouldn't try to adapt the approach to work with more than a binary choice at a time. I don't think that would work (at least not with chi-square). But if you calculate the binary choice w/r/t each folder, the one with the most certainty can be chosen. It's a little bit ugly, but if it gets the job done, so what? I think it would work. It might not work quite as well as in the spam/ham case, but it would be interesting to actually try it and see how well it did. It seems like the mods to spambayes to make it happen wouldn't be super-huge; if enough people felt that there might be some benefit, it seems like it might be worth doing... Just my 2 cents! > > I was wondering if spambayes could be used in a different way. Currently > I have a couple of users on a mailserver (imap+maildir). I'd like to > make it possible to have them create different subfolders where they > move mail to (from a mailinglist, about a specific subject, from a > specific person - whatever) through their standard imap client. Now, > when a new mail arrives on the mailserver, I'd like that mail to be > "scored" against all these folders and subfolders (maildirs), and for it > to be delivered to the folder that corresponds to the system's "best > guess". If the scores for all folders are within a certain (smallish) > range (percentage), the system's best guess is not good enough and it > should just be delivered to the standard inbox.
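Gary's one-binary-decision-per-folder idea, combined with Mourad's fall-back-to-inbox rule, can be sketched as a toy router. This is a hypothetical illustration, not spambayes code: the FolderScorer class, the margin parameter and the "INBOX" default are assumptions, and each per-folder decision uses Robinson's f(w) with his geometric-mean combining rather than the chi-squared combining spambayes now uses.

```python
import math
from collections import Counter

class FolderScorer:
    """Hypothetical one-vs-rest router: one 'this folder vs. the rest'
    score per folder, using Robinson's f(w) and geometric-mean combining."""

    def __init__(self, s=1.0, x=0.5):
        self.s, self.x = s, x      # strength and prior of Robinson's f(w)
        self.counts = {}           # folder -> Counter: messages containing each token
        self.nmsgs = Counter()     # folder -> number of training messages

    def train(self, folder, tokens):
        self.counts.setdefault(folder, Counter()).update(set(tokens))
        self.nmsgs[folder] += 1

    def _f(self, folder, tok):
        # Fraction of this folder's messages containing tok vs. fraction of the rest.
        n_in = self.nmsgs[folder]
        n_out = sum(self.nmsgs.values()) - n_in
        c_in = self.counts[folder][tok]
        c_out = sum(c[tok] for f, c in self.counts.items() if f != folder)
        a = c_in / n_in if n_in else 0.0
        b = c_out / n_out if n_out else 0.0
        p = a / (a + b) if a + b else self.x
        n = c_in + c_out
        # Shrink toward the prior x when the token has been seen rarely.
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, folder, tokens):
        fs = [self._f(folder, t) for t in set(tokens)]
        if not fs:
            return 0.5
        n = len(fs)
        # Robinson's P, Q and S (f is strictly inside (0, 1), so the logs are safe).
        P = 1.0 - math.exp(sum(math.log(1.0 - f) for f in fs) / n)
        Q = 1.0 - math.exp(sum(math.log(f) for f in fs) / n)
        return (1.0 + (P - Q) / (P + Q)) / 2.0

    def route(self, tokens, margin=0.1, default="INBOX"):
        scores = {f: self.score(f, tokens) for f in self.counts}
        # If every folder scores about the same, the best guess isn't good enough.
        if not scores or max(scores.values()) - min(scores.values()) < margin:
            return default
        return max(scores, key=scores.get)
```

For example, after training one message each into "python" and "family" folders, route(["python", "code"]) picks "python", while a message sharing no tokens with either folder falls back to the inbox.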
--Gary -- [http://ThisURLEnablesEmailToGetThroughOverzealousSpamFilters.org] Gary Robinson CEO Transpose, LLC grobinson@transpose.com 207-942-3463 http://www.transpose.com http://radio.weblogs.com/0101454 From francois.granger at free.fr Fri Jan 31 18:45:10 2003 From: francois.granger at free.fr (Francois Granger) Date: Fri Jan 31 12:45:16 2003 Subject: [Spambayes] Well done Message-ID: Spambayes is really astonishing. It flagged this mail with very few clues. Below are the mail and then the clues.
========================================
Return-Path:
Delivered-To: pbarn@altern.org
Received: (qmail 2141 invoked by alias); 31 Jan 2003 15:51:04 -0000
Received: from unknown (HELO aol.com) (213.42.188.253) by altern.org with SMTP; 31 Jan 2003 15:51:04 -0000
Message-ID: <000810a4ed04$baa75782$67542232@qdsbgnj.aye>
From:
To:
Cc: , , , , , , , , , , , ,
Subject: *lock in your 4 pct rate 123 2633BvYK4-226sJyQ3-17
Date: Fri, 31 Jan 2003 10:25:26 +0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_00C1_74D58D0A.C2060E16"
X-Priority: 3
X-Mailer: Microsoft Outlook, Build 10.0.2616
Importance: Normal
Status:
X-Spambayes-Classification: spam

% Pftjyealjgultwb ::M::

no more mail?

X5P5lphu0ycTA7e7H3Na 2129PNyE3-553ZPfu8938Kml22
========================================
Spam probability: 0.983622180161
*H*  0.00646291040099
*S*  0.973707270724
header:MIME-Version:1  0.298172028413
header:Return-Path:1  0.376864012565
header:Message-ID:1  0.385762811067
header:Importance:1  0.687752526214
content-type:multipart/mixed  0.714739667627
subject:123  0.844827586207
header:Received:2  0.852656659762
from:no real name:2**0  0.881348131394
subject:your  0.934782608696
x-mailer:microsoft outlook, build 10.0.2616  0.934782608696
from:addr:aol.com  0.95871559633
========================================
-- Recently using MacOSX.......
From tim at fourstonesExpressions.com Fri Jan 31 11:45:11 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 31 12:45:50 2003 Subject: [Spambayes] OT: pedantism (was: Bayesian virus detection?) In-Reply-To: Message-ID: 1/31/2003 11:33:05 AM, Neale Pickett wrote: >Matt Sergeant writes: > >> On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote: >> >>> PS: Klez and its ilk are properly called worms, not viruses. Not that >>> anyone would fail to understand what you meant if you said virus, I >>> guess. >> >> Worms are viruses. As are Trojans and a whole bunch of other >> things. Virus is the generic term for a program that spreads itself by >> one means or another. The more specific term (e.g. worm) indicates the >> means with which it spreads. > >Hmm, not according to my dictionary: > > http://catb.org/esr/jargon/html/entry/virus.html > >Although it does appear that I was incorrect in characterising Klez as a >worm--it would be, according to esr, a virus. > >Every report I've seen on Klez calls it a "worm". So maybe the larger >point is that there's no longer a clear consensus on what is a worm and >what is a virus, and I should go back to merging mboxtrain and >hammiebulk.
;) I think that while you're at it, we should refactor the Corpus stuff, so that messages and databases and training and classifying are all handled in exactly one place in the system. Richie has this idea of a 'spambayes server' which is the heart and soul of the system, with all the user-facing stuff acting as front ends to it... what say you? - TimS > >Neale > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From acapnotic at foobox.net Fri Jan 31 09:39:28 2003 From: acapnotic at foobox.net (Kevin Turner) Date: Fri Jan 31 13:00:22 2003 Subject: [Spambayes] Ximian Evolution Message-ID: <1044034766.21120.88.camel@troglodyte.funhouse> A search of the list doesn't turn anything up, so I guess this is a new thread. Has anyone given any thought to how to use spambayes with the Evolution MUA without procmail or proxy? I just came up with something that sounds plausible; I haven't coded it yet. Current versions of Evolution 1.2 can filter messages based on the exit code of a process you pipe them to. Or you may be able to chain filters, first piping it through an action to add the spambayes header, then filtering on that.[1] Either way, filtering shouldn't be hard. For training, Evolution seems to lack a way to script it, but you could copy messages to folders[2] and run hammie on them at your leisure or via a cron job. (Does hammie move messages out of this mailbox or flag them as trained in some way so it doesn't double-train them?) It would be nice to avoid process-per-message overhead. Maybe bonobo holds some wonderful answer to that; I don't know. For now, I'm probably willing to not worry about it. Has anyone set this up? Can we add a section about it in the integration docs? That'd be super.
Thanks, - Kevin [1] The filter-on-header method has the advantage of being easier to trace and debug; I only hesitate because I haven't convinced myself I know what order Evolution applies filters in and the conditions in which it chains them. [2] These could be in ~/evolution/local, or configured as a separate account of protocol "mbox" or "Maildir"; whichever option rankles Evolution less, considering we may modify the contents of the mailbox without its knowledge. I'm guessing the "separate account" option is probably better for that. -- The moon is new, 1.2% illuminated, 28.5 days old. From francois.granger at free.fr Fri Jan 31 19:04:03 2003 From: francois.granger at free.fr (Francois Granger) Date: Fri Jan 31 13:04:21 2003 Subject: [Spambayes] This error is new to me Message-ID: I cut and pasted a message from Eudora into pop3proxy, then clicked Classify. I got this traceback. I did it again to be sure: same traceback. I loaded it from the file in the unknown folder: same result. The only "wrong" & I could find was in this tag: I can send the mail on request.
500 Server error
Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "/Volumes/OS99/spambayes/pop3proxy.py", line 967, in onClassify
    cluesTable += cluesRow % (word, wordProb)
  File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 875, in __mod__
    self._replaceNodeContent(element, sequence.pop())
  File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 654, in _replaceNodeContent
    node.children = self._nodeListFromSource(value)
  File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 640, in _nodeListFromSource
    tree = _generateTree(""+value+"")
  File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 575, in _generateTree
    g.close()
  File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 172, in close
    self.goahead(1)
  File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 405, in goahead
    self.syntax_error("bogus `%s'" % data)
  File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 794, in syntax_error
    raise Error('Syntax error at line %d: %s' % (self.lineno, message))
Error: Syntax error at line 1: bogus `&'
-- Recently using MacOSX.......
From neale at woozle.org Fri Jan 31 10:18:49 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 31 13:18:57 2003 Subject: [Spambayes] Ximian Evolution In-Reply-To: <1044035023.21121.93.camel@troglodyte.funhouse> (Kevin Turner's message of "31 Jan 2003 09:43:44 -0800") References: <1044035023.21121.93.camel@troglodyte.funhouse> Message-ID: Kevin Turner writes: > I just came up with something that sounds plausible; I haven't coded it > yet. Current versions of Evolution 1.2 can filter messages based on the > exit code of a process you pipe them to. Or you may be able to chain > filters, first piping it through an action to add the spambayes header, > then filtering on that.[1] Either way, filtering shouldn't be hard.
Okay, that would be easy enough to add as an option to hammiefilter. What exit codes mean what? Even better would be if you could run everything through hammiefilter -t and use its stdout as the message. Then you could filter on the header. > For training, Evolution seems to lack a way to script it, but you could > copy messages to folders[2] and run hammie on them at your leisure or > via a cron job. (Does hammie move messages out of this mailbox or flag > them as trained in some way so it doesn't double-train them?) See HAMMIE.txt for an explanation of how to do this. But I might need to add a new Evolution folder type if it doesn't use MH, Maildir, or mbox mail spools internally. > It would be nice to avoid process-per-message overhead. Maybe bonobo > holds some wonderful answer to that; I don't know. For now, I'm > probably willing to not worry about it. I've only used Evolution once or twice, but it seems to me that their whole gig is to get you to write plugins for everything. So a bonobo (or whatever their object broker is called) component would be the Right Thing. If we could make it as snazzy as the Outlook plugin, that'd be even Righter. Neale From richie at entrian.com Thu Jan 30 18:35:40 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 31 13:36:35 2003 Subject: [Spambayes] seg faults? In-Reply-To: <15926.65353.991418.385713@montanaro.dyndns.org> References: <20030127135919.A15195@discworld.dyndns.org> <15926.65353.991418.385713@montanaro.dyndns.org> Message-ID: <44ri3v8sis31b5gu9uj06j749lgse3kndg@4ax.com> [Richie] > Can a process increase its own stack size? [Skip] > Here's the relevant code from Lib/test/regrtest.py: Wonderful, many thanks. I've integrated this into pop3proxy.py. We should probably go with Tony's idea of having a module for this kind of platform-dependent stuff eventually.
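Skip's regrtest.py snippet isn't reproduced in this thread, but on Unix a process can raise its own soft stack limit with the standard-library `resource` module. A minimal sketch, assuming a Unix platform (`resource` does not exist on Windows). `raise_stack_limit` and the 2 MB target are hypothetical choices, and this is not the code Richie added to pop3proxy.py. Whether a higher limit helps a process that is already running may depend on the OS.

```python
try:
    import resource  # standard library, Unix only
except ImportError:  # e.g. on Windows
    resource = None


def raise_stack_limit(wanted=2 * 1024 * 1024):
    """Raise this process's soft stack limit toward `wanted` bytes.

    The soft limit is never lowered, and never pushed past the hard limit,
    which an unprivileged process is not allowed to exceed.
    Returns the soft limit now in effect, or None if `resource` is missing.
    """
    if resource is None:
        return None
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already large enough
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (wanted, hard))
    return wanted
```

Calling it once at startup, before any deep recursion, keeps the platform check in one place, which fits Tony's suggestion of a module for platform-dependent code.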
-- Richie Hindle richie@entrian.com From richie at entrian.com Thu Jan 30 18:35:38 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 31 13:36:45 2003 Subject: [Spambayes] Alpha 2 Release? In-Reply-To: References: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com> Message-ID: <0spi3v403pm760r3v19fg44tq5ei6heqlq@4ax.com> [Francois] > UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in > position 86: ordinal not in range(128) This is fixed (I hope!). You can go back to being "François" now. 8-) -- Richie Hindle richie@entrian.com From richie at entrian.com Thu Jan 30 18:35:43 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 31 13:36:56 2003 Subject: [Spambayes] This error is new to me In-Reply-To: References: Message-ID: [François] > I got this traceback. > [...] > I can send the mail on request. Yes please - could you zip up the file from the unknown folder and send me the zip file? Many thanks. And if it's not too much hassle, could you update your software and see whether your accented-characters problem with 2.3 is properly fixed? I'm pretty sure I've fixed it, but since I'm about to build the Alpha 2 release I'd love to have independent confirmation. Not to worry if you don't have the time. -- Richie Hindle richie@entrian.com From richie at entrian.com Thu Jan 30 19:13:31 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 31 14:14:25 2003 Subject: [Spambayes] This error is new to me In-Reply-To: References: Message-ID: <29ui3v48b8u3htnnbctd17s93g28egd5a4@4ax.com> [François] > I got this traceback. Now fixed.
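The traceback shows a clue word being substituted into the web UI's template as XML source, so a bare `&` in a token makes the parser fail. One general way to avoid that is to escape each word before interpolating it. This is a minimal sketch, not necessarily how the fix was actually made. `CLUE_ROW` and `render_clues` are hypothetical stand-ins for pop3proxy's clue-row template. `xml.sax.saxutils.escape` is the standard-library escaper for `&`, `<` and `>`.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for the clue-row template in pop3proxy's web UI;
# the real template and variable names may differ.
CLUE_ROW = "<tr><td>%s</td><td>%.6f</td></tr>"


def render_clues(clues):
    """Turn (token, probability) pairs into table rows.

    Each token is escaped first, so a clue such as 'subject:Q&A' becomes
    'subject:Q&amp;A' and the output stays well-formed for an XML parser.
    """
    return "".join(CLUE_ROW % (escape(word), prob) for word, prob in clues)
```

Escaping at the point of substitution means tokenizer output never has to be "safe" for the UI, since any token string can end up in the clues table.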
-- Richie Hindle richie@entrian.com From neale at woozle.org Fri Jan 31 11:31:44 2003 From: neale at woozle.org (Neale Pickett) Date: Fri Jan 31 14:31:51 2003 Subject: [Spambayes] Re: egregious patents on anti-spam techniques (Kaitlin Duck Sherwood) In-Reply-To: (Tim Peters's message of "Thu, 30 Jan 2003 23:19:18 -0500") References: Message-ID: Tim Peters writes: > Note that Gary Robinson set up a Spam Wiki last year: > > http://wecanstopspam.org/jsp/Wiki?StartingPoints Sweet, I forgot about that. I'm updating it with what I've learned so far from running spampot. Neale From richie at entrian.com Thu Jan 30 20:33:22 2003 From: richie at entrian.com (Richie Hindle) Date: Fri Jan 31 15:34:19 2003 Subject: [Spambayes] Alpha2 Pre-release Message-ID: I've built an alpha2 source release of Spambayes. Before we put it up on the main web site, I'd feel a lot better if someone could smoke-test it for me - I may have made some horrible mistake that I'm too close to see... I've put it here: http://entrian.com/spambayes/spambayes-1.0a2-pre.zip http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz For POP3 proxy users, this release should be GUI out of the box - install it, run pop3proxy.py, point your browser at the URL, go to the Config page and enter your POP3 server details, change your email client to point at the proxy, and you're away - messages are classified and you can train through the web. For hammie users there's Neale's new muttrc and spambayes.el, and Skip's proxytee lets hammie users train through the web interface. Tim Stone's import/export script should make upgrading easy, for now and in the future. Assorted improvements to the tokeniser and classifier make spambayes even more accurate. What else has changed? We should do a proper release announcement - I don't keep up with the Outlook plug-in, so what's new there? Who've I offended by forgetting about their fantastic new feature?
8-) One question for those who know about these things: I originally built the release on Windows, but then realised that all the source files in both the zip and tar.gz archives had Windows line-endings. People installing and editing on unix would see '^M's all over the place (possibly, depending on their editor). Is there a distutils option I've missed to prevent this? Anyway, I rebuilt the archives on unix (thanks Neale!). -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Fri Jan 31 14:39:57 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Jan 31 15:40:37 2003 Subject: [Spambayes] Alpha2 Pre-release In-Reply-To: Message-ID: <2W75GCLI31NON1WNKGFECA8051KHKF.3e3adf1d@myst> 1/30/2003 2:33:22 PM, Richie Hindle wrote: > >I've built an alpha2 source release of Spambayes. Before we put it up on >the main web site, I'd feel a lot better if someone could smoke-test it for >me - I may have made some horrible mistake that I'm too close to see... > >I've put it here: > > http://entrian.com/spambayes/spambayes-1.0a2-pre.zip > http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz > >For POP3 proxy users, this release should be GUI out of the box - install >it, run pop3proxy.py, point your browser at the URL, go to the Config page >and enter your POP3 server details, change your email client to point at >the proxy, and you're away - messages are classified and you can train >through the web. > >For hammie users there's Neale's new muttrc and spambayes.el, and Skip's >proxytee lets hammie users train through the web interface. Tim Stone's >import/export script should make upgrading easy, for now and in the future. The operative word there is 'should'. Please back your database up before migrating it, until we know for sure there aren't bugs in the script. - TimS >Assorted improvements to the tokeniser and classifier make spambayes even >more accurate. > >What else has changed?
We should do a proper release announcement - I >don't keep up with the Outlook plug-in, so what's new there? Who've I >offended by forgetting about their fantastic new feature? 8-) > >One question for those who know about these things: I originally built the >release on Windows, but then realised that all the source files in both the >zip and tar.gz archives had Windows line-endings. People installing and >editing on unix would see '^M's all over the place (possibly, depending on >their editor). Is there a distutils option I've missed to prevent this? > >Anyway, I rebuilt the archives on unix (thanks Neale!). > >-- >Richie Hindle >richie@entrian.com > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From francois.granger at free.fr Fri Jan 31 23:40:03 2003 From: francois.granger at free.fr (Francois Granger) Date: Fri Jan 31 17:40:08 2003 Subject: [Spambayes] A question Message-ID: I recently received a spam that was properly classified as spam. I copied and pasted its content from Eudora into pop3proxy and clicked Classify. It gave me a Spam probability of 0.887897331413. I checked my bayescustomize.ini, which contains:

[Categorization]
ham_cutoff = 0.10
spam_cutoff = 0.90

So are these parameters not used by pop3proxy?
-- Recently using MacOSX.......
From tony-bayes at lownds.com Fri Jan 31 14:54:16 2003 From: tony-bayes at lownds.com (Tony Lownds) Date: Fri Jan 31 17:54:16 2003 Subject: [Spambayes] A question In-Reply-To: References: Message-ID: At 11:40 PM +0100 1/31/03, Francois Granger wrote: >I recently received a spam that was properly classified as spam. I copied and >pasted its content from Eudora into pop3proxy and clicked Classify. It gave >me a Spam probability of 0.887897331413.
 I checked my bayescustomize.ini, >which contains: > >[Categorization] >ham_cutoff = 0.10 >spam_cutoff = 0.90 > >So are these parameters not used by pop3proxy? Hi François, Eudora doesn't always keep the content 100% the same as what pop3proxy sees. For instance, attachment data is removed when you copy/paste. Also, you will see a subset of the headers unless you click the rich headers button. This might explain the discrepancy. -Tony
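The scores quoted in this thread fit together as follows. In François's "Well done" report, the final spam probability 0.983622180161 equals (S - H + 1) / 2 computed from the listed *S* (0.973707270724) and *H* (0.00646291040099). The sketch below reproduces that step and the three-way cutoff decision. Assuming the chi-squared (Fisher-style) combining that spambayes uses, it also shows where H and S come from. `combine`, `final_score` and `categorize` are hypothetical names, not spambayes's API. The `<=`/`>=` boundary handling is an assumption. Real spambayes first keeps only the strongest clues, which this skips.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0, "this series only works for even degrees of freedom"
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def combine(probs):
    """Chi-squared combining of per-token spam probabilities.

    Returns (H, S): evidence for ham and for spam, each in [0, 1].
    Every probability must lie strictly between 0 and 1.
    """
    n = len(probs)
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return H, S


def final_score(H, S):
    """Fold the two indicators into one 0..1 'spam probability'."""
    return (S - H + 1.0) / 2.0


def categorize(score, ham_cutoff=0.10, spam_cutoff=0.90):
    """Three-way decision using the cutoffs from bayescustomize.ini."""
    if score <= ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# H and S as reported for the "Well done" message:
print(categorize(final_score(H=0.00646291040099, S=0.973707270724)))  # spam
# The pasted copy from "A question" falls between the two cutoffs:
print(categorize(0.887897331413))  # unsure
```

With cutoffs of 0.10 and 0.90, a score of 0.887897331413 lands in the unsure band rather than spam, so a pasted copy can be categorized differently from the message the proxy originally saw.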