From francois.granger at free.fr Sat Feb 1 12:15:11 2003
From: francois.granger at free.fr (Francois Granger)
Date: Sat Feb 1 06:15:16 2003
Subject: [Spambayes] A question
In-Reply-To:
References:
Message-ID:

At 14:54 -0800 31/01/2003, in message Re: [Spambayes] A question,
Tony Lownds wrote:
>At 11:40 PM +0100 1/31/03, Francois Granger wrote:
>>I recently received a spam properly classified as spam. I copy and
>>paste its content from Eudora in pop3proxy and click Classify. It gives
>>me a Spam probability: 0.887897331413. I check my bayescustomize.ini
>>where there is:
>>
>>[Categorization]
>>ham_cutoff = 0.10
>>spam_cutoff = 0.90
>>
>>So, these parameters are not used by pop3proxy?
>
>Hi François,
>
>Eudora does not always keep the content 100% the same as what pop3proxy
>sees. For instance, attachment data is removed when you copy/paste.
>Also, you will see a subset of the headers unless you click the rich
>headers button.

Yes, I know all of this. So before posting I tested with the version
of the message which is kept in _pop3proxy.log. This one is the raw
one as received from the server.

--
Recently using MacOSX.......

From francois.granger at free.fr Sat Feb 1 15:16:06 2003
From: francois.granger at free.fr (Francois Granger)
Date: Sat Feb 1 09:16:12 2003
Subject: [Spambayes] Alpha2 Pre-release
In-Reply-To: <2W75GCLI31NON1WNKGFECA8051KHKF.3e3adf1d@myst>
References: <2W75GCLI31NON1WNKGFECA8051KHKF.3e3adf1d@myst>
Message-ID:

At 14:39 -0600 31/01/2003, in message Re: [Spambayes] Alpha2
Pre-release, Tim Stone - Four Stones Expressions wrote:
>1/30/2003 2:33:22 PM, Richie Hindle wrote:
>
>>
>>I've built an alpha2 source release of Spambayes. Before we put it up on
>>the main web site, I'd feel a lot better if someone could smoke-test it for
>>me - I may have made some horrible mistake that I'm too close to see...
>>
>>I've put it here:
>>
>>  http://entrian.com/spambayes/spambayes-1.0a2-pre.zip
>>  http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz
>>
>>For POP3 proxy users, this release should be GUI out of the box - install
>>it, run pop3proxy.py, point your browser at the URL, go to the Config page
>>and enter your POP3 server details, change your email client to point at
>>the proxy, and you're away - messages are classified and you can train
>>through the web.

It works as you describe it. I started a new database with it in a snap.

>>For hammie users there's Neale's new muttrc and spambayes.el, and Skip's
>>proxytee lets hammie users train through the web interface. Tim Stone's
>>import/export script should make upgrading easy, for now and in the future.
>
>The operative word there is 'should'. Please back your database up before
>migrating it, until we know for sure there aren't bugs in the script. - TimS

I have a first diff for you...

93c93,100
< import storage
---
>
> try:
>     True, False
> except NameError:
>     # Maintain compatibility with Python 2.2
>     True, False = 1, 0
>
> from spambayes import storage

I am afraid it is not the only one. Currently, running it as -e on my
previous setup gives me a file with only the first line "nham, nspam".
So it recognises the database and reads from it the values of these
variables, but doesn't read the records.

I am trying to trace this.

--
Recently using MacOSX.......

From barry at python.org Tue Feb 4 18:15:07 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Feb 6 23:40:43 2003
Subject: [Spambayes] testing, please ignore #6
Message-ID: <15936.18811.231596.323037@gargle.gargle.HOWL>

From francois.granger at free.fr Wed Feb 5 00:26:48 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Feb 6 23:40:44 2003
Subject: [Spambayes] Pop3proxy.py
Message-ID:

I copy&paste messages from Eudora to pop3proxy to check their
classification.
I am currently getting this message when I click on Classify:

500 Server error

Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
TypeError: onClassify() takes exactly 4 non-keyword arguments (1 given)

--
Recently using MacOSX.......

From noreply at sourceforge.net Tue Feb 4 18:30:47 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:45 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 15:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From noreply at sourceforge.net Tue Feb 4 18:31:11 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:45 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 15:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
Status: Open
>Priority: 1
Submitted By: Tony Meyer (anadelonbrin)
>Assigned to: Mark Hammond (mhammond)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 15:31

Message:
Logged In: YES
user_id=552329

And who else to decide on this, but Mark :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From noreply at sourceforge.net Tue Feb 4 18:41:26 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:46 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 13:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
Status: Open
Priority: 1
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Mark Hammond (mhammond)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 13:41

Message:
Logged In: YES
user_id=14198

I'm not too sure this should happen unless the filter also
marks the items as read - otherwise you still end up with
many spam in the spam folder unread, and only the ones you
move manually marked as read.

I'm also kinda stuck about what to do with "options".
Currently, options managed by the GUI are in a pickle, while
other options are in the .ini file.
I don't object to having new, outlook specific options in the
INI file, but I do object to all our existing code breaking
should we decide later to move this option into the GUI.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 13:31

Message:
Logged In: YES
user_id=552329

And who else to decide on this, but Mark :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From noreply at sourceforge.net Tue Feb 4 18:56:30 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:46 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 15:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
Status: Open
Priority: 1
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Mark Hammond (mhammond)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 15:56

Message:
Logged In: YES
user_id=552329

My reasoning was that if the user manually selects to delete
it as spam, then it is as good as read. Those that are
moving via the filter have not been read.
Personally I still wade through the filtered spam to check
it for false positives, and mark the messages as read as I
go (so that the 'unread' display is the number of messages I
haven't checked). If I choose delete as spam, I then have
to go to the spam folder and mark it as read.

In any case, no big deal if you disagree, it was just a
thought :)

Re: the ini file: looking at the ini, it doesn't seem to
have anything that couldn't be in the GUI. Most of it would
probably fit in the "advanced" dialog.

It would probably be good if the ini was only for 'beta'
options - anything that is for public use should be in the
GUI. And if a 'beta' option moves to 'public', then it
doesn't matter (much) if it breaks, because those using beta
options should be upgrading anyway.

Moving the existing settings (most of which should be
exposed I think) would mean breaking existing code, but
maybe just this once?

Maybe this discussion should move to the list? (maybe I
should have posted this there originally?)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 15:41

Message:
Logged In: YES
user_id=14198

I'm not too sure this should happen unless the filter also
marks the items as read - otherwise you still end up with
many spam in the spam folder unread, and only the ones you
move manually marked as read.

I'm also kinda stuck about what to do with "options".
Currently, options managed by the GUI are in a pickle, while
other options are in the .ini file. I don't object to
having new, outlook specific options in the INI file, but I
do object to all our existing code breaking should we decide
later to move this option into the GUI.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 15:31

Message:
Logged In: YES
user_id=552329

And who else to decide on this, but Mark :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From noreply at sourceforge.net Tue Feb 4 19:04:45 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:47 2003
Subject: [Spambayes] [ spambayes-Feature Requests-616944 ] Mozilla Mail
 integration
Message-ID:

Feature Requests item #616944, was opened at 2002-10-01 21:31
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Sinchi Pacharuraq (sinchi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Mozilla Mail integration

Initial Comment:
Integration with Mozilla Mail client

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 16:04

Message:
Logged In: YES
user_id=552329

This is pretty old now, and it could probably be closed.
You can add such a filter in Mozilla - like this:

In the Mail window, select:
Tools -> Message Filters -> New

When you get to the new filter pane, name the filter
X-Hammie-Disposition

Filter Criteria -> Customize -> New Message Header:
Add the following:
In the box type X-Hammie-Disposition
click Add, then OK.

Select the following as the Filter Criteria:
X-Hammie-Disposition contains Yes

Under Perform this action, select a destination folder

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2002-10-03 01:33

Message:
Logged In: YES
user_id=85414

I'm no expert on how Mozilla filters work... can you add a
filter that says "If a message contains an
X-Hammie-Disposition header whose value starts with Yes then
"?

If so, you can use either hammie.py (as part of your unix
mail delivery system) or pop3proxy.py (on either a server
machine or your own client machine). Both of these add an
X-Hammie-Disposition header, with which you can filter your
messages.

----------------------------------------------------------------------

Comment By: Sinchi Pacharuraq (sinchi)
Date: 2002-10-02 21:53

Message:
Logged In: YES
user_id=621182

I just want to have this anti-spam filter built in Mozilla
message filters. For example, user might activate this
filter to delete spam messages from inbox or to move it to
special folder.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-10-02 04:04

Message:
Logged In: YES
user_id=44345

ummm.... a bit short on detail/description. What precisely
do you mean by "Mozilla Mail integration"? Can you describe
what you would like to see feature-wise? Note that no other
mail system integration has been attempted at this point
with the exception that I believe the hammie script works
with procmail.
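
The header test described in the comments above (an X-Hammie-Disposition header whose value starts with Yes) can be sketched outside any mail client. This is only an illustration using Python's standard email package: the function name is_spam_by_header and the sample header value "Yes; 0.99" are assumptions made for the example, not something the thread specifies.

```python
import email


def is_spam_by_header(raw_message, header="X-Hammie-Disposition"):
    """Return True when the disposition header value starts with 'Yes'.

    The function name and the exact header value layout are assumptions
    for this sketch; the thread only says the value starts with Yes.
    """
    msg = email.message_from_string(raw_message)
    # A missing header is treated the same as a non-spam verdict.
    value = msg.get(header, "")
    return value.strip().lower().startswith("yes")


# Hypothetical sample messages, as a proxy or hammie might have tagged them.
spam_raw = "From: a@example.com\nX-Hammie-Disposition: Yes; 0.99\n\nbuy now\n"
ham_raw = "From: b@example.com\nX-Hammie-Disposition: No; 0.01\n\nlunch?\n"
plain_raw = "From: c@example.com\n\nno header here\n"

print(is_spam_by_header(spam_raw), is_spam_by_header(ham_raw))
```

A mail client filter that matches "X-Hammie-Disposition contains Yes" is doing roughly the same test, only less strictly, since "contains" would also match a Yes that appears later in the value.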

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702

From noreply at sourceforge.net Tue Feb 4 19:11:48 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:48 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 13:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
Status: Open
Priority: 1
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Mark Hammond (mhammond)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 14:11

Message:
Logged In: YES
user_id=14198

Yep, I see that marking as read could be useful in that they
have been reviewed, but then I would expect Outlook's normal
mechanism to still work and mark it read. I have my preview
pane mark as read after 2 seconds :)

Re the INI file - my problem is that the GUI needs to modify
these options, and I don't see how it is trivial to keep the
fairly "free-form" INI file format supported by
configparser, while only writing out certain elements and
not others and also keeping comments etc intact.

I'll make a deal - help me with the options problem, and I
will give you 5 free option . Let's take it to email...
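
The concern about keeping comments intact can be reproduced in a few lines. This is a minimal sketch using the configparser module from the current Python standard library (the module of that era was spelled ConfigParser, and the plugin's real option handling is not shown in this thread). The section and option names are borrowed from the bayescustomize.ini excerpt quoted earlier in the archive; the comment line and the new value 0.85 are made up for the demonstration.

```python
import configparser
import io

# A hand-edited INI file with a comment the user would like to keep.
original = """\
# Thresholds used when scoring mail
[Categorization]
ham_cutoff = 0.10
spam_cutoff = 0.90
"""

parser = configparser.ConfigParser()
parser.read_string(original)

# Simulate a GUI changing one option and saving the file again.
parser.set("Categorization", "spam_cutoff", "0.85")
buffer = io.StringIO()
parser.write(buffer)
rewritten = buffer.getvalue()

print(rewritten)
```

The rewritten text carries the new spam_cutoff value but not the comment line, because the parser discards comments when reading. A GUI that round-trips a free-form INI file this way would therefore lose whatever the user wrote by hand around the options.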

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 13:56

Message:
Logged In: YES
user_id=552329

My reasoning was that if the user manually selects to delete
it as spam, then it is as good as read. Those that are
moving via the filter have not been read.

Personally I still wade through the filtered spam to check
it for false positives, and mark the messages as read as I
go (so that the 'unread' display is the number of messages I
haven't checked). If I choose delete as spam, I then have
to go to the spam folder and mark it as read.

In any case, no big deal if you disagree, it was just a
thought :)

Re: the ini file: looking at the ini, it doesn't seem to
have anything that couldn't be in the GUI. Most of it would
probably fit in the "advanced" dialog.

It would probably be good if the ini was only for 'beta'
options - anything that is for public use should be in the
GUI. And if a 'beta' option moves to 'public', then it
doesn't matter (much) if it breaks, because those using beta
options should be upgrading anyway.

Moving the existing settings (most of which should be
exposed I think) would mean breaking existing code, but
maybe just this once?

Maybe this discussion should move to the list? (maybe I
should have posted this there originally?)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 13:41

Message:
Logged In: YES
user_id=14198

I'm not too sure this should happen unless the filter also
marks the items as read - otherwise you still end up with
many spam in the spam folder unread, and only the ones you
move manually marked as read.

I'm also kinda stuck about what to do with "options".
Currently, options managed by the GUI are in a pickle, while
other options are in the .ini file.
I don't object to having new, outlook specific options in the
INI file, but I do object to all our existing code breaking
should we decide later to move this option into the GUI.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 13:31

Message:
Logged In: YES
user_id=552329

And who else to decide on this, but Mark :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From noreply at sourceforge.net Tue Feb 4 19:17:50 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Feb 6 23:40:49 2003
Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin:
 Delete as spam marks as read
Message-ID:

Feature Requests item #680629, was opened at 2003-02-05 15:30
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

Category: None
Group: None
>Status: Closed
Priority: 1
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Mark Hammond (mhammond)
Summary: Outlook plugin: Delete as spam marks as read

Initial Comment:
Personally I think it would be nice if the "delete as spam"
button marked the mail item as read. Note that I'm not
saying that mail that is filtered as spam should be marked
as read - it shouldn't (by default).

If others agree, this would be a nice addition. Perhaps as
an option in the prefs.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 16:17

Message:
Logged In: YES
user_id=552329

Agreed that it is not necessary.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 16:11

Message:
Logged In: YES
user_id=14198

Yep, I see that marking as read could be useful in that they
have been reviewed, but then I would expect Outlook's normal
mechanism to still work and mark it read. I have my preview
pane mark as read after 2 seconds :)

Re the INI file - my problem is that the GUI needs to modify
these options, and I don't see how it is trivial to keep the
fairly "free-form" INI file format supported by
configparser, while only writing out certain elements and
not others and also keeping comments etc intact.

I'll make a deal - help me with the options problem, and I
will give you 5 free option . Let's take it to email...

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 15:56

Message:
Logged In: YES
user_id=552329

My reasoning was that if the user manually selects to delete
it as spam, then it is as good as read. Those that are
moving via the filter have not been read.

Personally I still wade through the filtered spam to check
it for false positives, and mark the messages as read as I
go (so that the 'unread' display is the number of messages I
haven't checked). If I choose delete as spam, I then have
to go to the spam folder and mark it as read.

In any case, no big deal if you disagree, it was just a
thought :)

Re: the ini file: looking at the ini, it doesn't seem to
have anything that couldn't be in the GUI. Most of it would
probably fit in the "advanced" dialog.

It would probably be good if the ini was only for 'beta'
options - anything that is for public use should be in the
GUI. And if a 'beta' option moves to 'public', then it
doesn't matter (much) if it breaks, because those using beta
options should be upgrading anyway.

Moving the existing settings (most of which should be
exposed I think) would mean breaking existing code, but
maybe just this once?

Maybe this discussion should move to the list? (maybe I
should have posted this there originally?)

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 15:41

Message:
Logged In: YES
user_id=14198

I'm not too sure this should happen unless the filter also
marks the items as read - otherwise you still end up with
many spam in the spam folder unread, and only the ones you
move manually marked as read.

I'm also kinda stuck about what to do with "options".
Currently, options managed by the GUI are in a pickle, while
other options are in the .ini file. I don't object to
having new, outlook specific options in the INI file, but I
do object to all our existing code breaking should we decide
later to move this option into the GUI.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 15:31

Message:
Logged In: YES
user_id=552329

And who else to decide on this, but Mark :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702

From hinsen at cnrs-orleans.fr Wed Feb 5 15:18:06 2003
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Thu Feb 6 23:41:00 2003
Subject: [Spambayes] Spambayes packaging
Message-ID: <200302051518.06445.hinsen@cnrs-orleans.fr>

Dear Spambayes team,

first of all, thanks for making spammers' lives more difficult :-)

One suggestion: if you add a file "MANIFEST.in" containing the line
"include *.py", then distutils can also produce RPM files. Very convenient.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

From popiel at wolfskeep.com Thu Feb 6 06:34:06 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Thu Feb 6 23:42:57 2003
Subject: [Spambayes] Msg class broken?
Message-ID: <20030206143406.78EBE2DEB4@cashew.wolfskeep.com>

I'm trying to do a bit more testing (*gasp*), but I'm having a
bit of difficulty: it seems that the tokenizer doesn't like being
given a simple string anymore, as is done in the Msg class in
msgs.py. If I'm reading things right, this breaks all of the
automated testing tools.

Have a traceback:

Traceback (most recent call last):
  File "testtools/Continuous.py", line 293, in ?
    main()
  File "testtools/Continuous.py", line 254, in main
    tests[j].predict([msg], isspam)
  File "testtools/Continuous.py", line 94, in predict
    prob = guess(example)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 436, in _getclues
    for word in Set(wordstream):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 374, in __init__
    self._update(iterable)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 333, in _update
    for element in it:
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1052, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1063, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 791, in crack_content_xyz
    yield 'content-type:' + msg.get_content_type()
AttributeError: Message instance has no attribute 'get_content_type'

Please ignore the top three lines of the trace; I'm building
my own driver for testing with incremental training after
each message. (What I'm trying to do in the big picture is
get graphs of how the error rates drop off over time with
various training modes.)

Anyway, it looks like either msgs.py needs to be updated to
pass in email.Message.Message objects, or tokenizer.py needs
to relearn how to accept raw strings. Am I reading this right?
This seems odd since tokenizer does seem to try to convert the
string to a Message via the auspices of mboxutils... help?
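
For readers following the traceback, the string-to-Message step it depends on can be sketched as follows. This uses the email package from a current Python standard library and is not the spambayes tokenizer or mboxutils code; the helper name as_message and the sample message are assumptions made for the example. One possible reading of the AttributeError, not confirmed anywhere in this thread, is that the email package installed on that machine predates get_content_type().

```python
import email
from email.message import Message


def as_message(obj):
    """Return an email Message, parsing obj first if it is a raw string.

    Hypothetical helper for illustration; it is not taken from spambayes.
    """
    if isinstance(obj, Message):
        return obj
    return email.message_from_string(obj)


# A made-up raw message of the kind a test driver might pass in.
raw = (
    "Subject: test\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<p>hi</p>\n"
)

msg = as_message(raw)
# Mirrors the expression on the failing line of the traceback.
content_type_token = "content-type:" + msg.get_content_type()
print(content_type_token)
```

Accepting either form at the boundary means callers such as a Msg class can keep handing over raw strings while the tokenizing code always works with a Message object.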

- Alex

From acapnotic at users.sourceforge.net Thu Feb 6 08:18:45 2003
From: acapnotic at users.sourceforge.net (Kevin Turner)
Date: Thu Feb 6 23:43:19 2003
Subject: [Spambayes] Ximian Evolution
In-Reply-To:
References: <1044035023.21121.93.camel@troglodyte.funhouse>
Message-ID: <1044548323.1958.856.camel@troglodyte.funhouse>

On Fri, 2003-01-31 at 10:18, Neale Pickett wrote:
> Kevin Turner writes:
> > Current versions of Evolution 1.2 can filter messages based on the
> > exit code of a process you pipe them to.
>
> Okay, that would be easy enough to add as an option to hammiefilter.
> What exit codes mean what?

That's entirely configurable. Bad news, though: It seems as if you have
to define two separate filters to test for two exit codes, i.e. a "spam"
filter and "unsure" filter and let the default case be "ham" or
something. That means at least two invocations per message -- this is
getting very expensive.

Evolution discards the output from the pipe, so you can't use it to add
headers to the message to filter on.

> I've only used Evolution once or twice, but it seems to me that their
> whole gig is to get you to write plugins for everything. So a bonobo
> (or whatever their object broker is called) component would be the Right
> Thing. If we could make it as snazzy as the Outlook plugin, that'd be
> even Righter.

Unfortunately, there doesn't seem to be anything that comes with my
Evolution installation or anything within easy reach of google that
explains what sort of things you can do through their component
interface. Some heavy source-diving will be required.

--
The moon is waxing crescent, 22.2% illuminated, 4.6 days old.

From skip at pobox.com Thu Feb 6 11:45:24 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Feb 6 23:43:28 2003
Subject: [Spambayes] Bad move, Apple
Message-ID: <15938.40756.578327.757028@montanaro.dyndns.org>

Got this in today's Apple eNews mailing:

    5. Delivering a One-Two Punch to Spammers
    .........................................

    Yes, Mac OS X Mail can help you deliver a staggering blow to spammers.
    Simply pull down the Mail menu, choose Junk Mail, and select Automatic.
    The next time you receive email, Mail will move suspect email into a
    Junk folder.

    Now you're ready to deliver a real knockout punch to spammers by taking
    advantage of yet another potent spam-fighting weapon:

    1. Click on the Junk folder.
    2. Type Command-a to select all of the email in the Junk folder.
    3. Choose "Bounce to Sender" from the Message menu.

    Mail will return the selected messages to the senders marked "User
    unknown," making them think your email address invalid, encouraging
    them to drop you from their lists, and, thus, eliminating spam at its
    source.

    http://www.apple.com/macosx/jaguar/mail.html

*sigh*

Skip

From francois.granger at free.fr Thu Feb 6 22:16:18 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Feb 6 23:43:54 2003
Subject: [Spambayes] Mailing list
Message-ID:

With all the awareness that the LinuxJournal article will bring, it
would be safe to create a "spambayes users" mailing list and
advertise it on the main page of the
http://spambayes.sourceforge.net site.

Don't you think?

If you go the Yahoo mailing list way, I can help administer the list.....

--
Recently using MacOSX.......

From anthony at interlink.com.au Fri Feb 7 16:27:59 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Feb 7 00:28:53 2003
Subject: [Spambayes] Spambayes packaging
In-Reply-To: <200302051518.06445.hinsen@cnrs-orleans.fr>
Message-ID: <200302070527.h175Rx214291@localhost.localdomain>

>>> Konrad Hinsen wrote
> Dear Spambayes team,
>
> first of all, thanks for making spammers' lives more difficult :-)
>
> One suggestion: if you add a file "MANIFEST.in" containing the line "include
> *.py", then distutils can also produce RPM files. Very convenient.

It's already there, in CVS...

Anthony

From tim at fourstonesExpressions.com Thu Feb 6 23:42:10 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Feb 7 00:42:17 2003
Subject: [Spambayes] Re: Centralization (was: pedantism)
Message-ID: <31EC513XRQKJGC72WUTOJBANI76WQDB.3e434732@myst>

Neale, here is the text of some messages that deal with various aspects of
centralization of classification and message handling in spambayes. It's a
bit of reading, but the gist is that training, classification, and message
handling should be done in one place regardless of what front-end is being
integrated, and that the current Corpus and storage modules do not fit the
bill, particularly for outlook. This is because the paradigm that's being
used in corpus has a fairly serious impedance mismatch with outlook's
message storage mechanism. When I did the corpus stuff, it was particularly
to support the pop3proxy. I think now is the time to really rethink the
abstractions involved, and make it work correctly for all clients we now
have, and position it for whatever will come along... I'm already thinking
about Lotus Notes, for instance.

Here ya go, tell me what you think...

- TimS

Neale Pickett wrote:
Mark Hammond wrote:

I tend to filter the Python zen thusly:

% python -c "import this" | grep purity
Although practicality beats purity.

However, I have tried to think a little about what a generic system would
look like. For example, I tried to create a generic "message" object family:

class MsgStore:
    # Method bodies were not given in the original post.
    def Close(self):
        raise NotImplementedError
    def GetFolderGenerator(self, folder_ids, include_sub):
        raise NotImplementedError
    def GetFolder(self, folder_id):
        raise NotImplementedError
    def GetMessage(self, message_id):
        raise NotImplementedError

class MsgStoreFolder:
    def GetMessageGenerator(self):
        raise NotImplementedError

class MsgStoreMsg:
    def GetEmailPackageObject(self, strip_mime_headers=True):
        # Return a "read-only" Python email package object
        # "read-only" in that changes will never be reflected to the real store.
        raise NotImplementedError
    def SetField(self, name, value):
        # Abstractly set a user field name/id to a field value.
        # User field is for the user to see - status/internal fields
        # should get their own methods
        raise NotImplementedError
    def GetField(self, name):
        # Abstractly get a user field name/id to a field value.
        raise NotImplementedError
    def Save(self):
        # Save changes after field changes.
        raise NotImplementedError
    def MoveTo(self, folder_id):
        # Move the message to a folder.
        raise NotImplementedError
    def CopyTo(self, folder_id):
        # Copy the message to a folder.
        raise NotImplementedError

The essence of our training code is then:

def train_folder( f, isspam, mgr, progress):
    # fancy progress reporting code omitted
    for message in f.GetMessageGenerator():
        train_message(message, isspam, mgr)

def train_message(msg, is_spam, mgr):
    # Train an individual message.
    # Returns True if newly added (message will be correctly
    # untrained if it was in the wrong category), False if already
    # in the correct category. Catch your own damn exceptions.
    from tokenizer import tokenize
    stream = msg.GetEmailPackageObject()
    tokens = tokenize(stream)
    # Handle we may have already been trained.
    was_spam = mgr.message_db.get(msg.searchkey)
    if was_spam is None:
        # never previously trained.
        pass
    elif was_spam == is_spam:
        # Already in DB - do nothing (full retrain will wipe msg db)
        # leave now.
        return False
    else:
        mgr.bayes.unlearn(tokens, was_spam, False)
    # OK - setup the new data.
    mgr.bayes.learn(tokens, is_spam, False)
    mgr.message_db[msg.searchkey] = is_spam
    mgr.bayes_dirty = True
    return True

As Tim says, not much Outlook specific here (some - eg, "msg.searchkey" -
but nothing too painful)

Mark.

Mark Hammond wrote:

I think that the classes I posted a while ago suffer from the exact reverse
problem as your idea. My idea was to make a "message store" that is largely
independent of training. I believe the problem with your design is that it
deals with the training at the expense of the message store.
Obviously, but worth mentioning, is that there are competing interests here. My focus is towards clients, and specifically the outlook one (if there were more clients I would be happy to think of them too ). Alot of the focus of this group is towards admins rather than individuals (which is just fine!) But it seems the current thinking is of a corpus as being a fairly static, well-controlled set of messages used almost purely for training purposes. For client programs, this may not be practical. The corpus is a more dynamic set of messages - and worse, actually *is* the user's set of messages rather than a collection of message copies. For example, "moving" a message in a corpus may actually mean moving the message in the user's real inbox. This may or may not be what is intended - a corpus "move" operation is more about changing a message's classification than it is about physically moving pieces of mail around. > A Corpus wouldn't know how to create Message objects, nor would a Message > object know how to create itself - classes *derived from* them would know > how to do that. For instance (totally untested code, probably full of > typos) - > > class Message: Jeremy and I both posted real code, so starting with something that takes that into consideration would be good. > I may be putting too much > into the base class by demanding that the text of the message be given to > the constructor - that precludes making FileMessage lazy, and > only read the > file when it needs to.] It also defeats the abstract nature of the class. > 'Corpus' works the same way; again, the details may be naive, but this is > the general idea: I'm hoping I don't sound grumpy, but again, the few systems that already exist for this engine are the best ones to use to discover the naivety early > You can then envisage a MailboxCorpus, and OutlookFolderCorpus, an > IMAPFolderCorpus, a POP3AccountCorpus, a PigeonMessagingCorpus and so on. 
I can't quite imagine that at the moment, as per my comments at the top. Off the top of my head, I believe we need: * An abstract "message id" * A message classification database, as discussed before - basically just a dictionary, keyed by ID, holding either "spam" or "ham". * A "corpus" becomes just an enumerator of message IDs for bulk/batch training. It has no move etc operations. * A "message store" is capable of returning a message object given its ID. * The training API simply takes message objects and updates the probability and message databases. At that level, we really don't need much else - no folders or any other grouping of messages. I'm really not too sure there is much value in adding higher-level concepts such as folders or message store "move" operations - certainly not at the outset, where there are too many competing requirements. > Yes - this could work using observer objects registered with Corpus > objects: This could work, but may be too simple to be necessary. If the process of re-training a message in the Outlook GUI becomes: def RetrainMessageAsSpam(): # Outlook specific code to get an ID. message = message_store.GetMessage(id) if not classifier.IsSpam(message): classifier.train(message, is_spam=True) And not a whole lot else, it doesn't seem worth it. Unfortunately, the decision to perform the retrain is the complex, but client specific part. Is this a newly delivered message? Did the user manually move the message somewhere? Did the user click one of our buttons? Is the user deleting old ham that we want to train on before it dies forever? Outlook does this via examining what Outlook event we are seeing, and looking at meta-data we possibly previously attached to the message. I'm not sure this can be encapsulated well at the moment without adding all our meta-data etc baggage to the base classes. 
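[Editor's sketch] The minimal design listed above (an abstract message ID, a classification database keyed by that ID, a message store, and a training API) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the Outlook plugin's code: `MessageStore`, `CountingClassifier`, and this `train_message` signature are hypothetical stand-ins, and the tokenizer is a plain whitespace split.

```python
class MessageStore:
    """Hypothetical store: returns a message's text for an abstract ID."""

    def __init__(self, messages):
        self._messages = dict(messages)

    def get_message(self, message_id):
        return self._messages[message_id]


class CountingClassifier:
    """Stand-in for the real classifier: it only counts tokens per class."""

    def __init__(self):
        self.counts = {True: {}, False: {}}

    def learn(self, tokens, is_spam):
        bucket = self.counts[is_spam]
        for token in tokens:
            bucket[token] = bucket.get(token, 0) + 1

    def unlearn(self, tokens, is_spam):
        bucket = self.counts[is_spam]
        for token in tokens:
            bucket[token] -= 1
            if bucket[token] == 0:
                del bucket[token]


def train_message(message_id, is_spam, store, classifier, message_db):
    """Train one message; return True only if the databases changed."""
    # Assumption: whitespace tokenizing instead of the real tokenizer.
    tokens = store.get_message(message_id).lower().split()
    was_spam = message_db.get(message_id)  # None means never trained
    if was_spam == is_spam:
        return False  # already trained in the requested category
    if was_spam is not None:
        classifier.unlearn(tokens, was_spam)  # undo the earlier training
    classifier.learn(tokens, is_spam)
    message_db[message_id] = is_spam
    return True


store = MessageStore({"m1": "Buy cheap pills now"})
classifier = CountingClassifier()
message_db = {}
print(train_message("m1", True, store, classifier, message_db))
print(train_message("m1", True, store, classifier, message_db))
print(train_message("m1", False, store, classifier, message_db))
```

Note that nothing here needs folders or move operations; the client-specific decision about when to retrain stays outside these classes, as the message argues.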
> Most of the *new* code that's needed is defining the abstract concepts and > their interfaces, rather than writing code that actually *does* anything - > it's building a framework. *cough* ummm... This is doomed to failure. Code *must* do something to be taken seriously. At the very least, I would expect to see the existing test driver framework running against these "abstract concepts" > Once the framework is there, most of the code needed to implement the > functionality should already be in the project - code to hook > into Outlook, > to train on a message, to parse mbox files, and so on. It just needs > hooking into the framework. See above . Mark. Tim Stone wrote: >Tim Stone - Four Stones Expressions writes: > >> I think that while you're at it, we should refactor the Corpus stuff, >> so that messages and databases and training and classifying are all >> handled in exactly one place in the system. Richie has this idea of a >> 'spambayes server' which is the heart and soul of the systems, and >> that all the user facing stuff fronts.... what say you? - TimS > >I do think the hammie stuff could stand to be a little more tightly >integrated with the rest of the methods--or at least with the >pop3proxy. I was trying to do this with the Hammie class, and I think >maybe my edits to mboxtrain achieve this pretty well. But you guys are >doing things I never dreamed of (like training via a web interface) and >I haven't even begun to look at integrating that stuff. > >Hammiefilter is the simple case. mboxtrain/hammiebulk are the difficult >ones, as is proxytee. So I'd be all for centralizing mailbox access and >message stores. What do you propose? 
> > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Fri Feb 7 16:46:30 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 7 00:47:13 2003 Subject: [Spambayes] Outlook addin using bsddb Message-ID: <01ac01c2ce6c$449265b0$530f8490@eden> [Resending - appears to have hit the bit-bucket!] I have just checked in a change to the Outlook plugin that will use a bsddb style database if a reasonable implementation can be found. Currently, a "reasonable" implementation means: * A bsddb3 module can be imported. * If Python is 2.3 or greater, bsddb can be imported (as this is the new version) So everyone using Python 2.2, and without bsddb3, should continue to use pickles and see no change. Anyone lucky enough to have a working bsddb will see a *huge* performance win when starting and stopping Outlook. NOTE: I have only tested with Python 2.3 + bsddb. I have not tried bsddb3. If your system picks up the bsddb implementation, then you will need to perform a complete retrain. There is no "migration" code in place. Regarding performance: A "full re-train" is about 10% slower using a DB. After a full retrain, the entire database needs to be flushed, so saving the initial bsddb file takes twice as long as saving the pickle. Subsequent startup times are then radically reduced. On my machine with ~6000 messages, the pickles load in approx 2000ms, whereas the bsddb files load in approx 2ms. A huge win by any means. Subsequent shutdown times will depend on how much training has been done (ie, how many words need to be flushed to the DB), but are generally many many times faster. Please let me know if there are any problems, or any suggestions. Zoom-zoom-zoom ly, Mark.
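[Editor's sketch] The selection rule described in the message above (use a bsddb-style database when a reasonable implementation can be imported, otherwise stay with pickles) might look roughly like this. It is a minimal sketch, not the code that was checked in: `pick_storage` is a hypothetical name, and testing for a `db` attribute on `bsddb` is an assumption borrowed from the check mentioned later in this thread.

```python
import importlib


def pick_storage(candidates=("bsddb3", "bsddb")):
    """Return the first usable database module name, or "pickle"."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
        # Assumption: the newer bsddb exposes a "db" attribute and the
        # older, unreliable one does not.
        if name == "bsddb" and not hasattr(module, "db"):
            continue
        return name
    return "pickle"  # fall back to the pickle-based storage


print(pick_storage())
```

On an installation with neither module available, the call falls through to "pickle", which matches the "should continue to use pickles and see no change" case.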
From francois.granger at free.fr Fri Feb 7 09:16:57 2003 From: francois.granger at free.fr (Francois Granger) Date: Fri Feb 7 03:17:01 2003 Subject: [Spambayes] Bad move, Apple In-Reply-To: <15938.40756.578327.757028@montanaro.dyndns.org> References: <15938.40756.578327.757028@montanaro.dyndns.org> Message-ID: At 11:45 -0600 06/02/2003, in message [Spambayes] Bad move, Apple, Skip Montanaro wrote: >Got this in today's Apple eNews mailing: > > 5. Delivering a One-Two Punch to Spammers >[...] >*sigh* Somebody _has_ to explain to them..... ;-) -- Recently using MacOSX....... From Paul.Moore at atosorigin.com Fri Feb 7 09:26:00 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Feb 7 04:26:38 2003 Subject: [Spambayes] Outlook addin using bsddb Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D8E9@UKDCX001.uk.int.atosorigin.com> From: Mark Hammond [mailto:mhammond@skippinet.com.au] > I have just checked in a change to the Outlook plugin that will use > a bsddb style database if a reasonable implementation can be found. Whoo hoo! Thanks for this, Mark. The Outlook shutdown times were getting annoying :-) Off to download now - I'll let you know if I hit any problems. Paul. From mhammond at skippinet.com.au Fri Feb 7 21:51:54 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 7 05:52:43 2003 Subject: [Spambayes] Mailing list In-Reply-To: Message-ID: <002401c2ce96$ee572520$530f8490@eden> > With all the awareness that the LinuxJournal article will bring, it > would be safe to create a "spambayes users" mailing list and > advertize for it on the main page of the > http://spambayes.sourceforge.net site. > > Don't you think ??? Actually, yet another thing the timbot taught me was not to create a mailing list until there is actual demand, rather than an expected demand. I earlier suggested an Outlook specific list, and its need hasn't been demonstrated yet. 
The research in this project is slowing down (probably only training left in terms of this specific project), so I guess the entire list will slowly move towards an "end user" view of the technology anyway. acronymically, IMO, YAGNI ;) Mark. From m2 at plusseven.com Fri Feb 7 11:55:44 2003 From: m2 at plusseven.com (Alex Polite) Date: Fri Feb 7 05:56:21 2003 Subject: [Spambayes] Outlook addin using bsddb In-Reply-To: <01ac01c2ce6c$449265b0$530f8490@eden> References: <01ac01c2ce6c$449265b0$530f8490@eden> Message-ID: <20030207105544.GI633@matijek> On Fri, Feb 07, 2003 at 04:46:30PM +1100, Mark Hammond wrote: > [Resending - appears to have hit the bit-bucket!] > > I have just checked in a change to the Outlook plugin that will use a bsddb > style database if a reasonable implementation can be found. Currently, a > "reasonable" implementation means: > * A bsddb3 module can be imported. Maybe you won't need bsddb3. My tests indicate that dumbdbm is faster than bsddb3. And dumbdbm is all Python and included in the standard library. I was quite surprised by this and put a mail out here to see if anyone could corroborate it but it didn't seem to spark any interest. Here's what I wrote. alex> I moved from spamcan to spambayes today and wasted a couple alex> hours profiling hammie.py alex> profile.run("spambayes.hammiebulk.main()", alex> '/tmp/stats') alex> I ran this on approximately 2000 messages and aggregated the alex> stats. The entire run was 496 CPU seconds. alex> When looking at the profiling information I realized that I alex> was using dumbdbm, which is supposed to be very slow. I alex> installed bsddb3, rebuilt my db and reran the profiling alex> tests. alex> The entire run was now 520 CPU seconds, a 4.8% increase. alex> So it seems like "stupid beats smart" goes for speed alex> optimizations too. alex> Can anyone corroborate this?
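[Editor's sketch] For readers who want to repeat a measurement like the one quoted above, the following shows one way to collect and summarise profile data, and how the 4.8% figure follows from 496 and 520 CPU seconds. It is a sketch under assumptions: it uses the standard `cProfile` and `pstats` modules rather than the `profile.run` call in the mail, and `busy_work` is a hypothetical stand-in for the real workload.

```python
import cProfile
import io
import pstats


def busy_work(n=10000):
    """Hypothetical workload standing in for the real training run."""
    return sum(i * i for i in range(n))


def profile_call(func):
    """Profile one call and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


def relative_increase(before, after):
    """Percentage change from one timing to another."""
    return (after - before) / before * 100.0


report = profile_call(busy_work)
print(report)
print(round(relative_increase(496, 520), 1))  # the two timings quoted above
```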
alex -- Alex Polite http://plusseven.com/gpg From mhammond at skippinet.com.au Fri Feb 7 22:10:06 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 7 06:11:03 2003 Subject: [Spambayes] Outlook addin using bsddb In-Reply-To: <20030207105544.GI633@matijek> Message-ID: <002501c2ce99$78c46450$530f8490@eden> [Alex] > On Fri, Feb 07, 2003 at 04:46:30PM +1100, Mark Hammond wrote: > > [Resending - appears to have hit the bit-bucket!] > > > > I have just checked in a change to the Outlook plugin that > will use a bsddb > > style database if a reasonable implementation can be found. > Currently, a > > "reasonable" implementation means: > > * A bsddb3 module can be imported. > > Maybe you want need bsddb3. Yes, you need either bsddb3, *or* Python 2.3 and a "bsddb" module. If the only thing available is dumbdbm, or any other dbm, then *Outlook* sticks to pickles. This functionality now exists in 2 places, and it is a bit of a mess that is out of our control. 1) spambayes/dbmstorage.py knows to avoid "bsddb" on Windows pre Python 2.3 at all costs, due to bugs. Apart from this, it does a fairly standard "fastest first" dbm search. Thus, dbmstorage.py on Windows Python 2.2 and earlier will generally select dumbdbm. If you manually installed bsddb3, or run a 2.3 alpha, then you will get a bsd database from dbmstorage. 2) the Outlook plugin steps in before this using a pickle, avoiding dbmstorage completely if no good bsddb is available. ie, Outlook will *never* use a dumbdbm. > alex> Can anyone corroborate this? Yep :) Mark. From Paul.Moore at atosorigin.com Fri Feb 7 11:13:43 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Feb 7 06:14:16 2003 Subject: [Spambayes] Outlook addin using bsddb Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D8EB@UKDCX001.uk.int.atosorigin.com> From: Mark Hammond [mailto:mhammond@skippinet.com.au] > Please let me know if there are any problems, or any suggestions. Something seems odd.
I deregistered and deleted my old setup totally, then installed the new version. (Windows 2000, Python 2.2.2, Outlook 2000 with Exchange server). When I ran manager to do a retrain, I got >manager Created new configuration file 'C:\Applications\Spambayes\Outlook2000\default_configuration.pck' Loaded bayes database from 'C:\Applications\Spambayes\Outlook2000\default_bayes_database.db' Failed to load bayes message database Traceback (most recent call last): File "C:\Applications\Spambayes\Outlook2000\manager.py", line 260, in LoadBayes message_db = self.db_manager.open_mdb() File "C:\Applications\Spambayes\Outlook2000\manager.py", line 123, in open_mdb return bsddb.hashopen(self.mdb_filename) error: (2, 'No such file or directory') Either bayes database or message database is missing - creating new Traceback (most recent call last): File "C:\Applications\Spambayes\Outlook2000\manager.py", line 462, in ? sys.exit(main(verbose)) File "C:\Applications\Spambayes\Outlook2000\manager.py", line 437, in main mgr = GetManager(verbose=verbose_level) File "C:\Applications\Spambayes\Outlook2000\manager.py", line 426, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "C:\Applications\Spambayes\Outlook2000\manager.py", line 162, in __init__ self.LoadBayes() File "C:\Applications\Spambayes\Outlook2000\manager.py", line 272, in LoadBayes self.InitNewBayes() File "C:\Applications\Spambayes\Outlook2000\manager.py", line 322, in InitNewBayes self.message_db = self.db_manager.new_mdb() File "C:\Applications\Spambayes\Outlook2000\manager.py", line 126, in new_mdb os.unlink(self.mdb_filename) OSError: [Errno 2] No such file or directory: 'C:\\Applications\\Spambayes\\Outlook2000\\default_message_database.db' The problem seems to be twofold - On the first hand, I'm getting OSError rather than IOError from the delete. I fixed this by just suppressing any exception from the delete (heavy handed, but what the heck?). 
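[Editor's sketch] On the first point: `os.unlink` reports a missing file as `OSError`, which is why an `except IOError` clause does not catch it. A narrower alternative to suppressing every exception is to catch `OSError` and re-raise anything that is not "file not found". This is a sketch in modern Python syntax, not the code in manager.py; `remove_if_exists` is a hypothetical helper name.

```python
import errno
import os
import tempfile


def remove_if_exists(path):
    """Delete path; return False if it was already absent."""
    try:
        os.unlink(path)
    except OSError as exc:
        # Only a missing file is expected here; anything else
        # (permissions, a directory in the way) should still surface.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True


# Usage: create a scratch file, then delete it twice.
handle, scratch = tempfile.mkstemp()
os.close(handle)
print(remove_if_exists(scratch))
print(remove_if_exists(scratch))
```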
Secondly, new_mdb() tests for the bsddb module by doing "import bsddb" and catching an import error, rather than by looking for the "db" symbol. This gives the wrong module on Python 2.2. I fixed this by just removing the import - we imported the right module at the top of manager.py, so that shouldn't be a problem anyway... A patch is below... Paul. --- manager.py.orig Wed Feb 05 03:09:42 2003 +++ manager.py Fri Feb 07 10:53:04 2003 @@ -42,7 +42,7 @@ except ImportError: # See if the explicit bsddb3 module exists. try: - import bsddb3 + import bsddb3 as bsddb use_db = True except ImportError: use_db = False @@ -116,16 +116,21 @@ bayes.db.close() bayes.dbm.close() def open_mdb(self): - try: - import bsddb - except ImportError: - import bsddb3 as bsddb + # Why are we doing the import again here? We did it at the top + # (and what's more we did it correctly there, checking that bsddb + # exports "db" so we don't get the broken Python 2.2 version... +# try: +# import bsddb +# except ImportError: +# import bsddb3 as bsddb return bsddb.hashopen(self.mdb_filename) def new_mdb(self): try: os.unlink(self.mdb_filename) - except IOError, e: - if e.errno != errno.ENOENT: raise + except: # I get OSError - don't try to be too specific... + pass +# except IOError, e: +# if e.errno != errno.ENOENT: raise return self.open_mdb() def store_mdb(self, mdb): mdb.sync() From tim at fourstonesExpressions.com Fri Feb 7 07:20:24 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 7 08:20:31 2003 Subject: [Spambayes] Outlook addin using bsddb In-Reply-To: <01ac01c2ce6c$449265b0$530f8490@eden> Message-ID: 2/6/2003 11:46:30 PM, "Mark Hammond" wrote: >[Resending - appears to have hit the bit-bucket!] > >I have just checked in a change to the Outlook plugin that will use a bsddb >style database if a reasonable implementation can be found. Currently, a >"reasonable" implementation means: >* A bsddb3 module can be imported. 
>* If Python is 2.3 or greater, bsddb can be imported (as this is the new >version) > >So everyone using Python 2.2, and without bsddb3, should continue to use >pickles and see no change. Anyone lucky enough to have a working bsddb will >see a *huge* performance win when starting and stopping outlook. NOTE: I >have only tested with python 2.3 + bsddb. I have not tried bsddb3. > >If you system picks up the bsddb implementation, then you will need to >perform a complete retrain. There is no "migration" code in place. You might well be able to use dbExpImp.py to export your database before your upgrade, then import it afterwards. All will be well then, and it'll save you a retrain. - TimS > >Regarding performance: A "full re-train" is about 10% slower using a DB. >After a full retrain, the entire database needs to be flushed, so saving the >initial bsddb file takes twice as long as saving the pickle. > >Subsequent startup times are then radically reduced. On my machine with >~6000 messages, the pickles load in approx 2000ms, whereas the bsddb files >load in approx 2ms. A huge win by any means. Subsequent shutdown times >will depend on how much training has been done (ie, how many words need to >be flushed to the DB), but is generally many many times faster. > >Please let me know if there are any problems, or any suggestions. > >Zoom-zoom-zoom ly, > >Mark. > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Sat Feb 8 00:30:37 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 7 08:31:37 2003 Subject: [Spambayes] Outlook addin using bsddb In-Reply-To: Message-ID: <002301c2cead$1a6843e0$530f8490@eden> > You might well be able to use dbExpImp.py to export your > database before your > upgrade, then import it afterwards. 
All will be well then, > and it'll save you > a retrain. - TimS Unlikely, as Outlook also uses a "message database", to remember how messages have previously been trained. I'd still love to see this in the core BTW, but I've been waiting to see how other applications handle this. How does pop3proxy/others deal with knowing if it needs to untrain mis-classified messages? I assume that in other applications, there is no way the user can move a message "behind the application's back" like they can in Outlook - ie, once the application has a message in its "Spam" folder, only the application can remove it. In Outlook, we are dealing directly with the user's native folders, so *anything* can happen! FWIW, this is the crux of my problem making the "Corpus" classes fit Outlook - we aren't in total control of the folder! A patch to dbExpImp would probably be trivial, but I don't care enough any more. I expect a binary release will ship with a good bsddb module, and 2.3 will be out soon, so the issue shouldn't last too long. Mark. From tim at fourstonesExpressions.com Fri Feb 7 08:02:33 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 7 09:02:41 2003 Subject: [Spambayes] Outlook addin using bsddb In-Reply-To: <002301c2cead$1a6843e0$530f8490@eden> Message-ID: <5ZTNZXZWVUIFTRZX1WPNCHA8PNPKH.3e43bc79@myst> 2/7/2003 7:30:37 AM, "Mark Hammond" wrote: >> You might well be able to use dbExpImp.py to export your >> database before your >> upgrade, then import it afterwards. All will be well then, >> and it'll save you >> a retrain. - TimS > >Unlikely, as Outlook also uses a "message database", to remember how >messages have previously been trained. The database export and import is a simple dump to a delimited flat file, so training is not 'forgotten' by the operation. > I'd still love to see this in the >core BTW, but I've been waiting to see how other applications handle this.
>How does pop3proxy/others deal with knowing if it needs to untrain >mis-classified messages? The core currently works over file-system-based message sets. Messages are cached into separate directories as they are classified, according to their spam/unsure/ham classification. These directories are accessed through the Corpus module. Corpus is observable, with two observable events: onMessageAdd and onMessageRemove. When a message is added to the Corpus, it calls its observer, which would generally be a spam or ham Trainer, depending on what kind of Corpus it is. In fact, a Corpus *is* a spam corpus *because* it has a spam Trainer observer. Then to untrain/train, all a 'core client' needs to do is move the message from one Corpus to another. The Trainer observers are the only place that anyone cares what kind of mail is in that Corpus. > >I assume that in other applications, there no way the user can move a >message "behind the application's back" like it can in Outlook - ie, once >the application has a message in its "Spam" folder, only the application can >remove it. In Outlook, we are dealing directly with the user's native >folders, so *anything* can happen! Yeah, I recognize this problem now, a few months after the fact :(. I've raised this whole Corpus issue again on the 'Centralization' thread. It's time we (I?) get this right for pop3proxy, hammie*, outlook, and whatever else is lurking out there for the foreseeable future. > FWIW, this is the crux of my problem >making the "Corpus" classes fit Outlook - we aren't in total control of the >folder! > >A patch to dbExpImp would probably be trivial, but I don't care enough any >more . I expect a binary release will ship with a good bsddb module, Let's hope so. This dbm impl hell has lasted long enough! >and 2.3 will be out soon, so the issue shouldn't last too long. DbExpImp will be useful in some instances ;) > >Mark.
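[Editor's sketch] The observer arrangement described in this reply (a Corpus notifies a Trainer when messages are added or removed, so moving a message between corpora untrains and retrains it) can be pictured roughly as below. This is an illustration only, not the spambayes Corpus module: the class layout, the method names `on_message_add` and `on_message_remove`, and the `move` helper are hypothetical, and the Trainer just records what it would have done.

```python
class Trainer:
    """Observer that records the training actions it would perform."""

    def __init__(self, is_spam, log):
        self.is_spam = is_spam
        self.log = log

    def on_message_add(self, message_id):
        self.log.append(("train", message_id, self.is_spam))

    def on_message_remove(self, message_id):
        self.log.append(("untrain", message_id, self.is_spam))


class Corpus:
    """A set of message IDs that notifies one observer of changes."""

    def __init__(self, observer):
        self.messages = set()
        self.observer = observer

    def add(self, message_id):
        self.messages.add(message_id)
        self.observer.on_message_add(message_id)

    def remove(self, message_id):
        self.messages.remove(message_id)
        self.observer.on_message_remove(message_id)


def move(message_id, source, destination):
    """Reclassify a message by moving it between corpora."""
    source.remove(message_id)
    destination.add(message_id)


log = []
ham = Corpus(Trainer(False, log))
spam = Corpus(Trainer(True, log))
ham.add("m1")          # first classified as ham
move("m1", ham, spam)  # user corrects it to spam
for entry in log:
    print(entry)
```

The sketch also shows the limitation raised in the thread: the bookkeeping only stays correct if every move goes through `move`, which a client like Outlook cannot guarantee.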
> > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Thu Feb 6 21:29:34 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 7 10:12:58 2003 Subject: [Spambayes] [ spambayes-Feature Requests-616944 ] Mozilla Mail integration Message-ID: Feature Requests item #616944, was opened at 2002-10-01 04:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Sinchi Pacharuraq (sinchi) Assigned to: Nobody/Anonymous (nobody) Summary: Mozilla Mail integration Initial Comment: Integration with Mozilla Mail client ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-06 23:29 Message: Logged In: YES user_id=645698 To the best of my knowledge, a bayesian filter based on spambayes is currently being integrated into the mozilla mailer. It is in beta right now, and I think is scheduled to be included in the next release. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 21:04 Message: Logged In: YES user_id=552329 This is pretty old now, and it could probably be closed. You can add such a filter in Mozilla - like this: In the Mail window, select: Tools -> Message Filters -> New When you get to the new filter pane, name the filter X- Hammie-Disposition Filter Criteria -> Customize -> New Message Header: Add the following: In the box type X-Hammie-Disposition click Add, then OK. Select the following as the Filter Criteria: X-Hammie-Disposition contains Yes Under Perform this action, select a destination folder ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-02 08:33 Message: Logged In: YES user_id=85414 I'm no expert on how Mozilla filters work... 
can you add a filter that says "If a message contains an X-Hammie-Disposition header whose value starts with Yes then "? If so, you can use either hammie.py (as part of your unix mail delivery system) or pop3proxy.py (on either a server machine or your own client machine). Both of these add an X-Hammie-Disposition header, with which you can filter your messages. ---------------------------------------------------------------------- Comment By: Sinchi Pacharuraq (sinchi) Date: 2002-10-02 04:53 Message: Logged In: YES user_id=621182 I just want to have this anti-spam filter built in Mozilla message filters. For example, user might activate this filter to delete spam messages from inbox or to move it to special folder. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-01 11:04 Message: Logged In: YES user_id=44345 ummm.... a bit short on detail/description. What precisely do you mean by "Mozilla Mail integration"? Can you describe what you would like to see feature-wise? Note that no other mail system integration has been attempted at this point with the exception that I believe the hammie script works with procmail. 
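[Editor's sketch] The filter rule suggested in these comments (act on messages whose X-Hammie-Disposition header starts with "Yes") can also be checked outside a mail client. The snippet below is a minimal sketch using the standard `email` package; `is_flagged_spam` is a hypothetical helper, and the sample header value "Yes; 0.99" is an assumed format used only for illustration.

```python
from email import message_from_string


def is_flagged_spam(raw_message, header="X-Hammie-Disposition"):
    """Return True if the disposition header's value starts with "Yes"."""
    message = message_from_string(raw_message)
    value = message.get(header, "")  # empty string when the header is absent
    return value.strip().lower().startswith("yes")


# Assumed sample message; the header value format is illustrative.
sample = (
    "Subject: hello\n"
    "X-Hammie-Disposition: Yes; 0.99\n"
    "\n"
    "body text\n"
)
print(is_flagged_spam(sample))
```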
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 From noreply at sourceforge.net Fri Feb 7 01:09:35 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 7 10:12:59 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete as spam marks as read Message-ID: Feature Requests item #680629, was opened at 2003-02-05 02:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 Category: None Group: None Status: Closed Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin: Delete as spam marks as read Initial Comment: Personally I think it would be nice if the "delete as spam" button marked the mail item as read. Note that I'm not saying that mail that is filtered as spam should be marked as read - it shouldn't (by default). If others agree, this would be a nice addition. Perhaps as an option in the prefs. ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-02-07 09:09 Message: Logged In: YES user_id=113328 I'd like the "Mark as read" option. Most unsures and false negatives which are spam, I can identify by subject, and hence I don't open (and I don't use the preview pane). But it's not crucial - Ctrl-Q does a very quick "Mark as read" anyway... ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 03:17 Message: Logged In: YES user_id=552329 Agreed that it is not necessary. 
---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-05 03:11 Message: Logged In: YES user_id=14198 Yep, I see that makring as read could be useful in that they have been reviewed, but then I would expect Outlook's normal mechanism to still work and mark it read. I have my preview pane mark as read after 2 seconds :) Re the INI file - my problem is that the GUI needs to modify these options, and I don't see how it is trivial to keep the fairly "free-form" INI file format supported by configparser, while only writing out certain elements and not others and also keeping comments etc intact. I'll make a deal - help me with the options problem, and I will give you 5 free option . Let's take it to email... ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 02:56 Message: Logged In: YES user_id=552329 My reasoning was that if the user manually selects to delete it as spam, then it is as good as read. Those that are moving via the filter have not been read. Personally I still wade through the filtered spam to check it for false positives, and mark the messages as read as I go (so that the 'unread' display is the number of messages I haven't checked). If I choose delete as spam, I then have to go to the spam folder and mark it as read. In any case, no big deal if you disagree, it was just a thought :) Re: the ini file: looking at the ini, it doesn't seem to have anything that couldn't be in the GUI. Most of it would probably fit in the "advanced" dialog. It would probably be good if the ini was only for 'beta' options - anything that is for public use should be in the GUI. And if a 'beta' option moves to 'public', then it doesn't matter (much) if it breaks, because those using beta options should be upgrading anyway. 
Moving the existing settings (most of which should be exposed I think) would mean breaking existing code, but maybe just this once? Maybe this discussion should move to the list? (maybe I should have posted this there originally?) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-05 02:41 Message: Logged In: YES user_id=14198 I'm not too sure this should happen unless the filter also marks the items as read - otherwise you still end up with many spam in the spam folder unread, and only the ones you move manually marked as read. I'm also kinda stuck about what to do with "options". Currently, options managed by the GUI are in a pickle, while other options are in the .ini file. I don't object to having new, outlook specific options in the INI file, but I do object to all our existing code breaking should we decide later to move this option into the GUI. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 02:31 Message: Logged In: YES user_id=552329 And who else to decide on this, but Mark :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 From noreply at sourceforge.net Fri Feb 7 03:05:17 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 7 10:13:00 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete as spam marks as read Message-ID: Feature Requests item #680629, was opened at 2003-02-05 13:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 Category: None Group: None >Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin: Delete as spam marks as read Initial Comment: Personally I think it would be nice if 
the "delete as spam" button marked the mail item as read. Note that I'm not saying that mail that is filtered as spam should be marked as read - it shouldn't (by default). If others agree, this would be a nice addition. Perhaps as an option in the prefs.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-02-07 22:05

Message:
Logged In: YES
user_id=14198

Fair enough :)

----------------------------------------------------------------------

Comment By: Paul Moore (pmoore)
Date: 2003-02-07 20:09

Message:
Logged In: YES
user_id=113328

I'd like the "Mark as read" option. Most unsures and false negatives which are spam, I can identify by subject, and hence I don't open (and I don't use the preview pane). But it's not crucial - Ctrl-Q does a very quick "Mark as read" anyway...

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-05 14:17

Message:
Logged In: YES
user_id=552329

Agreed that it is not necessary.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-05 14:11

Message:
Logged In: YES
user_id=14198

Yep, I see that marking as read could be useful in that they have been reviewed, but then I would expect Outlook's normal mechanism to still work and mark it read. I have my preview pane mark as read after 2 seconds :)

Re the INI file - my problem is that the GUI needs to modify these options, and I don't see how it is trivial to keep the fairly "free-form" INI file format supported by configparser, while only writing out certain elements and not others and also keeping comments etc intact.

I'll make a deal - help me with the options problem, and I will give you 5 free options. Let's take it to email...
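
A minimal sketch of one way to approach the INI problem described in the comment above, assuming Python 3 and a hypothetical helper named set_ini_option (neither is taken from the Outlook plugin source). Instead of rewriting the whole file through ConfigParser, it changes only the line that holds the option and copies every other line, comments included, verbatim. The section and option names in the demo are illustrative only.

```python
import re

# Matches a section header such as "[Categorization]".
_HEADER = re.compile(r"^\s*\[(.+?)\]\s*$")


def set_ini_option(text, section, option, value):
    """Return INI text with one option set; all other lines are kept as-is.

    Hypothetical helper: it does not handle continuation lines or
    duplicate sections, which a real implementation would need to consider.
    """
    out = []
    in_section = False
    done = False
    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            # Leaving the target section without having seen the option:
            # add it at the end of that section.
            if in_section and not done:
                out.append("%s = %s" % (option, value))
                done = True
            in_section = match.group(1) == section
        elif in_section and not done:
            is_comment = line.lstrip().startswith(("#", ";"))
            key = re.split(r"[=:]", line, maxsplit=1)[0].strip()
            if key == option and not is_comment:
                line = "%s = %s" % (option, value)
                done = True
        out.append(line)
    if not done:
        # Option (and possibly the section) was never seen: append both.
        if not in_section:
            out.append("[%s]" % section)
        out.append("%s = %s" % (option, value))
    return "\n".join(out) + "\n"


sample = (
    "# thresholds\n"
    "[Categorization]\n"
    "ham_cutoff = 0.10\n"
    "; keep me\n"
    "spam_cutoff = 0.90\n"
)
updated = set_ini_option(sample, "Categorization", "spam_cutoff", "0.85")
```

The trade-off is that the helper knows nothing about option types or defaults; it only guarantees that untouched lines survive a write, which is the part the comment calls non-trivial.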
---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 13:56 Message: Logged In: YES user_id=552329 My reasoning was that if the user manually selects to delete it as spam, then it is as good as read. Those that are moving via the filter have not been read. Personally I still wade through the filtered spam to check it for false positives, and mark the messages as read as I go (so that the 'unread' display is the number of messages I haven't checked). If I choose delete as spam, I then have to go to the spam folder and mark it as read. In any case, no big deal if you disagree, it was just a thought :) Re: the ini file: looking at the ini, it doesn't seem to have anything that couldn't be in the GUI. Most of it would probably fit in the "advanced" dialog. It would probably be good if the ini was only for 'beta' options - anything that is for public use should be in the GUI. And if a 'beta' option moves to 'public', then it doesn't matter (much) if it breaks, because those using beta options should be upgrading anyway. Moving the existing settings (most of which should be exposed I think) would mean breaking existing code, but maybe just this once? Maybe this discussion should move to the list? (maybe I should have posted this there originally?) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-05 13:41 Message: Logged In: YES user_id=14198 I'm not too sure this should happen unless the filter also marks the items as read - otherwise you still end up with many spam in the spam folder unread, and only the ones you move manually marked as read. I'm also kinda stuck about what to do with "options". Currently, options managed by the GUI are in a pickle, while other options are in the .ini file. 
I don't object to having new, outlook specific options in the INI file, but I do object to all our existing code breaking should we decide later to move this option into the GUI. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 13:31 Message: Logged In: YES user_id=552329 And who else to decide on this, but Mark :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 From noreply at sourceforge.net Fri Feb 7 03:38:17 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 7 10:13:02 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete as spam marks as read Message-ID: Feature Requests item #680629, was opened at 2003-02-04 18:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 Category: None Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin: Delete as spam marks as read Initial Comment: Personally I think it would be nice if the "delete as spam" button marked the mail item as read. Note that I'm not saying that mail that is filtered as spam should be marked as read - it shouldn't (by default). If others agree, this would be a nice addition. Perhaps as an option in the prefs. ---------------------------------------------------------------------- Comment By: Piers Haken (piersh) Date: 2003-02-07 03:38 Message: Logged In: YES user_id=10551 i don't care if you do this or not (since spambayes catches all my spam ;-) ), but please don't mark any automatically- filtered spam as 'read' - it would be a pain to check for FPs if you did. thx. 
----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-07 03:05

Message:
Logged In: YES
user_id=14198

Fair enough :)

----------------------------------------------------------------------

Comment By: Paul Moore (pmoore)
Date: 2003-02-07 01:09

Message:
Logged In: YES
user_id=113328

I'd like the "Mark as read" option. Most unsures and false negatives which are spam, I can identify by subject, and hence I don't open (and I don't use the preview pane). But it's not crucial - Ctrl-Q does a very quick "Mark as read" anyway...

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-04 19:17

Message:
Logged In: YES
user_id=552329

Agreed that it is not necessary.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-02-04 19:11

Message:
Logged In: YES
user_id=14198

Yep, I see that marking as read could be useful in that they have been reviewed, but then I would expect Outlook's normal mechanism to still work and mark it read. I have my preview pane mark as read after 2 seconds :)

Re the INI file - my problem is that the GUI needs to modify these options, and I don't see how it is trivial to keep the fairly "free-form" INI file format supported by configparser, while only writing out certain elements and not others and also keeping comments etc intact.

I'll make a deal - help me with the options problem, and I will give you 5 free options. Let's take it to email...

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-04 18:56

Message:
Logged In: YES
user_id=552329

My reasoning was that if the user manually selects to delete it as spam, then it is as good as read. Those that are moving via the filter have not been read.
Personally I still wade through the filtered spam to check it for false positives, and mark the messages as read as I go (so that the 'unread' display is the number of messages I haven't checked). If I choose delete as spam, I then have to go to the spam folder and mark it as read. In any case, no big deal if you disagree, it was just a thought :) Re: the ini file: looking at the ini, it doesn't seem to have anything that couldn't be in the GUI. Most of it would probably fit in the "advanced" dialog. It would probably be good if the ini was only for 'beta' options - anything that is for public use should be in the GUI. And if a 'beta' option moves to 'public', then it doesn't matter (much) if it breaks, because those using beta options should be upgrading anyway. Moving the existing settings (most of which should be exposed I think) would mean breaking existing code, but maybe just this once? Maybe this discussion should move to the list? (maybe I should have posted this there originally?) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-04 18:41 Message: Logged In: YES user_id=14198 I'm not too sure this should happen unless the filter also marks the items as read - otherwise you still end up with many spam in the spam folder unread, and only the ones you move manually marked as read. I'm also kinda stuck about what to do with "options". Currently, options managed by the GUI are in a pickle, while other options are in the .ini file. I don't object to having new, outlook specific options in the INI file, but I do object to all our existing code breaking should we decide later to move this option into the GUI. 
---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 18:31 Message: Logged In: YES user_id=552329 And who else to decide on this, but Mark :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 From skip at pobox.com Fri Feb 7 09:22:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 7 10:22:37 2003 Subject: [Spambayes] Mailing list In-Reply-To: <002401c2ce96$ee572520$530f8490@eden> References: <002401c2ce96$ee572520$530f8490@eden> Message-ID: <15939.53034.721387.213271@montanaro.dyndns.org> >>>>> "Mark" == Mark Hammond writes: >> With all the awareness that the LinuxJournal article will bring, it >> would be safe to create a "spambayes users" mailing list and >> advertize for it on the main page of the >> http://spambayes.sourceforge.net site. >> >> Don't you think ??? Mark> Actually, yet another thing the timbot taught me was not to create Mark> a mailing list until there is actual demand, rather than an Mark> expected demand. Amen. In any case, it would be easier to let people subscribe to the spambayes list, then if developers' topics get overwhelmed, create a spambayes-devel list for that purpose. 
Skip From noreply at sourceforge.net Fri Feb 7 07:58:21 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 7 10:53:52 2003 Subject: [Spambayes] [ spambayes-Feature Requests-679107 ] Outlook Express can't filter on headers Message-ID: Feature Requests item #679107, was opened at 2003-02-02 10:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=679107&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Andrew Wilkinson (andrew_j_w) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook Express can't filter on headers Initial Comment: Outlook Express won't let you filter on the X-Spambayes- Classification header therefore it would be nice to have the option to use another method of marking spam. One possible idea is to add [spam] to beginning of the subject line, or to add the X-Spambayes-Classification to the end of the message body. ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-07 09:58 Message: Logged In: YES user_id=645698 This is a bit of a difficulty. The best I can think to do is to add a specific string to To: header, which can be tested with the OE message rules. We wouldn't want to use the CC: header, because this would break 'Reply All' functionality. Adding stuff to the message body is just too messy and prone to error, particularly for multipart messages. However, adding to the To: header could be a bit error prone as well... we'll have to experiment a bit with this. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=679107&group_id=61702 From tim at fourstonesExpressions.com Fri Feb 7 10:42:42 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 7 11:42:49 2003 Subject: [Spambayes] Outlook Express and pop3proxy Message-ID: <3YBAVUMH1UWSKF06VZWB0D8B6V3GE.3e43e202@myst> It turns out that outlook express is unable to examine headers, except for to:, cc:, and subject:. OE is a relatively well used client, not nearly as common as outlook, but it comes as a default with every winduhz installation, so everyone has it. We should probably support it somehow. I propose that we include classification in the to: header. This will not break any reply functionality, and will not insert information into an area of the mail that could be inadvertently seen by someone else or spoofed by a spammer. The second consideration is particularly important, I think. If we have a to: header of: To: tim@fourstonesExpressions.com, richie@entrian.com and we change that header to: To: spamtim@fourstonesExpressions.com, richie@entrian.com then OE can be configured to look for 'spam' and filter accordingly. However, if the spammer puts 'ham' in the to: field (for example), the mail will bounce. It would be relatively simple to do this. We could have a boolean option like 'pop3proxy-notate-to:' that controls whether or not the to: header is marked up, which could be configured from the configuration page. Does this solve the problem? c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Fri Feb 7 08:45:23 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Feb 7 11:45:27 2003 Subject: [Spambayes] Msg class broken? 
Message-ID: <20030207164523.C67922DEA3@cashew.wolfskeep.com>

Resending, because this one seems to have gone into the bit-bucket, too...

I'm trying to do a bit more testing (*gasp*), but I'm having a bit of difficulty: it seems that the tokenizer doesn't like being given a simple string anymore, as is done in the Msg class in msgs.py. If I'm reading things right, this breaks all of the automated testing tools.

Have a traceback:

Traceback (most recent call last):
  File "testtools/Continuous.py", line 293, in ?
    main()
  File "testtools/Continuous.py", line 254, in main
    tests[j].predict([msg], isspam)
  File "testtools/Continuous.py", line 94, in predict
    prob = guess(example)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 436, in _getclues
    for word in Set(wordstream):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 374, in __init__
    self._update(iterable)
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/compatsets.py", line 333, in _update
    for element in it:
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1052, in tokenize
    for tok in self.tokenize_headers(msg):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 1063, in tokenize_headers
    for w in crack_content_xyz(x):
  File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/tokenizer.py", line 791, in crack_content_xyz
    yield 'content-type:' + msg.get_content_type()
AttributeError: Message instance has no attribute 'get_content_type'

Please ignore the top three lines of the trace; I'm building my own driver for testing with incremental training after each message. (What I'm trying to do in the big picture is get graphs of how the error rates drop off over time with various training modes.)
Anyway, it looks like either msgs.py needs to be updated to pass in email.Message.Message objects, or tokenizer.py needs to relearn how to accept raw strings. Am I reading this right? This seems odd since tokenizer does seem to try to convert the string to a Message via the auspices of mboxutils... help? - Alex From skip at pobox.com Fri Feb 7 10:53:00 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 7 11:53:19 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: <20030207164523.C67922DEA3@cashew.wolfskeep.com> References: <20030207164523.C67922DEA3@cashew.wolfskeep.com> Message-ID: <15939.58476.716108.657064@montanaro.dyndns.org> Alex> AttributeError: Message instance has no attribute 'get_content_type' Alex, Check to see that you are using a recent enough version of the email package. Skip From popiel at wolfskeep.com Fri Feb 7 09:44:54 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Feb 7 12:44:59 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: Message from Skip Montanaro <15939.58476.716108.657064@montanaro.dyndns.org> References: <20030207164523.C67922DEA3@cashew.wolfskeep.com> <15939.58476.716108.657064@montanaro.dyndns.org> Message-ID: <20030207174454.CA84C2DEA3@cashew.wolfskeep.com> In message: <15939.58476.716108.657064@montanaro.dyndns.org> Skip Montanaro writes: > Alex> AttributeError: Message instance has no attribute 'get_content_type' > >Alex, > >Check to see that you are using a recent enough version of the email >package. Hrm. I'm using email-2.4.3, lemme see if there's anything more recent... nope, doesn't seem to be, short of pulling from sf CVS. When I look in the email package source, I do see the get_content_type method... - Alex, quite confused, and instrumenting for more data From popiel at wolfskeep.com Fri Feb 7 10:40:49 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Feb 7 13:40:52 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: Message from "T. 
Alexander Popiel" <20030207174454.CA84C2DEA3@cashew.wolfskeep.com> References: <20030207164523.C67922DEA3@cashew.wolfskeep.com> <15939.58476.716108.657064@montanaro.dyndns.org> <20030207174454.CA84C2DEA3@cashew.wolfskeep.com> Message-ID: <20030207184049.6F52D2DEA3@cashew.wolfskeep.com> In message: <20030207174454.CA84C2DEA3@cashew.wolfskeep.com> "T. Alexander Popiel" writes: >In message: <15939.58476.716108.657064@montanaro.dyndns.org> > Skip Montanaro writes: > >> Alex> AttributeError: Message instance has no attribute 'get_content_type' >> >>Alex, >> >>Check to see that you are using a recent enough version of the email >>package. > >Hrm. I'm using email-2.4.3, lemme see if there's anything more recent... >nope, doesn't seem to be, short of pulling from sf CVS. When I look in >the email package source, I do see the get_content_type method... Argh. I had two different versions of the email package in my search path. It was grabbing the wrong one. Fixed. I feel dumb now. - Alex From david at theresistance.net Fri Feb 7 21:28:10 2003 From: david at theresistance.net (David Shaw) Date: Sat Feb 8 00:31:03 2003 Subject: [Spambayes] Error on review page In-Reply-To: <20030207184049.6F52D2DEA3@cashew.wolfskeep.com> Message-ID: I've been using spambayes for a few weeks and have been very impressed. I have even started a real GUI to emulate the functionality of the Web interface. Anyway, on to my problem. 
I got a lot of mail today (partially from joining this list), and when I go to the review screen now I see this:

500 Server error

Traceback (most recent call last):
  File "spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "./pop3proxy.py", line 916, in onReview
    message = mboxutils.get_message(cachedMessage.getSubstance())
  File "spambayes/Corpus.py", line 344, in getSubstance
    return self.hdrtxt + self.payload
  File "spambayes/Corpus.py", line 291, in __getattr__
    raise AttributeError, attributeName
AttributeError: hdrtxt

I suspect that the problem is that one of the mbox files where messages are cached got corrupted. Any thoughts on where to chase this down? Also, assuming I do, should there be some more robustness in the code to handle such problems if they do happen?

From david at theresistance.net Fri Feb 7 22:54:16 2003
From: david at theresistance.net (David Shaw)
Date: Sat Feb 8 00:31:05 2003
Subject: [Spambayes] Another Error on review page
In-Reply-To:
Message-ID:

When trying to classify a message, I got the following error. Is there any way to run the pop proxy interactively so you can "poke" it to see what data lives at self.nham that is causing the problem? I use ZEO to do this in Zope and it's very helpful.
Traceback (most recent call last):
  File "spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "./pop3proxy.py", line 865, in onReview
    targetCorpus.takeMessage(id, state.unknownCorpus)
  File "spambayes/Corpus.py", line 201, in takeMessage
    self.addMessage(msg)
  File "spambayes/FileCorpus.py", line 143, in addMessage
    Corpus.Corpus.addMessage(self, message)
  File "spambayes/Corpus.py", line 136, in addMessage
    obs.onAddMessage(message)
  File "spambayes/storage.py", line 219, in onAddMessage
    self.train(message)
  File "spambayes/storage.py", line 227, in train
    self.bayes.learn(message.tokenize(), self.is_spam)
  File "spambayes/classifier.py", line 270, in learn
    self._add_msg(wordstream, is_spam)
  File "spambayes/classifier.py", line 389, in _add_msg
    self.nham += 1
TypeError: cannot concatenate 'str' and 'int' objects

On a related note, I wonder if there would be benefit in storing in ZODB rather than straight DBM/pickles (ZODB can use bsddb3 or pickles as well).

From david at theresistance.net Sat Feb 8 10:52:37 2003
From: david at theresistance.net (David Shaw)
Date: Sat Feb 8 10:54:23 2003
Subject: [Spambayes] Another Error on review page
In-Reply-To:
Message-ID: <588229C8-3B7D-11D7-BF00-000393582EF6@theresistance.net>

For those of us who do not have checkin capability (or need it), how do we submit patches? I fixed the problem below in my version by checking that the self.nspam and self.nham values are integers, and if not, convert them.

cvs diff spambayes/classifier.py
Index: spambayes/classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/classifier.py,v
retrieving revision 1.3
diff -r1.3 classifier.py
33a34
> import types
387c388,391
<             self.nspam += 1
---
>             if type(self.nspam) == types.IntType:
>                 self.nspam += 1
>             else:
>                 self.nspam = int(self.nspam) + 1
389c393,396
<             self.nham += 1
---
>             if type(self.nham) == types.IntType:
>                 self.nham += 1
>             else:
>                 self.nham = int(self.nham) + 1

On Friday, February 7, 2003, at 10:54 PM, David Shaw wrote:

> When trying to classify a message, I got the following error. Is
> there any way to run the pop proxy interactively so you can "poke" it
> to see what data lives at self.nham that is causing the problem? I
> use ZEO to do this in Zope and it's very helpful.
>
> Traceback (most recent call last):
>
>   File "spambayes/Dibbler.py", line 398, in found_terminator
>     getattr(plugin, name)(**params)
>
>   File "./pop3proxy.py", line 865, in onReview
>     targetCorpus.takeMessage(id, state.unknownCorpus)
>
>   File "spambayes/Corpus.py", line 201, in takeMessage
>     self.addMessage(msg)
>
>   File "spambayes/FileCorpus.py", line 143, in addMessage
>     Corpus.Corpus.addMessage(self, message)
>
>   File "spambayes/Corpus.py", line 136, in addMessage
>     obs.onAddMessage(message)
>
>   File "spambayes/storage.py", line 219, in onAddMessage
>     self.train(message)
>
>   File "spambayes/storage.py", line 227, in train
>     self.bayes.learn(message.tokenize(), self.is_spam)
>
>   File "spambayes/classifier.py", line 270, in learn
>     self._add_msg(wordstream, is_spam)
>
>   File "spambayes/classifier.py", line 389, in _add_msg
>     self.nham += 1
>
> TypeError: cannot concatenate 'str' and 'int' objects
>
> On a related note, I wonder if there would be benefit in storing in
> ZODB rather than straight DBM/pickles (ZODB can use bsddb3 or pickles
> as well).
> _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > From tim at fourstonesExpressions.com Sat Feb 8 11:21:17 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sat Feb 8 12:21:29 2003 Subject: [Spambayes] Another Error on review page In-Reply-To: <588229C8-3B7D-11D7-BF00-000393582EF6@theresistance.net> Message-ID: <41ZUNL3V5Z3XB065712YBAHPJCONB9.3e453c8d@myst> 2/8/2003 9:52:37 AM, David Shaw wrote: >For those of us who do not have checkin capability (or need it), how do >we submit patches? I fixed the problem below in my version by checking >that the self.nspam and self.nham values are integers, and if not, >convert them. I've checked in a fix for this problem. >> On a related note, I wonder if there would be benefit in storing in >> ZODB rather than straight DBM/pickles (ZODB can use bsddb3 or pickles >> as well). There have been endless debates on this and related subjects. We're all weary... c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Sat Feb 8 12:50:03 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Feb 8 12:51:22 2003 Subject: [Spambayes] Another Error on review page In-Reply-To: <588229C8-3B7D-11D7-BF00-000393582EF6@theresistance.net> Message-ID: [David Shaw] > ... > On a related note, I wonder if there would be benefit in storing in > ZODB rather than straight DBM/pickles (ZODB can use bsddb3 or pickles > as well). Jeremy Hylton is ZODB's chief developer these days, and this project's pspam directory contains his code to hook spambayes up to ZODB3 and a ZEO server. I don't think he's done any development on it lately, but that could be because it's already perfect . Beyond that, this is a Python project: if there's a choice among X, Y, and Z, fans of each will eventually supply bindings for each. 
There's no real need to insist on a specific database that all must use. For my own use, I'm happiest skipping database layers entirely, simply pickling in-memory dicts across sessions. From tim.one at comcast.net Sat Feb 8 14:08:09 2003 From: tim.one at comcast.net (Tim Peters) Date: Sat Feb 8 14:10:53 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: <20030207184049.6F52D2DEA3@cashew.wolfskeep.com> Message-ID: [T. Alexander Popiel] > ... > Argh. I had two different versions of the email package in my search > path. It was grabbing the wrong one. Fixed. I feel dumb now. I can cure that: Alex, you're a moron. See? If someone else says it, you stop feeling stupid and get some healthy anger going instead. Just so long as you don't go on to believe I meant it, everyone wins . some-days-i-wake-up-in-the-wrong-life-ly y'rs - tim From tim at fourstonesExpressions.com Sat Feb 8 17:32:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sat Feb 8 18:32:47 2003 Subject: [Spambayes] Msg class broken? Message-ID: <1V83MH7XTXTB7IFD961H1W3W07B0J.3e459395@myst> Another item on my list of prereqs to be checked by the installer... -TimS 2/8/2003 1:08:09 PM, Tim Peters wrote: >[T. Alexander Popiel] >> ... >> Argh. I had two different versions of the email package in my search >> path. It was grabbing the wrong one. Fixed. I feel dumb now. > >I can cure that: Alex, you're a moron. See? If someone else says it, you >stop feeling stupid and get some healthy anger going instead. Just so long >as you don't go on to believe I meant it, everyone wins . 
> >some-days-i-wake-up-in-the-wrong-life-ly y'rs - tim > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Sun Feb 9 12:36:35 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Feb 8 18:37:11 2003 Subject: [Spambayes] Another Error on review page Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D486@its-xchg4.massey.ac.nz> > For those of us who do not have checkin capability (or need > it), how do we submit patches? Go to the Sourceforge project page () and use the "submit new" option on either the feature request (RFE), bugs, or patches pages. It lets you attach a file as well. =Tony Meyer From david at theresistance.net Sat Feb 8 18:39:41 2003 From: david at theresistance.net (David Shaw) Date: Sat Feb 8 18:41:12 2003 Subject: [Spambayes] Another Error on review page In-Reply-To: <41ZUNL3V5Z3XB065712YBAHPJCONB9.3e453c8d@myst> Message-ID: <985B4360-3BBE-11D7-BF00-000393582EF6@theresistance.net> On Saturday, February 8, 2003, at 12:21 PM, Tim Stone - Four Stones Expressions wrote: > >>> On a related note, I wonder if there would be benefit in storing in >>> ZODB rather than straight DBM/pickles (ZODB can use bsddb3 or pickles >>> as well). > > There have been endless debates on this and related subjects. We're > all > weary... > I apologize for beating a dead horse. I read a couple of months worth of the archive when joining, but I've yet to read far enough back to see where this topic was discussed. Thanks for checking in a fix -- I have a couple of others if you want 'em, all related to the fact that MacOS X doesn't come with Python 2.3. 

From noreply at sourceforge.net Sat Feb 8 17:28:55 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat Feb 8 21:34:14 2003
Subject: [Spambayes] [ spambayes-Bugs-680158 ] Outlook addin cannot create new database
Message-ID:

Bugs item #680158, was opened at 2003-02-05 01:53
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=680158&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Duncan Booth (duncanb)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook addin cannot create new database

Initial Comment:
Outlook2000\manager.py, revision 1.42 contains this code:

    def new_bayes(self):
        # Just delete the file and do an "open"
        try:
            os.unlink(self.bayes_filename)
        except IOError, e:
            if e.errno != errno.ENOENT: raise
        return self.open_bayes()

Python 2.2 under windows raises OSError when os.unlink fails, so this code should be:

    def new_bayes(self):
        # Just delete the file and do an "open"
        try:
            os.unlink(self.bayes_filename)
        except (OSError,IOError), e:
            if e.errno != errno.ENOENT: raise
        return self.open_bayes()

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-02-09 14:28

Message:
Logged In: YES
user_id=552329

I get this too, and the fix works for me. Here's a complete trace from running addin.py (fresh CVS) to the error when launching Outlook.

Outlook Spam Addin module loading
SpamAddin - Connecting to Outlook
Created new configuration file 'D:\spambayes\Outlook2000\default_configuration.pck'
Loaded bayes database from 'D:\spambayes\Outlook2000\default_bayes_database.pck'
Either bayes database or message database is missing - creating new
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\universal.py", line 150, in dispatch
    retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_
    return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\spambayes\Outlook2000\addin.py", line 615, in OnConnection
    self.manager = manager.GetManager(application)
  File "D:\spambayes\Outlook2000\manager.py", line 419, in GetManager
    _mgr = BayesManager(outlook=outlook, verbose=verbose)
  File "D:\spambayes\Outlook2000\manager.py", line 158, in __init__
    self.LoadBayes()
  File "D:\spambayes\Outlook2000\manager.py", line 265, in LoadBayes
    self.InitNewBayes()
  File "D:\spambayes\Outlook2000\manager.py", line 314, in InitNewBayes
    self.bayes = self.db_manager.new_bayes()
  File "D:\spambayes\Outlook2000\manager.py", line 87, in new_bayes
    os.unlink(self.bayes_filename)
exceptions.OSError: [Errno 2] No such file or directory: 'D:\spambayes\Outlook2000\default_bayes_database.pck'

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=680158&group_id=61702

From noreply at sourceforge.net Sat Feb 8 18:36:02 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat
Feb 8 21:34:15 2003 Subject: [Spambayes] [ spambayes-Feature Requests-679107 ] Outlook Express can't filter on headers Message-ID: Feature Requests item #679107, was opened at 2003-02-02 10:46 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=679107&group_id=61702 Category: None Group: None >Status: Closed Priority: 5 Submitted By: Andrew Wilkinson (andrew_j_w) >Assigned to: Tim Stone (timstone4) Summary: Outlook Express can't filter on headers Initial Comment: Outlook Express won't let you filter on the X-Spambayes- Classification header therefore it would be nice to have the option to use another method of marking spam. One possible idea is to add [spam] to beginning of the subject line, or to add the X-Spambayes-Classification to the end of the message body. ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-08 20:36 Message: Logged In: YES user_id=645698 Added a configuration option to add classification to recipient list, which can be tested by Outlook Express mail rules. See Option Configuration page in the pop3proxy user interface. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-07 09:58 Message: Logged In: YES user_id=645698 This is a bit of a difficulty. The best I can think to do is to add a specific string to To: header, which can be tested with the OE message rules. We wouldn't want to use the CC: header, because this would break 'Reply All' functionality. Adding stuff to the message body is just too messy and prone to error, particularly for multipart messages. However, adding to the To: header could be a bit error prone as well... we'll have to experiment a bit with this. 
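
The idea in the comments above (append a marker to the To: header so that Outlook Express rules can test for it, while leaving CC: alone) might be sketched roughly as follows. This is only an illustration using the standard email package, not the code that was checked in; the function name tag_recipient and the marker domain spambayes.invalid are assumptions made up for the example.

```python
from email import message_from_string


def tag_recipient(raw_message, classification, marker_domain="spambayes.invalid"):
    """Return the message text with a classification marker added to To:.

    The marker address (e.g. spam@spambayes.invalid) is hypothetical; the
    thread does not say what string the real option uses.
    """
    msg = message_from_string(raw_message)
    marker = "%s@%s" % (classification, marker_domain)
    existing = msg.get("To")
    if existing is None:
        # No To: header at all, so create one holding only the marker.
        msg["To"] = marker
    else:
        # Append to To: and leave Cc: untouched, so Reply All still works.
        msg.replace_header("To", existing + ", " + marker)
    return msg.as_string()
```

A mail rule could then match on the marker text in the To: line. As noted in the comment, this needs experimenting with, since clients may treat the extra recipient in unexpected ways.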
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=679107&group_id=61702 From noreply at sourceforge.net Sat Feb 8 18:36:23 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Feb 8 21:34:16 2003 Subject: [Spambayes] [ spambayes-Feature Requests-616944 ] Mozilla Mail integration Message-ID: Feature Requests item #616944, was opened at 2002-10-01 04:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 Category: None Group: None >Status: Closed Priority: 5 Submitted By: Sinchi Pacharuraq (sinchi) Assigned to: Nobody/Anonymous (nobody) Summary: Mozilla Mail integration Initial Comment: Integration with Mozilla Mail client ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-08 20:36 Message: Logged In: YES user_id=645698 Mozilla team is doing this. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-06 23:29 Message: Logged In: YES user_id=645698 To the best of my knowledge, a bayesian filter based on spambayes is currently being integrated into the mozilla mailer. It is in beta right now, and I think is scheduled to be included in the next release. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 21:04 Message: Logged In: YES user_id=552329 This is pretty old now, and it could probably be closed. You can add such a filter in Mozilla - like this: In the Mail window, select: Tools -> Message Filters -> New When you get to the new filter pane, name the filter X- Hammie-Disposition Filter Criteria -> Customize -> New Message Header: Add the following: In the box type X-Hammie-Disposition click Add, then OK. 
Select the following as the Filter Criteria: X-Hammie-Disposition contains Yes Under Perform this action, select a destination folder ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-02 08:33 Message: Logged In: YES user_id=85414 I'm no expert on how Mozilla filters work... can you add a filter that says "If a message contains an X-Hammie-Disposition header whose value starts with Yes then "? If so, you can use either hammie.py (as part of your unix mail delivery system) or pop3proxy.py (on either a server machine or your own client machine). Both of these add an X-Hammie-Disposition header, with which you can filter your messages. ---------------------------------------------------------------------- Comment By: Sinchi Pacharuraq (sinchi) Date: 2002-10-02 04:53 Message: Logged In: YES user_id=621182 I just want to have this anti-spam filter built in Mozilla message filters. For example, user might activate this filter to delete spam messages from inbox or to move it to special folder. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-01 11:04 Message: Logged In: YES user_id=44345 ummm.... a bit short on detail/description. What precisely do you mean by "Mozilla Mail integration"? Can you describe what you would like to see feature-wise? Note that no other mail system integration has been attempted at this point with the exception that I believe the hammie script works with procmail. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 From noreply at sourceforge.net Sat Feb 8 20:00:07 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Feb 8 23:03:24 2003 Subject: [Spambayes] [ spambayes-Bugs-683250 ] Error in outlook/readme Message-ID: Bugs item #683250, was opened at 2003-02-09 17:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Error in outlook/readme Initial Comment: Pretty minor, but in the readme.txt in the outlook folder, under known bugs this is listed: * Filtering an Exchange Server public store appears to not work (is this still true?) This is *not* still true :) so this line could go away. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 From noreply at sourceforge.net Sat Feb 8 20:01:32 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Feb 8 23:03:26 2003 Subject: [Spambayes] [ spambayes-Bugs-683250 ] Error in outlook/readme Message-ID: Bugs item #683250, was opened at 2003-02-09 17:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Error in outlook/readme Initial Comment: Pretty minor, but in the readme.txt in the outlook folder, under known bugs this is listed: * Filtering an Exchange Server public store appears to not work (is this still true?) This is *not* still true :) so this line could go away. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 From popiel at wolfskeep.com Sat Feb 8 22:03:21 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Sun Feb 9 01:03:26 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: Message from Tim Peters References: Message-ID: <20030209060321.B36682DEAD@cashew.wolfskeep.com> In message: Tim Peters writes: >[T. Alexander Popiel] >> ... >> Argh. I had two different versions of the email package in my search >> path. It was grabbing the wrong one. Fixed. I feel dumb now. > >I can cure that: Alex, you're a moron. See? If someone else says it, you >stop feeling stupid and get some healthy anger going instead. Just so long >as you don't go on to believe I meant it, everyone wins . Why thanks, Uncle Timmy! I knew I could count on you to help me out. ;-) On the other hand, I'm now having a small bit of trouble with my graphs; spambayes is too darn good. On my real data, in less than two days training, the error rate is in the noise floor. I haven't tried all the different training methods discussed yet, but it's kinda hard to differentiate when the filter just doesn't screw up. - Alex From mhammond at skippinet.com.au Sun Feb 9 19:03:04 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Feb 9 03:03:35 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: <20030209060321.B36682DEAD@cashew.wolfskeep.com> Message-ID: <006c01c2d011$acf230d0$530f8490@eden> [Alex] > On the other hand, I'm now having a small bit of trouble with > my graphs; > spambayes is too darn good. On my real data, in less than two days > training, the error rate is in the noise floor. I haven't tried all > the different training methods discussed yet, but it's kinda hard to > differentiate when the filter just doesn't screw up. Can you post this anyway? 
I'm particularly interested in this, as I feel it is the roughest edge left on the outlook client - that initial training period. Thanks, Mark. From noreply at sourceforge.net Sun Feb 9 00:07:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Feb 9 03:21:54 2003 Subject: [Spambayes] [ spambayes-Feature Requests-616944 ] Mozilla Mail integration Message-ID: Feature Requests item #616944, was opened at 2002-10-01 19:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 Category: None Group: None Status: Closed Priority: 5 Submitted By: Sinchi Pacharuraq (sinchi) Assigned to: Nobody/Anonymous (nobody) Summary: Mozilla Mail integration Initial Comment: Integration with Mozilla Mail client ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-09 19:07 Message: Logged In: YES user_id=14198 Is Mozilla doing a "spambayes", or their own? If not ours, it would at least be interesting to extract their engine into our test suite, and measure the error rates. I'd like to help on this, as it would breathe some life back into pyxpcom. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-09 13:36 Message: Logged In: YES user_id=645698 Mozilla team is doing this. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-07 16:29 Message: Logged In: YES user_id=645698 To the best of my knowledge, a bayesian filter based on spambayes is currently being integrated into the mozilla mailer. It is in beta right now, and I think is scheduled to be included in the next release. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 14:04 Message: Logged In: YES user_id=552329 This is pretty old now, and it could probably be closed. 
You can add such a filter in Mozilla - like this: In the Mail window, select: Tools -> Message Filters -> New When you get to the new filter pane, name the filter X- Hammie-Disposition Filter Criteria -> Customize -> New Message Header: Add the following: In the box type X-Hammie-Disposition click Add, then OK. Select the following as the Filter Criteria: X-Hammie-Disposition contains Yes Under Perform this action, select a destination folder ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-02 23:33 Message: Logged In: YES user_id=85414 I'm no expert on how Mozilla filters work... can you add a filter that says "If a message contains an X-Hammie-Disposition header whose value starts with Yes then "? If so, you can use either hammie.py (as part of your unix mail delivery system) or pop3proxy.py (on either a server machine or your own client machine). Both of these add an X-Hammie-Disposition header, with which you can filter your messages. ---------------------------------------------------------------------- Comment By: Sinchi Pacharuraq (sinchi) Date: 2002-10-02 19:53 Message: Logged In: YES user_id=621182 I just want to have this anti-spam filter built in Mozilla message filters. For example, user might activate this filter to delete spam messages from inbox or to move it to special folder. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-02 02:04 Message: Logged In: YES user_id=44345 ummm.... a bit short on detail/description. What precisely do you mean by "Mozilla Mail integration"? Can you describe what you would like to see feature-wise? Note that no other mail system integration has been attempted at this point with the exception that I believe the hammie script works with procmail. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 From piersh at friskit.com Sun Feb 9 01:55:40 2003 From: piersh at friskit.com (Piers Haken) Date: Sun Feb 9 04:38:09 2003 Subject: [Spambayes] Msg class broken? Message-ID: <9891913C5BFE87429D71E37F08210CB9297571@zeus.sfhq.friskit.com> Great!!! Does that mean we're going to have a dialog? +- | Are you a moron? [x] Yes [ ] No | Click next to continue... +- Piers. > -----Original Message----- > From: Tim Stone - Four Stones Expressions > [mailto:tim@fourstonesExpressions.com] > Sent: Saturday, February 08, 2003 3:33 PM > To: Spambayes > Subject: Re: RE: [Spambayes] Msg class broken? > > > Another item on my list of prereqs to be checked by the > installer... -TimS > > 2/8/2003 1:08:09 PM, Tim Peters wrote: > > >[T. Alexander Popiel] > >> ... > >> Argh. I had two different versions of the email package > in my search > >> path. It was grabbing the wrong one. Fixed. I feel dumb now. > > > >I can cure that: Alex, you're a moron. See? If someone > else says it, > >you stop feeling stupid and get some healthy anger going > instead. Just > >so long as you don't go on to believe I meant it, everyone > wins . 
> > > >some-days-i-wake-up-in-the-wrong-life-ly y'rs - tim > > > > > >_______________________________________________ > >Spambayes mailing list > >Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > > > > > > > c'est > moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org > > > > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes > From noreply at sourceforge.net Sun Feb 9 06:34:19 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Feb 9 10:48:50 2003 Subject: [Spambayes] [ spambayes-Feature Requests-616944 ] Mozilla Mail integration Message-ID: Feature Requests item #616944, was opened at 2002-10-01 04:31 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 Category: None Group: None Status: Closed Priority: 5 Submitted By: Sinchi Pacharuraq (sinchi) Assigned to: Nobody/Anonymous (nobody) Summary: Mozilla Mail integration Initial Comment: Integration with Mozilla Mail client ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-09 08:34 Message: Logged In: YES user_id=645698 According to the mozilla dudes, they grabbed our code, made a few tweaks (which are probably not well researched imo) and incorporated it. I don't know if they kept it in python, or ported it... ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-09 02:07 Message: Logged In: YES user_id=14198 Is Mozilla doing a "spambayes", or their own? If not ours, it would at least be interesting to extract their engine into our test suite, and measure the error rates. I'd like to help on this, as it would breathe some life back into pyxpcom. 
---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-08 20:36 Message: Logged In: YES user_id=645698 Mozilla team is doing this. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-06 23:29 Message: Logged In: YES user_id=645698 To the best of my knowledge, a bayesian filter based on spambayes is currently being integrated into the mozilla mailer. It is in beta right now, and I think is scheduled to be included in the next release. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 21:04 Message: Logged In: YES user_id=552329 This is pretty old now, and it could probably be closed. You can add such a filter in Mozilla - like this: In the Mail window, select: Tools -> Message Filters -> New When you get to the new filter pane, name the filter X- Hammie-Disposition Filter Criteria -> Customize -> New Message Header: Add the following: In the box type X-Hammie-Disposition click Add, then OK. Select the following as the Filter Criteria: X-Hammie-Disposition contains Yes Under Perform this action, select a destination folder ---------------------------------------------------------------------- Comment By: Richie Hindle (richiehindle) Date: 2002-10-02 08:33 Message: Logged In: YES user_id=85414 I'm no expert on how Mozilla filters work... can you add a filter that says "If a message contains an X-Hammie-Disposition header whose value starts with Yes then "? If so, you can use either hammie.py (as part of your unix mail delivery system) or pop3proxy.py (on either a server machine or your own client machine). Both of these add an X-Hammie-Disposition header, with which you can filter your messages. 
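
The rule described above ("if X-Hammie-Disposition starts with Yes, move the message") can be expressed in a few lines for clients or scripts that cannot filter on arbitrary headers. This is a rough sketch only: the function name and the folder names are assumptions for illustration, and anything that follows "Yes" in the header value is ignored.

```python
from email import message_from_string


def destination_folder(raw_message, spam_folder="Spam", default_folder="Inbox"):
    """Pick a folder based on the X-Hammie-Disposition header.

    The folder names are hypothetical defaults, not taken from hammie.py
    or pop3proxy.py.
    """
    msg = message_from_string(raw_message)
    disposition = msg.get("X-Hammie-Disposition", "")
    # Same test as the mail-client rule: the value begins with "Yes".
    if disposition.strip().lower().startswith("yes"):
        return spam_folder
    return default_folder
```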
---------------------------------------------------------------------- Comment By: Sinchi Pacharuraq (sinchi) Date: 2002-10-02 04:53 Message: Logged In: YES user_id=621182 I just want to have this anti-spam filter built in Mozilla message filters. For example, user might activate this filter to delete spam messages from inbox or to move it to special folder. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-10-01 11:04 Message: Logged In: YES user_id=44345 ummm.... a bit short on detail/description. What precisely do you mean by "Mozilla Mail integration"? Can you describe what you would like to see feature-wise? Note that no other mail system integration has been attempted at this point with the exception that I believe the hammie script works with procmail. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=616944&group_id=61702 From popiel at wolfskeep.com Sun Feb 9 11:57:51 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Sun Feb 9 14:57:55 2003 Subject: [Spambayes] Msg class broken? In-Reply-To: Message from "Mark Hammond" <006c01c2d011$acf230d0$530f8490@eden> References: <006c01c2d011$acf230d0$530f8490@eden> Message-ID: <20030209195751.CEBE22DED0@cashew.wolfskeep.com> In message: <006c01c2d011$acf230d0$530f8490@eden> "Mark Hammond" writes: >[Alex] >> On the other hand, I'm now having a small bit of trouble with >> my graphs; >> spambayes is too darn good. On my real data, in less than two days >> training, the error rate is in the noise floor. I haven't tried all >> the different training methods discussed yet, but it's kinda hard to >> differentiate when the filter just doesn't screw up. > >Can you post this anyway? I'm particularly interested in this, as I feel it >is the roughest edge left on the outlook client - that initial training >period. 

Certainly, once I'm done with tightening up the presentation a bit. I'm also going to try it with significantly smaller and more balanced amounts of data; I suspect most people don't have a 2:1 spam:ham ratio.

- Alex

From jwilliam at xmission.com Sun Feb 9 20:53:44 2003
From: jwilliam at xmission.com (Jerry Williams)
Date: Sun Feb 9 22:55:08 2003
Subject: [Spambayes] bug? Outlook2000
Message-ID:

I loaded Python 2.2.2 and win32all-150.exe and spambayes-1.0a2.tar.gz. I tried to just install the Outlook2000 directory and it didn't work, so I tried the whole thing and it worked.

The problem is that I don't have the little icons next to the buttons. I tried to rename the images directory and look at the log, and it complains that it can't find them. So I changed it back, but they still don't show up. I am running Windows ME.

It would be nice to have a small description on how to uninstall the Outlook2000 plugin. I was thinking that maybe I need to uninstall it and install it again and that might fix it. I also installed it on my Windows 2000 machine and it seems to work fine. That is how I know that I don't have the icons on my WinME machine.

Thanks for any help!

From T.A.Meyer at massey.ac.nz Mon Feb 10 17:10:40 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Feb 9 23:12:39 2003
Subject: [Spambayes] bug? Outlook2000
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D49B@its-xchg4.massey.ac.nz>

> It would be nice to have a small description on how
> to uninstall the Outlook2000 plugin.

Run addin.py with the parameter "--unregister". This should probably be added to the readme.

> I was thinking that maybe I need to
> uninstall it and install it again and that might fix it. I
> also installed
> it on my Windows 2000 machine and it seems to work fine.

If you open up the trace collector, does it list any errors when you start up? If it does, could you post them?

=Tony Meyer From T.A.Meyer at massey.ac.nz Mon Feb 10 17:15:18 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Feb 9 23:16:48 2003 Subject: [Spambayes] Multiple ini files Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D49C@its-xchg4.massey.ac.nz> Does anyone use multiple ini files? (I only use the Outlook plugin, so I don't know how the other systems work). I ask because of the bug (SF#683744) that occurs when an ini path has a space. The solution is nice and simple if no-one uses multiple ini files... :) =Tony Meyer From anthony at interlink.com.au Mon Feb 10 15:50:09 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Feb 9 23:52:02 2003 Subject: [Spambayes] Multiple ini files In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D49C@its-xchg4.massey.ac.nz> Message-ID: <200302100450.h1A4oAk08185@bonanza.off.ekorp.com> >>> "Meyer, Tony" wrote > Does anyone use multiple ini files? (I only use the Outlook plugin, so I > don't know how the other systems work). All the time when doing testing. I have one ini file that specifies the spam/ham locations, and a second that specifies test parameters. So when I want to run two tests against, say, my 'info' spam/ham corpus, I use BAYESCUSTOMIZE="info.ini test1.ini" python timcv.py .... BAYESCUSTOMIZE="info.ini test2.ini" python timcv.py .... I then want to run with a different corpus, so I use BAYESCUSTOMIZE="personal.ini test1.ini" python timcv.py .... BAYESCUSTOMIZE="personal.ini test2.ini" python timcv.py .... > I ask because of the bug (SF#683744) that occurs when an ini path has a > space. The solution is nice and simple if no-one uses multiple ini > files... :) I guess we could do something like check if the (split up) ini file(s) exist, if not, see if the file (with spaces) exists... 
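
The fallback suggested above (try the space-separated pieces first, then try the whole string as one path) could look something like this. It is a sketch of the idea only and not the change that ended up in Options.py; the function name resolve_ini_files and the injectable exists parameter are assumptions added so the logic can be tried without touching the file system.

```python
import os


def resolve_ini_files(value, exists=os.path.isfile):
    """Turn a BAYESCUSTOMIZE-style string into a list of ini file paths.

    Hypothetical helper: 'exists' defaults to os.path.isfile but can be
    replaced for testing.
    """
    value = value.strip()
    if not value:
        return []
    parts = value.split()
    # Normal case: every space-separated piece is a real file.
    if all(exists(part) for part in parts):
        return parts
    # Otherwise the spaces may belong to a single path.
    if exists(value):
        return [value]
    # Neither reading matches; keep the old split behaviour.
    return parts
```

With this approach "info.ini test1.ini" still yields two files, while a single path containing spaces is kept whole as long as that file exists. A path with spaces mixed in with other ini files would still be split incorrectly.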
From T.A.Meyer at massey.ac.nz Mon Feb 10 17:57:03 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Feb 9 23:59:00 2003 Subject: [Spambayes] Multiple ini files Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4A0@its-xchg4.massey.ac.nz> > All the time when doing testing. [...] Thought that might be the case. > > I ask because of the bug (SF#683744) that occurs when an > > ini path has a space. > I guess we could do something like check if the (split up) > ini file(s) exist, > if not, see if the file (with spaces) exists... Or maybe move the responsibility to whomever sets the envar and make them tokenise the spaces somehow, and then use a version of split that recognises the tokens? For example: "C:\Documents and Settings" becomes in the envar "C:\Documents/ and/ Settings" and clever_split doesn't split if the space is preceded by '/'. A *nix path wouldn't have / and then a space, would it? =Tony Meyer From jwilliam at xmission.com Sun Feb 9 22:45:45 2003 From: jwilliam at xmission.com (Jerry Williams) Date: Mon Feb 10 00:47:09 2003 Subject: [Spambayes] bug? Outlook2000 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D49B@its-xchg4.massey.ac.nz> Message-ID: Well I tried the unregister and reran addin.py and still no icon. 
And no errors: Outlook Spam Addin module loading Registered: SpamBayes.OutlookAddin Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Loaded bayes database from 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_bayes_database.pck' Loaded message database from 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_message_database.pck' Bayes database initialized with 24 spam and 162 good messages AntiSpam: Watching for new messages in folder Inbox AntiSpam: Watching for new messages in folder Spam Processing 0 missed spam in folder 'Inbox' took 10.5516ms -----Original Message----- From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] Sent: Sunday, February 09, 2003 9:11 PM To: Jerry Williams; spambayes@python.org Subject: RE: [Spambayes] bug? Outlook2000 > It would be nice to have a small description on how > to uninstall the Outlook2000 plugin. Run addin.py with the parameter "--unregister". This should probably be added to the readme. > I was thinking that maybe I need to > uninstall it and install it again and that might fix it. I > also installed > it on my Windows 2000 machine and it seems to work fine. If you open up the trace collector, does it list any errors when you start up? If it does, could you post them? =Tony Meyer From mhammond at skippinet.com.au Mon Feb 10 21:53:29 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 10 05:53:58 2003 Subject: [Spambayes] bug? Outlook2000 In-Reply-To: Message-ID: <005101c2d0f2$a5ffc120$530f8490@eden> Maybe it is a problem with the bitmaps. I think they are currently 24-bit bitmaps. Try opening them with paintbrush, and saving with fewer colours. Mark. > Well I tried the unregister and reran addin.py and still no icon. 
> And no errors: > Outlook Spam Addin module loading > Registered: SpamBayes.OutlookAddin > Outlook Spam Addin module loading > SpamAddin - Connecting to Outlook > Loaded bayes database from > 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_bayes_da > tabase.pck' > Loaded message database from > 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_message_ > database.pck' > Bayes database initialized with 24 spam and 162 good messages > AntiSpam: Watching for new messages in folder Inbox > AntiSpam: Watching for new messages in folder Spam > Processing 0 missed spam in folder 'Inbox' took 10.5516ms > > > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Sunday, February 09, 2003 9:11 PM > To: Jerry Williams; spambayes@python.org > Subject: RE: [Spambayes] bug? Outlook2000 > > > > It would be nice to have a small description on how > > to uninstall the Outlook2000 plugin. > Run addin.py with the parameter "--unregister". This should > probably be > added to the readme. > > > I was thinking that maybe I need to > > uninstall it and install it again and that might fix it. I > > also installed > > it on my Windows 2000 machine and it seems to work fine. > If you open up the trace collector, does it list any errors > when you start > up? If it does, could you post them? > > =Tony Meyer > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From jwilliam at xmission.com Mon Feb 10 10:29:15 2003 From: jwilliam at xmission.com (Jerry Williams) Date: Mon Feb 10 12:43:40 2003 Subject: [Spambayes] bug? Outlook2000 In-Reply-To: <005101c2d0f2$a5ffc120$530f8490@eden> Message-ID: I changed my desktop colors from 16bit to 32bit and I get the right half of the icon. I have attached a small screen shot of the icon. 
-----Original Message----- From: Mark Hammond [mailto:mhammond@skippinet.com.au] Sent: Monday, February 10, 2003 3:53 AM To: 'Jerry Williams'; 'Meyer, Tony'; spambayes@python.org Subject: RE: [Spambayes] bug? Outlook2000 Maybe it is a problem with the bitmaps. I think they are currently 24-bit bitmaps. Try opening them with paintbrush, and saving with fewer colours. Mark. > Well I tried the unregister and reran addin.py and still no icon. > And no errors: > Outlook Spam Addin module loading > Registered: SpamBayes.OutlookAddin > Outlook Spam Addin module loading > SpamAddin - Connecting to Outlook > Loaded bayes database from > 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_bayes_da > tabase.pck' > Loaded message database from > 'C:\Python22\spam\spambayes-1.0a2\Outlook2000\default_message_ > database.pck' > Bayes database initialized with 24 spam and 162 good messages > AntiSpam: Watching for new messages in folder Inbox > AntiSpam: Watching for new messages in folder Spam > Processing 0 missed spam in folder 'Inbox' took 10.5516ms > > > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Sunday, February 09, 2003 9:11 PM > To: Jerry Williams; spambayes@python.org > Subject: RE: [Spambayes] bug? Outlook2000 > > > > It would be nice to have a small description on how > > to uninstall the Outlook2000 plugin. > Run addin.py with the parameter "--unregister". This should > probably be > added to the readme. > > > I was thinking that maybe I need to > > uninstall it and install it again and that might fix it. I > > also installed > > it on my Windows 2000 machine and it seems to work fine. > If you open up the trace collector, does it list any errors > when you start > up? If it does, could you post them? 
> > =Tony Meyer > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes -------------- next part -------------- A non-text attachment was scrubbed... Name: button.bmp Type: image/bmp Size: 31734 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030210/37c6d29c/button-0001.bin From gmino at pcsltd.com Mon Feb 10 13:38:32 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Mon Feb 10 13:32:48 2003 Subject: [Spambayes] Outlook 2002 support? Message-ID: <3261E796E368954CB22963F2B63E81051EE89B@xmail.pcsltd.com> Anyone know if this is a possibility presently or a not too distant pipe dream? Thanks in advance!!! Gabriel From popiel at wolfskeep.com Mon Feb 10 11:32:37 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Feb 10 14:32:42 2003 Subject: [Spambayes] Multiple ini files In-Reply-To: Message from "Meyer, Tony" <1ED4ECF91CDED24C8D012BCF2B034F1318D4A0@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318D4A0@its-xchg4.massey.ac.nz> Message-ID: <20030210193237.395C32DE8B@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1318D4A0@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: > >A *nix path wouldn't have / and then a space, would it? Typically not, but there's nothing to actually prevent it. The legal characters for a file name include ' ', along with many characters that otherwise foul up common shell scripts ('$', CR, LF, etc.). It's only convention that keeps such things from being used... most of the time. - Alex From tim at fourstonesExpressions.com Mon Feb 10 13:35:50 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 10 14:35:59 2003 Subject: [Spambayes] Multiple ini files In-Reply-To: <20030210193237.395C32DE8B@cashew.wolfskeep.com> Message-ID: I checked in a fix to Options.py that should correct this problem. 
spaces in file names are now tolerated, and the filenames are separated by spaces. How did I do that, you ask? The magic of regex... - TimS 2/10/2003 1:32:37 PM, "T. Alexander Popiel" wrote: >In message: <1ED4ECF91CDED24C8D012BCF2B034F1318D4A0@its-xchg4.massey.ac.nz> > "Meyer, Tony" writes: >> >>A *nix path wouldn't have / and then a space, would it? > >Typically not, but there's nothing to actually prevent it. >The legal characters for a file name include ' ', along with >many characters that otherwise foul up common shell scripts >('$', CR, LF, etc.). It's only convention that keeps such >things from being used... most of the time. > >- Alex > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From gmino at pcsltd.com Mon Feb 10 16:38:36 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Mon Feb 10 16:31:31 2003 Subject: [Spambayes] Outlook 2002 support? Message-ID: <3261E796E368954CB22963F2B63E81051EE89E@xmail.pcsltd.com> I have installed Python-2.2.2.exe, email-2.4.3 and win32all-152.exe and ran setup.py install for spambayes-2003-01-17 I am receiving this upon execution of addin.py : Outlook Spam Addin module loading This Addin requires that Outlook 2000 be installed on this machine. This appears to not be installed due to the following error: COM Error 0x8002801d (Library not registered.) Sorry I can't be more help, but I can't continue while I have this error. Any thoughts? I'm not even sure if Python, the extentions or anything for that matter has been installed correctly. Any ideas on what a newbie to Python like me may be missing? I am running Office 2002/XP on Win2K Prof. 
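A note on the error code quoted above: 0x8002801d is the HRESULT that Windows names TYPE_E_LIBNOTREGISTERED, and Python's COM tracebacks show the same value as a signed 32-bit integer (-2147319779), so the two reports describe one failure. A minimal sketch of the conversion, using only the standard library; the helper names are made up for illustration and are not part of win32all:

```python
def hresult_to_unsigned(code):
    """Map a signed 32-bit HRESULT (as seen in a com_error tuple) to its unsigned form."""
    return code & 0xFFFFFFFF


def hresult_to_signed(code):
    """Map an unsigned HRESULT (as seen in hex in an error message) to its signed form."""
    code &= 0xFFFFFFFF
    # Values with the top bit set are failures and wrap around to negative numbers.
    return code - 0x100000000 if code & 0x80000000 else code


print(hex(hresult_to_unsigned(-2147319779)))  # 0x8002801d
print(hresult_to_signed(0x8002801D))          # -2147319779
```

This only helps with matching up error reports; it says nothing about why the type library fails to load.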
TIA Gabriel -----Original Message----- From: Mark Hammond [mailto:mhammond@skippinet.com.au ] Sent: Monday, February 10, 2003 4:10 PM To: Gabriel Mino; spambayes@python.org Subject: RE: [Spambayes] Outlook 2002 support? > Anyone know if this is a possibility presently or a not too distant > pipe dream? Thanks in advance!!! I expect it to work now with Outlook 2002. And as soon as I find that damn missing CD, I will install it on my XP box to test. Have you tried it? Mark. From mhammond at skippinet.com.au Tue Feb 11 08:10:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 10 16:35:42 2003 Subject: [Spambayes] Outlook 2002 support? In-Reply-To: <3261E796E368954CB22963F2B63E81051EE89B@xmail.pcsltd.com> Message-ID: <000001c2d148$cb47b540$530f8490@eden> > Anyone know if this is a possibility presently or a not too > distant pipe > dream? Thanks in advance!!! I expect it to work now with Outlook 2002. And as soon as I find that damn missing CD, I will install it on my XP box to test. Have you tried it? Mark. From gmino at pcsltd.com Mon Feb 10 16:59:49 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Mon Feb 10 16:52:38 2003 Subject: [Spambayes] Re: trouble registering outlook addin? Message-ID: <3261E796E368954CB22963F2B63E81051EE89F@xmail.pcsltd.com> Well some of you guys asked for an average "Joe".....here I am!!! Traceback (most recent call last): File "D:\Python22\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "D:\Python22\Lib\site-packages\win32com\demos\outlookAddin.py", line 41, in ? 
universal.RegisterInterfaces('{AC0714F2-3D04-11D1-AE7D-00A0C90F26F4}', 0, 1, 0, ["_IDTExtensibility2"]) File "D:\Python22\lib\site-packages\win32com\universal.py", line 21, in RegisterInterfaces tlb = pythoncom.LoadRegTypeLib(typelibGUID, major, minor, lcid) com_error: (-2147319779, 'Library not registered.', None, None) Gabriel From Paul.Moore at atosorigin.com Tue Feb 11 09:14:20 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Tue Feb 11 04:17:58 2003 Subject: [Spambayes] bug? Outlook2000 Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D8FB@UKDCX001.uk.int.atosorigin.com> From: Jerry Williams [mailto:jwilliam@xmission.com] > I changed my desktop colors from 16bit to 32bit and I get > the right half of the icon. I have attached a small screen > shot of the icon. From the picture, it looks like you have "Large Fonts" set. Could that be the cause? What happens if you switch to normal fonts? Paul. From gary at inauspicious.org Tue Feb 11 12:03:05 2003 From: gary at inauspicious.org (Gary Benson) Date: Tue Feb 11 07:03:20 2003 Subject: [Spambayes] hammiefilter just started failing Message-ID: <20030211120304.GC28714@inauspicious.org> I've been using Spambayes for about a fortnight with great success, but last night it started to fail. Messages are being delivered (exim has a safety net against pipes failing) but they are also being bounced with the following traceback: | Traceback (most recent call last): | File "/usr/bin/hammiefilter", line 134, in ?
| main() | File "/usr/bin/hammiefilter", line 131, in main | action() | File "/usr/bin/hammiefilter", line 87, in filter | print h.filter(msg) | File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 98, in filter | prob, clues = self._scoremsg(msg, True) | File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in _scoremsg | return self.bayes.spamprob(tokenize(msg), evidence) | File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 217, in chi2_spamprob | clues = self._getclues(wordstream) | File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 441, in _getclues | prob = self.probability(record) | File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 304, in probability | assert spamcount <= nspam | AssertionError This is happening with all messages: a quick check shows that spamcount is slightly higher than nspam (like 104 and 103) so I just replaced the assertion with 'if spamcount > nspam: spamcount = nspam' as a temporary workaround. Has anyone heard of this happening before? I'd like to know if this is a known problem before I start trying to debug it... I have a copy of my .hammiedb (taken before I did the above tweak) if you want it. Cheers, Gary [ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] From tim at fourstonesExpressions.com Tue Feb 11 07:43:12 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 11 08:43:19 2003 Subject: [Spambayes] hammiefilter just started failing In-Reply-To: <20030211120304.GC28714@inauspicious.org> Message-ID: The 'correct' solution to this problem is a retrain. But that's just not always gonna be practical. I've had problems with these assertions, but it seems that they should always be true... The statistics dudes are gonna have to weigh in on this one, because these parameters are used in the combining scheme, and fooling with them has consequences. Tim? Gary? Rob? 
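For concreteness, the temporary workaround Gary describes swaps the failing assertion for a clamp. A minimal sketch follows, written for a current Python; the function name, its arguments, and the smoothing constants s and x are assumptions made for illustration, not the actual classifier.py code. As said above, clamping hides the inconsistency in the database rather than repairing it, so a retrain is still the real fix:

```python
def word_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Hypothetical per-word spam probability that tolerates inconsistent counts.

    spamcount/hamcount: number of trained spam/ham messages containing the word.
    nspam/nham: total number of trained spam/ham messages.
    s, x: assumed strength and prior for a Robinson-style smoothing step.
    """
    # Clamp instead of asserting spamcount <= nspam (and likewise for ham).
    spamcount = min(spamcount, nspam)
    hamcount = min(hamcount, nham)

    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    total = spamratio + hamratio
    if total == 0:
        # Nothing is known about this word: fall back to the prior.
        return x

    prob = spamratio / total
    n = spamcount + hamcount
    # Pull the raw estimate toward the prior when the word is rarely seen.
    return (s * x + n * prob) / (s + n)


# The case from the report: a word counted in 104 spams with only 103 spams trained.
print(word_spamprob(104, 0, 103, 162))
```

With the clamp in place the 104-versus-103 case scores exactly as if the word had appeared in all 103 trained spams.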
- TimS 2/11/2003 6:03:05 AM, Gary Benson wrote: >I've been using Spambayes for about a fortnight with great success, >but last night it started to fail. Messages are being delivered (exim >has a safety net against pipes failing) but they are also being >bounced with the following traceback: > >| Traceback (most recent call last): >| File "/usr/bin/hammiefilter", line 134, in ? >| main() >| File "/usr/bin/hammiefilter", line 131, in main >| action() >| File "/usr/bin/hammiefilter", line 87, in filter >| print h.filter(msg) >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 98, in filter >| prob, clues = self._scoremsg(msg, True) >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in _scoremsg >| return self.bayes.spamprob(tokenize(msg), evidence) >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 217, in chi2_spamprob >| clues = self._getclues(wordstream) >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 441, in _getclues >| prob = self.probability(record) >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 304, in probability >| assert spamcount <= nspam >| AssertionError > >This is happening with all messages: a quick check shows that >spamcount is slightly higher than nspam (like 104 and 103) so I just >replaced the assertion with 'if spamcount > nspam: spamcount = nspam' >as a temporary workaround. > >Has anyone heard of this happening before? I'd like to know if this >is a known problem before I start trying to debug it... > >I have a copy of my .hammiedb (taken before I did the above tweak) if >you want it. 
> >Cheers, >Gary > >[ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From Barnaby.Dalton at radioscape.com Tue Feb 11 09:43:31 2003 From: Barnaby.Dalton at radioscape.com (Dalton, Barnaby) Date: Tue Feb 11 10:03:45 2003 Subject: [Spambayes] Awesome Message-ID: <3190BC9FA8F6D3119508009027E5B33E013E355E@MORSE> I downloaded and installed spambayes yesterday. What a pleasure it was to open my email this morning to discover the only messages in my inbox were ones I actually wanted to read. Thankyou. I'm using the Outlook2000 addin. You might want to add something to the documentation that ActivePython 2.2.2 doesn't have a recent enough version of win32all (build 146). I had never heard of win32all as I've always used what comes with active python. When you go the to win32all web page it says that active python users don't need win32all. However, I couldn't get the addin to work until I downloaded and installed the latest win32all (150) on top of active python 2.2.2. Barney ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify postmaster@radioscape.com. This footnote also confirms that this email message has been scanned for the presence of computer viruses known at the time of sending. 
www.radioscape.com ********************************************************************** From gmino at pcsltd.com Fri Feb 7 11:03:11 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Tue Feb 11 10:36:52 2003 Subject: [Spambayes] Awesome In-Reply-To: <3261E796E368954CB22963F2B63E81052027F1@xmail.pcsltd.com> Message-ID: <3261E796E368954CB22963F2B63E8105137655@xmail.pcsltd.com> Hey would you mind helping me? I am a total newbie to python but, not completely lost. I am trying for testing purposes to set things up using the Outlook plugin although long term would probably be looking to use the procmail or proxy configuration. I have installed via executables: Python-2.2.2.exe, win32all-152.exe, bsddb3-4.1.3.win32-py2.2.exe. Also, I have installed(?) via python: email-2.4.3 - setup.py install, spambayes-2003-01-17 - setup.py install, and attempted addin.py after modifying a few lines to use office/outlook 10 parameters instead of office/outlook 9 I am trying this on win2k (yuck I know) + office XP/2002 TIA Gabriel -----Original Message----- From: spambayes-bounces@python.org [mailto:spambayes-bounces@python.org] On Behalf Of Dalton, Barnaby Sent: Tuesday, February 11, 2003 4:44 AM To: 'SpamBayes@python.org' Subject: [Spambayes] Awesome I downloaded and installed spambayes yesterday. What a pleasure it was to open my email this morning to discover the only messages in my inbox were ones I actually wanted to read. Thankyou. I'm using the Outlook2000 addin. You might want to add something to the documentation that ActivePython 2.2.2 doesn't have a recent enough version of win32all (build 146). I had never heard of win32all as I've always used what comes with active python. When you go the to win32all web page it says that active python users don't need win32all. However, I couldn't get the addin to work until I downloaded and installed the latest win32all (150) on top of active python 2.2.2. 
Barney ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify postmaster@radioscape.com. This footnote also confirms that this email message has been scanned for the presence of computer viruses known at the time of sending. www.radioscape.com ********************************************************************** _______________________________________________ Spambayes mailing list Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes From skip at pobox.com Tue Feb 11 09:45:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 11 10:45:22 2003 Subject: [Spambayes] Awesome In-Reply-To: <3261E796E368954CB22963F2B63E8105137655@xmail.pcsltd.com> References: <3261E796E368954CB22963F2B63E81052027F1@xmail.pcsltd.com> <3261E796E368954CB22963F2B63E8105137655@xmail.pcsltd.com> Message-ID: <15945.6781.119562.917081@montanaro.dyndns.org> Gabriel> I am a total newbie to python but, not completely lost. I am Gabriel> trying for testing purposes to set things up using the Outlook Gabriel> plugin although long term would probably be looking to use the Gabriel> procmail or proxy configuration. Gabriel> I have installed via executables: Python-2.2.2.exe, Gabriel> win32all-152.exe, bsddb3-4.1.3.win32-py2.2.exe. Gabriel> Also, I have installed(?) via python: email-2.4.3 - setup.py Gabriel> install, spambayes-2003-01-17 - setup.py install, and attempted Gabriel> addin.py after modifying a few lines to use office/outlook 10 Gabriel> parameters instead of office/outlook 9 Gabriel> I am trying this on win2k (yuck I know) + office XP/2002 Gabriel, You never identified a problem. You just told us what you did. What do you need help with? What's not working for you? 
-- Skip Montanaro skip@pobox.com http://www.musi-cal.com/ From gmino at pcsltd.com Tue Feb 11 10:56:55 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Tue Feb 11 10:49:42 2003 Subject: [Spambayes] Awesome Message-ID: <3261E796E368954CB22963F2B63E81051EE8A0@xmail.pcsltd.com> Geesh...sorry gang.....been a lil while since I've been on a mailing list.... To follow my last mail: I am receiving: Outlook Spam Addin module loading This Addin requires that Outlook 2000 be installed on this machine. This appears to not be installed due to the following error: COM Error 0x8002801d (Library not registered.) Sorry I can't be more help, but I can't continue while I have this error. While trying to install the outlook addin TIA Once again...my apologies Gabriel From Paul.Moore at atosorigin.com Tue Feb 11 16:02:13 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Tue Feb 11 11:02:48 2003 Subject: [Spambayes] Awesome Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D8FD@UKDCX001.uk.int.atosorigin.com> From: Gabriel Mino [mailto:gmino@pcsltd.com] > Outlook Spam Addin module loading > > This Addin requires that Outlook 2000 be installed on this machine. I think you said you were using Outlook 2002/XP. If so, Mark very recently (yesterday? today?) checked something into CVS about this. If you're working from the 1.0a1 prerelease off the website, I guess you'll have to wait for the new release (due soon, I believe). If you're comfortable with CVS, you can get the current CVS version and that should work. Paul. PS I use Outlook 2000, so I don't have this problem. I may be wrong about any of the above :-) From piersh at friskit.com Tue Feb 11 09:01:02 2003 From: piersh at friskit.com (Piers Haken) Date: Tue Feb 11 11:43:19 2003 Subject: [Spambayes] Awesome Message-ID: <9891913C5BFE87429D71E37F08210CB92C74E5@zeus.sfhq.friskit.com> Have you tried reinstalling outlook? It might be worth a try... 
> -----Original Message----- > From: Gabriel Mino [mailto:gmino@pcsltd.com] > Sent: Tuesday, February 11, 2003 7:57 AM > To: 'SpamBayes@python.org' > Subject: RE: [Spambayes] Awesome > > > Geesh...sorry gang.....been a lil while since I've been on a > mailing list.... > > > > To follow my last mail: > > > > I am receiving: > > > > Outlook Spam Addin module loading > > This Addin requires that Outlook 2000 be installed on this machine. > > > > This appears to not be installed due to the following error: > > COM Error 0x8002801d (Library not registered.) > > Sorry I can't be more help, but I can't continue while I have > this error. > > > > While trying to install the outlook addin > > > > TIA > > > > Once again...my apologies > > > > Gabriel > > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes > From tim.one at comcast.net Tue Feb 11 11:59:36 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Feb 11 12:00:42 2003 Subject: [Spambayes] Awesome In-Reply-To: <9891913C5BFE87429D71E37F08210CB92C74E5@zeus.sfhq.friskit.com> Message-ID: [Piers Haken] > Have you tried reinstalling outlook? It might be worth a try... [Gabriel Mino] > ... > am receiving: > > Outlook Spam Addin module loading > > This Addin requires that Outlook 2000 be installed on this machine. If Gabriel is using Outlook 2002, reinstalling it *probably* won't downgrade it to Outlook 2000 . Nobody working on the Outlook addin had Outlook 2002, so the 2002 flavor just isn't supported by the addin so far. From jon at bergenstreetsoftware.com Tue Feb 11 12:06:56 2003 From: jon at bergenstreetsoftware.com (Jonathan Baumgartner) Date: Tue Feb 11 12:07:34 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? Message-ID: [I sent this earlier, but got no response, so I'm trying again.] So I've got pop3proxy working with Eudora beautifully. Thanks again everyone for helping me out with that. 
The problem I'm seeing now relates to the review process. When I go to train messages, sometimes a message will show up with "(none)" for a subject as well as for a recipient. Is this a bug, or a limitation or something else? So far I've been assuming those are spam and training them as spam. Hopefully that's correct. thanks, jon From robibaro at robibaro.com Tue Feb 11 14:46:50 2003 From: robibaro at robibaro.com (Eric Robibaro) Date: Tue Feb 11 14:47:38 2003 Subject: [Spambayes] Alpha2 Pre-release In-Reply-To: References: Message-ID: <41598437.1044974810@[10.0.18.7]> I'm having a few problems installing that version under python2.2 and 2.3 on a mostly stock debian unstable i386 platform here is the command and the error message python2.3 setup.py install error in spambayes setup command: invalid distribution option 'classifiers' I presume I should back up a version and try alpha 1? or do I need to try cvs ? or am I having a problem with the python install ? I've got practically nothing else running on python at the moment, I upgraded specifically because spambayes and mimelib et distutils were incorporated in 2.3, was that a bad choice? or do I just need to try a source python install Any insight would be appreciated --On January 30, 2003 20:33 +0000 Richie Hindle wrote: > > I've built an alpha2 source release of Spambayes. Before we put it up on > the main web site, I'd feel a lot better if someone could smoke-test it > for me - I may have made some horrible mistake that I'm too close to > see... > > I've put it here: > > http://entrian.com/spambayes/spambayes-1.0a2-pre.zip > http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz > > For POP3 proxy users, this release should be GUI out of the box - install > it, run pop3proxy.py, point your browser at the URL, go to the Config page > and enter your POP3 server details, change your email client to point at > the proxy, and you're away - messages are classfied and you can train > through the web. 
> > For hammie users there's Neale's new muttrc and spambayes.el, and Skip's > proxytee lets hammie users train through the web interface. Tim Stone's > import/export script should make upgrading easy, for now and in the > future. Assorted improvements to the tokeniser and classifier make > spambayes even more accurate. > > What else has changed? We should do a proper release announcement - I > don't keep up with the Outlook plug-in, so what's new there? Who've I > offended by forgetting about their fantastic new feature? 8-) > > One question for those who know about these things: I originally built the > release on Windows, but then realised that all the source files in both > the zip and tar.gz archives had Windows line-endings. People installing > and editing on unix would see '^M's all over the place (possibly, > depending on their editor). Is there a distutils option I've missed to > prevent this? > > Anyway, I rebuilt the archives on unix (thanks Neale!). > > -- > Richie Hindle > richie@entrian.com > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From piersh at friskit.com Tue Feb 11 13:07:58 2003 From: piersh at friskit.com (Piers Haken) Date: Tue Feb 11 15:50:19 2003 Subject: [Spambayes] Awesome Message-ID: <9891913C5BFE87429D71E37F08210CB92C74E6@zeus.sfhq.friskit.com> Oh sorry, I missed that part. However, Outlook XP works just fine. I'm typing this into it right now. Piers. > -----Original Message----- > From: Tim Peters [mailto:tim.one@comcast.net] > Sent: Tuesday, February 11, 2003 9:00 AM > To: Gabriel Mino; SpamBayes@python.org > Subject: RE: [Spambayes] Awesome > > > [Piers Haken] > > Have you tried reinstalling outlook? It might be worth a try... > > [Gabriel Mino] > > ... > > am receiving: > > > > Outlook Spam Addin module loading > > > > This Addin requires that Outlook 2000 be installed on this machine. 
> > If Gabriel is using Outlook 2002, reinstalling it *probably* > won't downgrade it to Outlook 2000 . Nobody working on > the Outlook addin had Outlook 2002, so the 2002 flavor just > isn't supported by the addin so far. > From mhammond at skippinet.com.au Wed Feb 12 07:57:05 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Feb 11 15:58:06 2003 Subject: [Spambayes] Awesome In-Reply-To: <9891913C5BFE87429D71E37F08210CB92C74E6@zeus.sfhq.friskit.com> Message-ID: <007b01c2d210$22e79ff0$530f8490@eden> > Oh sorry, I missed that part. > > However, Outlook XP works just fine. I'm typing this into it > right now. Yes, I expected that Outlook XP would work. The type libraries have the same version number (with only the minor version changing, which, from COM's POV, means it can be loaded OK) Gabriel's machine has some strange problem - the 2 type libraries *do* exist on the machine (we verified that) but for some reason, that machine will not load them. I am very happy to hear that we have at least one reported success on Outlook XP :) Mark. From marklists at mceahern.com Tue Feb 11 18:13:44 2003 From: marklists at mceahern.com (Mark McEahern) Date: Tue Feb 11 19:14:02 2003 Subject: [Spambayes] FW: Python 11 proposals deadline is February 15th! Message-ID: My apologies if this is redundant, it just seems like it'd be killer to have someone submit a proposal on spambayes...
Cheers, // m -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org]On Behalf Of Kevin Altis Sent: Tuesday, February 11, 2003 5:22 PM To: python-list@python.org Subject: Re: Python 11 proposals deadline is February 15th! Nathan Torkington at O'Reilly asked for proposals covering the following: "The two areas that I'd specifically like to get more proposals are core knowledge (e.g., data structures, standard library, OOP, processing internet email, testing, threading, GUI programming) and recent advances in the Python world (new modules, new core stuff, new community stuff)." ka --- Kevin Altis - Python 11 co-chair altis@semi-retired.com "Guido van Rossum" wrote in message news:mailman.1044985479.8311.python-list@python.org... > The Python 11 Conference is being held July 7-11 in Portland, Oregon > as part of OSCON 2003. > > http://conferences.oreillynet.com/os2003/ > > The deadline for proposals is February 15th! > > You only need to have your proposal in this week, you don't need to > worry about trying to put together the complete presentation or > tutorial materials at this time. > > Proposal submissions page: > http://conferences.oreillynet.com/cs/os2003/create/e_sess > > Few proposals have been submitted so far, we need many more to have a > successful Python 11 conference. If you have submitted a proposal for > one of the other Python conferences this year such as PyCon, I > encourage you to go ahead and submit the proposal to Python 11 as > well. If you are presenting at the Python UK Conference or EuroPython, > but are unable to attend Python 11, you should consider having another > team member do the presentation. > > The theme of OSCON 2003 is "Embracing and Extending Proprietary > Software". Papers and presentations on how to successfully transition > away from proprietary software would also be good, but it is not > necessary for your proposal to cover the theme, proposals just need to > be related to Python. 
> > > COMPENSATION: > > Free registration for speakers (except lightning talks). Tutorial > speakers also get a honorarium, and some of their hotel and travel > costs are covered. > > > O'REILLY ANNOUNCEMENT: > > 2003 O'Reilly Open Source Convention Call For Participation > Embracing and Extending Proprietary Software > http://conferences.oreilly.com/oscon/ > > > O'Reilly & Associates invites programmers, developers, strategists, > and technical staff to submit proposals to lead tutorial and conference > sessions at the 2003 Open Source Software Convention, slated for > July 7-11 in Portland, OR. > > Proposals are due February 15, 2003. For more information please > visit our OSCON website http://conferences.oreilly.com/oscon/ > > The theme this year is "Embracing and Extending Proprietary Software." > Few companies use only one vendor's software on desktops, back office, > and servers. Variety in operating systems and applications is becoming > the norm, for sound financial and technical reasons. With variety comes > the need for open unencumbered standards for data exchange and service > interoperability. You can address the theme from any angle you like--for > example, you might talk about migrating away from commercial software > such as Microsoft Windows, or instead place your emphasis on coexistence. > > Convention Conferences > Perl Conference 7 > The Python 11 Conference > PHP Conference 3 > > Convention Tracks > Apache > XML > Applications > MySQL and PostgreSQL > Ruby > > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- http://mail.python.org/mailman/listinfo/python-list - - From gmino at pcsltd.com Fri Feb 7 19:50:11 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Tue Feb 11 19:23:29 2003 Subject: [Spambayes] Awesome In-Reply-To: <3261E796E368954CB22963F2B63E81051C7675@xmail.pcsltd.com> Message-ID: <3261E796E368954CB22963F2B63E8105137657@xmail.pcsltd.com> So I guess this means I'm special huh? Great!!! 
Anyone (especially users using the Outlook plugin!!) have any clues that might help this Python newbie? TIA gabriel -----Original Message----- From: Mark Hammond [mailto:mhammond@skippinet.com.au] Sent: Tuesday, February 11, 2003 3:57 PM To: 'Piers Haken'; Gabriel Mino; SpamBayes@python.org Subject: RE: [Spambayes] Awesome > Oh sorry, I missed that part. > > However, Outlook XP works just fine. I'm typing this into it > right now. Yes, I expected that Outlook XP would work. The type libraries have the same version number (with only the minor version changing, which, from COMs POV, means it can be loaded OK) Gabriel's machine has some strange problem - the 2 type libraries *do* exist on the machine (we verified that) but for some reason, that machine will not load them. I am very happy to hear that we have at least one reported success on Outlook XP :) Mark. From piersh at friskit.com Tue Feb 11 16:56:30 2003 From: piersh at friskit.com (Piers Haken) Date: Tue Feb 11 19:38:49 2003 Subject: [Spambayes] Awesome Message-ID: <9891913C5BFE87429D71E37F08210CB92C74EC@zeus.sfhq.friskit.com> weird. which libraries were they? if they're type libraries, try opening them with oleview.exe: http://www.microsoft.com/com/resources/oleview.asp or if they're DLLs, try checking their dependancies with depends.exe: http://www.dependencywalker.com/ that might give some clues as to why they're not loading... it sounds to me that there's something definitely screwed with your office install. piers. -----Original Message----- From: Mark Hammond [mailto:mhammond@skippinet.com.au] Sent: Tuesday, February 11, 2003 12:57 PM To: Piers Haken; 'Gabriel Mino'; SpamBayes@python.org Subject: RE: [Spambayes] Awesome > Oh sorry, I missed that part. > > However, Outlook XP works just fine. I'm typing this into it > right now. Yes, I expected that Outlook XP would work. 
The type libraries have the same version number (with only the minor version changing, which, from COM's POV, means it can be loaded OK) Gabriel's machine has some strange problem - the 2 type libraries *do* exist on the machine (we verified that) but for some reason, that machine will not load them. I am very happy to hear that we have at least one reported success on Outlook XP :) Mark. From jwilliam at xmission.com Tue Feb 11 19:08:50 2003 From: jwilliam at xmission.com (Jerry Williams) Date: Tue Feb 11 21:08:54 2003 Subject: [Spambayes] bug? Outlook2000 In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D8FB@UKDCX001.uk.int.atosorigin.com> Message-ID: I did have large fonts, so I switched to small fonts and rebooted and it doesn't change. -----Original Message----- From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] Sent: Tuesday, February 11, 2003 2:14 AM To: Jerry Williams; Mark Hammond; Meyer, Tony; spambayes@python.org Subject: RE: [Spambayes] bug? Outlook2000 From: Jerry Williams [mailto:jwilliam@xmission.com] > I changed my desktop colors from 16bit to 32bit and I get > the right half of the icon. I have attached a small screen > shot of the icon. From the picture, it looks like you have "Large Fonts" set. Could that be the cause? What happens if you switch to normal fonts? Paul. From brianf at suckerpunch.com Tue Feb 11 13:18:21 2003 From: brianf at suckerpunch.com (Brian Fleming) Date: Tue Feb 11 22:22:24 2003 Subject: [Spambayes] Outlook 2002 + XP works for me Message-ID: <001401c2d213$1b1dbae0$1b10fea9@suckerpunch.com> I just installed it today and already it is working like a charm. THANKS a million to all of you who contributed to the effort -- it's remarkable! Brian From ejoy at peoplemail.com.cn Wed Feb 12 10:55:03 2003 From: ejoy at peoplemail.com.cn (Zhang Le) Date: Tue Feb 11 22:22:25 2003 Subject: [Spambayes] What does Delayed cost mean?
Message-ID: <20030212025503.GA792@localhost.localdomain> Hi all you spambayes folks, I've downloaded spambayes and played with it for a couple of days. It's fun to learn a good language through such a research project. I can test some new ideas within the current spambayes framework, really fun! When running one test I got the following msg: ... -> best cost for all runs: $9.00 -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20 -> achieved at 26 cutoff pairs -> smallest ham & spam cutoffs 0.22 & 0.975 -> fp 0; fn 2; unsure ham 9; unsure spam 26 -> fp rate 0%; fn rate 0.417%; unsure rate 1.87% -> largest ham & spam cutoffs 0.28 & 0.98 -> fp 0; fn 2; unsure ham 9; unsure spam 26 -> fp rate 0%; fn rate 0.417%; unsure rate 1.87% -> all runs false positives: 8 -> all runs false negatives: 6 -> all runs unsure: 0 -> all runs false positive %: 0.575539568345 -> all runs false negative %: 1.25 -> all runs unsure %: 0.0 -> all runs cost: $86.00 Total messages: 1870; 1390 (74.3%) ham + 480 (25.7%) spam Ham: 1382 (99.42%) ok, 0 (0.00%) unsure, 8 (0.58%) fp Spam: 474 (98.75%) ok, 0 (0.00%) unsure, 6 (1.25%) fn Score False: 0.75% Unsure 0.00% Standard Cost: $86.0000 Flex Cost: $86.0000 Flex**2 Cost: $86.0000 Delayed-Total messages: 1870; 1390 (74.3%) ham + 480 (25.7%) spam Delayed-Ham: 1381 (99.35%) ok, 9 (0.65%) unsure, 0 (0.00%) fp Delayed-Spam: 452 (94.17%) ok, 26 (5.42%) unsure, 2 (0.42%) fn Delayed-Score False: 0.11% Unsure 1.87% Delayed-Standard Cost: $9.0000 Delayed-Flex Cost: $75.5407 Delayed-Flex**2 Cost: $61.8529 My question is: what does Delayed cost mean? Where does the *delay* come from, since I use just one training set and one test set? Thanks in advance. -- Sincerely yours, Zhang Le From rob at hooft.net Wed Feb 12 08:07:07 2003 From: rob at hooft.net (Rob W.W. Hooft) Date: Wed Feb 12 02:10:18 2003 Subject: [Spambayes] What does Delayed cost mean?
References: <20030212025503.GA792@localhost.localdomain> Message-ID: <3E49F29B.8010008@hooft.net> Zhang Le wrote: > -> best cost for all runs: $9.00 > -> per-fp cost $10.00; per-fn cost $1.00; per-unsure cost $0.20 > -> achieved at 26 cutoff pairs > My question is :what does Delayed cost mean? > Where is *delay* come from? Since I just use one training set and one test > set. The "Delayed" cost is the cost recalculated after the spamcutoff and the hamcutoff have been moved to those corresponding to the "best cost" mentioned in the output. i.e. it is the cost at ideal cutoff parameters. Regards, Rob Hooft -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From rob at hooft.net Wed Feb 12 09:23:01 2003 From: rob at hooft.net (Rob W.W. Hooft) Date: Wed Feb 12 03:25:51 2003 Subject: [Spambayes] It gets funnier all the time.... Message-ID: <3E4A0465.6060403@hooft.net> I just received a spam like this: =================================================================== Return-Path: Delivered-To: hooft@bruker-nonius.com Received: from extheme.theme.com.hk (unknown [202.64.195.148]) by nosr3.delft.axs (Postfix) with ESMTP id 5C5223C8 for ; Wed, 12 Feb 2003 08:44:20 +0100 (CET) Received: from Quon (210.21.10.158 [210.21.10.158]) by extheme.theme.com.hk with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 1LLLDTRV; Wed, 12 Feb 2003 15:51:16 +0800 From: "Frederick George" To: r.w.w.hooft@nonius.nl Subject: =?ISO-8859-3?B?ci53LncuaG9vZnQsMTAwJSBBbGwgTmF0dXJhbCBQZW5pcyBFbmxhcmdlbWVudCBQaWxscw==?= Date: Wed, 12 Feb 2003 07:41:30 GMT Content-Type: text/html; charset=us-ascii Message-Id: <20030212074420.5C5223C8@nosr3.delft.axs> X-Spambayes-Classification: unsure; 0.98 PEhUTUw+DQo8Qk9EWSBCR0NPTE9SPXdoaXRlPjxjZW50ZXI+PGZvbnQgY29sb3I9I0ZGRkZGRj48 IS0tLS0+LTwhLS0tLT4NCjxBIEhSRUY9aHR0cDovL3d3dy5vbmxpbmVkbnMub3JnLzEvPjxmb250 IA0KY29sb3I9I0ZGRkZGRj48IS0tLS0+LTwhLS0tLT48L2ZvbnQ+PC9hPjxCcj4NCjxhIGhyZWY9 
Imh0dHA6Ly93d3cub25saW5lZG5zLm9yZy8xLyI+DQo8aW1nIHNyYz0iaHR0cDovL29ubGluZWRu
cy5vcmcvMy5naWYiIGJvcmRlcj0wPg0KPC9hPjxmb250IGNvbG9yPSNGRkZGRkY+DQo8IS0tLS0+
PCEtLS0tPjwhLS0tLT48IS0tLS0+DQo8L2JvZHk+PC9IVE1MPg==
========================================================================

The subject is a form of increasing manlihood, but there is no header defining the encoding of the body.... Is there mailer software out there that automatically interprets this as base64 without explicit instruction to do that?

Regards,
Rob

--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/

From msergeant at startechgroup.co.uk Wed Feb 12 10:32:23 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Wed Feb 12 05:32:27 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To: <3E4A0465.6060403@hooft.net>
Message-ID: <45CD00F7-3E75-11D7-A91C-0003939CB5D8@startechgroup.co.uk>

On Wednesday, Feb 12, 2003, at 08:23 Europe/London, Rob W.W. Hooft wrote:
> the encoding of the body.... Is there mailer software out there that
> automatically interprets this as base64 without explicit instruction
> to do that?

Yes, of course. I'll give you one guess as to which mailer. They even have a test suite of about 4G of broken emails which they make sure it displays "properly" despite being broken (and no, they won't give away their test suite - we asked).

Matt.

From gary at inauspicious.org Wed Feb 12 10:50:45 2003
From: gary at inauspicious.org (Gary Benson)
Date: Wed Feb 12 05:50:53 2003
Subject: [Spambayes] hammiefilter just started failing
In-Reply-To:
References: <20030211120304.GC28714@inauspicious.org>
Message-ID: <20030212105043.GA816@inauspicious.org>

Tim Stone - Four Stones Expressions wrote:
> The 'correct' solution to this problem is a retrain. But that's
> just not always gonna be practical. I've had problems with these
> assertions, but it seems that they should always be true...
Hmmm, I just realised that they are floating point numbers: it's quite possible that the cause of the failure is compounded floating point inaccuracies. Is one or both of the numbers in question derived by repeatedly doing something? It doesn't necessarily have to be in one pass of the program if the number comes from the database. I've known things like this happen before, especially if you are doing things like adding numbers of very different orders of magnitude together and then expecting the result to make sense. Gary > The statistics dudes are gonna have to weigh in on this one, because > these parameters are used in the combining scheme, and fooling with > them has consequences. Tim? Gary? Rob? - TimS > 2/11/2003 6:03:05 AM, Gary Benson wrote: > > >I've been using Spambayes for about a fortnight with great success, > >but last night it started to fail. Messages are being delivered (exim > >has a safety net against pipes failing) but they are also being > >bounced with the following traceback: > > > >| Traceback (most recent call last): > >| File "/usr/bin/hammiefilter", line 134, in ? 
> >| main() > >| File "/usr/bin/hammiefilter", line 131, in main > >| action() > >| File "/usr/bin/hammiefilter", line 87, in filter > >| print h.filter(msg) > >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 98, in > filter > >| prob, clues = self._scoremsg(msg, True) > >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in > _scoremsg > >| return self.bayes.spamprob(tokenize(msg), evidence) > >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line > 217, in chi2_spamprob > >| clues = self._getclues(wordstream) > >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line > 441, in _getclues > >| prob = self.probability(record) > >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line > 304, in probability > >| assert spamcount <= nspam > >| AssertionError > > > >This is happening with all messages: a quick check shows that > >spamcount is slightly higher than nspam (like 104 and 103) so I just > >replaced the assertion with 'if spamcount > nspam: spamcount = nspam' > >as a temporary workaround. > > > >Has anyone heard of this happening before? I'd like to know if this > >is a known problem before I start trying to debug it... > > > >I have a copy of my .hammiedb (taken before I did the above tweak) if > >you want it. > > > >Cheers, > >Gary > > > >[ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] > > > >_______________________________________________ > >Spambayes mailing list > >Spambayes@python.org > >http://mail.python.org/mailman/listinfo/spambayes > > > > > > > c'est moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org > > > [ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ] From rob at hooft.net Wed Feb 12 12:10:10 2003 From: rob at hooft.net (Rob W.W. Hooft) Date: Wed Feb 12 06:12:59 2003 Subject: [Spambayes] It gets funnier all the time.... 
References: <45CD00F7-3E75-11D7-A91C-0003939CB5D8@startechgroup.co.uk> Message-ID: <3E4A2B92.2050100@hooft.net> Matt Sergeant wrote: > On Wednesday, Feb 12, 2003, at 08:23 Europe/London, Rob W.W. Hooft wrote: > >> the encoding of the body.... Is there mailer software out there that >> automatically interprets this as base64 without explicit instruction >> to do that? > > > Yes, of course. I'll give you one guess as to which mailer. Aw. I don't need more than one. But that means that if we wan't to be able to use the clues in spambayes, we either have to make a token base64-encoding-missing or we have to decode it to get the clues from the body. I was lucky enough not to see any body parts in this case because Mozilla is smart enough not to decode it.... If this stays a loophole, more and more spammers might start using it. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From wsy at merl.com Wed Feb 12 06:30:54 2003 From: wsy at merl.com (Bill Yerazunis) Date: Wed Feb 12 06:32:35 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <3E4A0465.6060403@hooft.net> (rob@hooft.net) References: <3E4A0465.6060403@hooft.net> Message-ID: <200302121130.h1CBUs829254@localhost.localdomain> Rob: I've gotten a few like that- actually, from a reporter at IT News, whom I've repeatedly told that their headers are missing/broken and to have their IT people look at it. I'm not inclined to deal with this guy otherwise. -Bill Yerazunis From jknotzke at shampoo.ca Wed Feb 12 08:41:15 2003 From: jknotzke at shampoo.ca (Justin F. Knotzke) Date: Wed Feb 12 08:41:31 2003 Subject: [Spambayes] FAQ Message-ID: <20030212134114.GA6525@shampoo.ca> Hi, I just installed spambayes. I was using spamassassin and it was working rather well but I figure I'd give spambayes a go. I have a few questions regarding training. 1) I only had about 10 spam messages to train on. 
Is there somewhere, where I can download a larger list of spam or is it better to simply train on spam as it arrives? 2) Regarding the cron job that does the training, is it OK to train on mail more then once? I move mail from my inbox to a mbox (stored mail) that gets archived every month. If the cron job runs every morning it will train on a folder that it already has training on plus a few more messages. Thanks Justin. -- Justin F. Knotzke jknotzke@shampoo.ca http://www.shampoo.ca From tim at fourstonesExpressions.com Wed Feb 12 07:52:55 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 08:53:05 2003 Subject: [Spambayes] FAQ In-Reply-To: <20030212134114.GA6525@shampoo.ca> Message-ID: 2/12/2003 7:41:15 AM, "Justin F. Knotzke" wrote: > > Hi, > > I just installed spambayes. I was using spamassassin and it was >working rather well but I figure I'd give spambayes a go. > > I have a few questions regarding training. > > 1) I only had about 10 spam messages to train on. Is there somewhere, >where I can download a larger list of spam or is it better to simply >train on spam as it arrives? Just go ahead and train on spam as it arrives. Spambayes will become quite accurate very quickly, and your definition of 'spam' is different than anyone else's. We have considered creating some stock databases, such as porn spam, but even then it's difficult to know where someone might draw the line (is Victoria's Secret mail porn?). When you train on your own spam, it's you that's doing the defining, and you'll be very pleased with the results. > > 2) Regarding the cron job that does the training, is it OK to train >on mail more then once? I move mail from my inbox to a mbox (stored >mail) that gets archived every month. If the cron job runs every morning >it will train on a folder that it already has training on plus a few >more messages. It certainly doesn't *hurt* to train on the same mail more than once. 
It simply increases the spam probability of the words in that mail. It'd be the same as receiving two of the same mail, and training on both. Personally, I'd try to avoid that, but that's just me. No harm, no foul...

>
> Thanks
>
> Justin.
>
>--
>Justin F. Knotzke
>jknotzke@shampoo.ca
>http://www.shampoo.ca
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>
c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From jknotzke at shampoo.ca Wed Feb 12 09:34:12 2003
From: jknotzke at shampoo.ca (Justin F. Knotzke)
Date: Wed Feb 12 09:34:31 2003
Subject: [Spambayes] Cron
Message-ID: <20030212143412.GA6988@shampoo.ca>

Hi,

I tested my cron script and there are two questions I would like to ask:

1) Is there a way to have mboxtrain run in silent mode and not send out what you see below, or do I have to write a procmail script to filter this out?

2) Near the bottom is: Trained 1 out of 1181 messages
I had trained on this folder once before and there was one new message added. Does this mean that mboxtrain only adds messages to the database which it has not seen before?

Thanks.

J

----- Forwarded message from Cron Daemon -----

Envelope-to: jknotzke@shampoo.ca
Delivery-date: Wed, 12 Feb 2003 09:21:49 -0500
From: root@shampoo.ca (Cron Daemon)
To: jknotzke@shampoo.ca
Subject: Cron $HOME/spambayes/mboxtrain.py -d $HOME/.hammiedb -g $HOME/Mail/mbox -s $HOME/Mail/SPAM
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
X-Cron-Env:
Date: Wed, 12 Feb 2003 09:21:28 -0500
X-Spambayes-Classification: ham; 0.00
X-Sorted: Default

Training ham (/home/jknotzke/Mail/mbox):
Reading as Unix mbox
Trained 1 out of 1181 messages
Training spam (/home/jknotzke/Mail/SPAM):
Reading as Unix mbox
Trained 0 out of 0 messages

----- End forwarded message -----

--
Justin F. Knotzke
jknotzke@shampoo.ca
http://www.shampoo.ca

From papaDoc at videotron.ca Wed Feb 12 09:45:32 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Wed Feb 12 09:46:50 2003
Subject: [Spambayes] Cron
In-Reply-To: <20030212143412.GA6988@shampoo.ca>
References: <20030212143412.GA6988@shampoo.ca>
Message-ID: <3E4A5E0C.8010807@videotron.ca>

Justin F. Knotzke wrote:
> Hi,
>
> I tested my cron script and there are two questions I would like to
>ask:
>
> 1) Is there a way to have mboxtrain run in silent mode and not send
>out what you see below or do I have to write a procmail script to filter
>this out?
>

You can always send the output to /dev/null. Ex.:

mboxtrain.py [your_options_here] > /dev/null

But the option "-q" is the quiet mode that you need.

Remi

From nas at python.ca Wed Feb 12 06:55:59 2003
From: nas at python.ca (Neil Schemenauer)
Date: Wed Feb 12 09:47:22 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To: <3E4A2B92.2050100@hooft.net>
References: <45CD00F7-3E75-11D7-A91C-0003939CB5D8@startechgroup.co.uk> <3E4A2B92.2050100@hooft.net>
Message-ID: <20030212145559.GA9586@glacier.arctrix.com>

Rob W.W. Hooft wrote:
> But that means that if we want to be able to use the clues in
> spambayes, we either have to make a token base64-encoding-missing or
> we have to decode it to get the clues from the body.

Generating a clue sounds best, assuming SB doesn't nail it already.

> I was lucky enough not to see any body parts in this case because
> Mozilla is smart enough not to decode it.... If this stays a loophole,
> more and more spammers might start using it.

This behavior is a nightmare for virus filters since they have to know how clients will interpret messages. This behavior has been present for a long time. Explorer does a similar thing. I'm not holding my breath for them to change it.
Neil From tim at fourstonesExpressions.com Wed Feb 12 08:57:41 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 09:57:54 2003 Subject: [Spambayes] Cron In-Reply-To: <20030212143412.GA6988@shampoo.ca> Message-ID: <04D8TS4W531TQMVTB62VRB796MGA761.3e4a60e5@myst> 2/12/2003 8:34:12 AM, "Justin F. Knotzke" wrote: > > Hi, > > I tested my cron script and there are two questions I would like to >ask: > > 1) Is there a way to have mboxtrain run in silent mode and not send >out what you see below or do I have to write a procmail script to filter >this out? Use the -q option. - TimS > > 2) Near the bottom is: Trained 1 out of 1181 messages > I had trained on this folder once before and there was one new >message added. Does this mean that mboxtrain only adds messages to the >database which it has not seen before? mboxtrain adds a 'X-Spambayes-Trained' header to the messages in the mailbox as it trains, so it will only train new messages normally. The -f option forces mboxtrain to train on all the mail in the mbox, even if it has been trained already. - TimS > > Thanks. > > J > > > >----- Forwarded message from Cron Daemon ----- > >Envelope-to: jknotzke@shampoo.ca >Delivery-date: Wed, 12 Feb 2003 09:21:49 -0500 >From: root@shampoo.ca (Cron Daemon) >To: jknotzke@shampoo.ca >Subject: Cron $HOME/spambayes/mboxtrain.py -d > $HOME/.hammiedb -g $HOME/Mail/mbox -s $HOME/Mail/SPAM >X-Cron-Env: >X-Cron-Env: >X-Cron-Env: >X-Cron-Env: >Date: Wed, 12 Feb 2003 09:21:28 -0500 >X-Spambayes-Classification: ham; 0.00 >X-Sorted: Default > >Training ham (/home/jknotzke/Mail/mbox): > Reading as Unix mboxrained 1 out of 1181 messages >Training spam (/home/jknotzke/Mail/SPAM): > Reading as Unix mbox > Trained 0 out of 0 messages > >----- End forwarded message ----- > >-- >Justin F. 
Knotzke >jknotzke@shampoo.ca >http://www.shampoo.ca > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 12 10:17:15 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 11:17:27 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <20030212145559.GA9586@glacier.arctrix.com> Message-ID: 2/12/2003 8:55:59 AM, Neil Schemenauer wrote: >Rob W.W. Hooft wrote: >> But that means that if we wan't to be able to use the clues in >> spambayes, we either have to make a token base64-encoding-missing or >> we have to decode it to get the clues from the body. > >Generating a clue sounds best, assuming SB doesn't nail it already. I doubt that the tokenizer would generate any meaningful tokens from this message. Generating a token would be the right way to do it, any ideas how? - TimS > >> I was lucky enough not to see any body parts in this case because >> Mozilla is smart enough not to decode it.... If this stays a loophole, >> more and more spammers might start using it. > >This behavior is a nightmare for virus filters since they have to know >how clients will interpret messages. This behavior has been present for >a long time. Exporer does a similar thing. I'm not holding my breath >for them to change it. 
> > Neil > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From gmino at pcsltd.com Sat Feb 8 11:59:49 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Wed Feb 12 11:32:54 2003 Subject: [Spambayes] RE: More install stuff from python newbie In-Reply-To: <3261E796E368954CB22963F2B63E81052028A0@xmail.pcsltd.com> Message-ID: <3261E796E368954CB22963F2B63E8105137658@xmail.pcsltd.com> Ok folks here's what I've done thus far: Uninstalled/reinstalled: M$ Office XP/2002, Python-2.2.2.exe, win32all-150.exe, email-2.4.3.tar.gz, bsddb3-4.1.3.win32-py2.2.exe & spambayes via CVS (today 12/02/2003). Run D:\Python22\email-2.4.3\ setup.py build/install, D:\Python22\Lib\site-packages\win32com\client\ makepy.py for both Office 10 + Outlook 10 objects, D:\spambayes\setup.py build/install, D:\spambayes\Outlook2000\addin.py and got: D:\Python22\lib\site-packages\win32com\universal.py:15: UserWarning: win32com.universal argument passing support is incomplete - only types covered in win32com.servers.test_pycomtest are supported warnings.warn(msg) Redirecting output to win32trace remote collector Am now getting the following after starting up Oulook: Outlook Spam Addin module loading Registered: SpamBayes.OutlookAddin Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Created new configuration file 'D:\spambayes\Outlook2000\default_configuration.pck' Loaded bayes database from 'D:\spambayes\Outlook2000\default_bayes_database.pck' Either bayes database or message database is missing - creating new Traceback (most recent call last): File "D:\Python22\lib\site-packages\win32com\universal.py", line 150, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None) File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return 
self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\spambayes\Outlook2000\addin.py", line 615, in OnConnection self.manager = manager.GetManager(application) File "D:\spambayes\Outlook2000\manager.py", line 419, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "D:\spambayes\Outlook2000\manager.py", line 158, in __init__ self.LoadBayes() File "D:\spambayes\Outlook2000\manager.py", line 265, in LoadBayes self.InitNewBayes() File "D:\spambayes\Outlook2000\manager.py", line 314, in InitNewBayes self.bayes = self.db_manager.new_bayes() File "D:\spambayes\Outlook2000\manager.py", line 87, in new_bayes os.unlink(self.bayes_filename) exceptions.OSError: [Errno 2] No such file or directory: 'D:\\spambayes\\Outlook2000\\default_bayes_database.pck' TIA gang!!! gabriel From Paul.Moore at atosorigin.com Wed Feb 12 16:37:23 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Feb 12 11:37:59 2003 Subject: [Spambayes] RE: More install stuff from python newbie Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D902@UKDCX001.uk.int.atosorigin.com> From: Gabriel Mino [mailto:gmino@pcsltd.com] > exceptions.OSError: [Errno 2] No such file or directory: > 'D:\\spambayes\\Outlook2000\\default_bayes_database.pck' This is a known problem - I'm discussing it with Mark Hammond at the moment. For now, you can hack manager.py - in 2 places, there are os.unlink calls followed by an "except IOError, e" clause. Change "IOError" to "EnvironmentError" and you should be away. Longer term, Mark will probably be updating the CVS version... Paul. 
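The change Paul describes works because os.unlink reports a missing file as OSError (as the traceback shows), not IOError, and EnvironmentError covers both. A minimal sketch of the same idea in current Python follows. The helper name remove_if_present is made up for illustration and is not part of manager.py, and the "except ... as" syntax assumes Python 2.6 or later rather than the 2.2 used in this thread.

```python
import errno
import os


def remove_if_present(path):
    """Delete path if it exists; return True if a file was removed.

    Hypothetical helper for illustration, not taken from manager.py.
    """
    try:
        os.unlink(path)
    except OSError as exc:
        # A missing file is expected here; anything else is a real error.
        if exc.errno != errno.ENOENT:
            raise
        return False
    return True
```

Checking errno keeps other failures, such as a permissions problem, visible instead of swallowing every OSError.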
From tim at fourstonesExpressions.com Wed Feb 12 10:39:56 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 11:40:10 2003 Subject: [Spambayes] RE: More install stuff from python newbie In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D902@UKDCX001.uk.int.atosorigin.com> Message-ID: 2/12/2003 10:37:23 AM, "Moore, Paul" wrote: >From: Gabriel Mino [mailto:gmino@pcsltd.com] >> exceptions.OSError: [Errno 2] No such file or directory: >> 'D:\\spambayes\\Outlook2000\\default_bayes_database.pck' > >This is a known problem - I'm discussing it with Mark Hammond at >the moment. > >For now, you can hack manager.py - in 2 places, there are os.unlink >calls followed by an "except IOError, e" clause. Change "IOError" to >"EnvironmentError" and you should be away. > >Longer term, Mark will probably be updating the CVS version... Welcome to alpha software world, dude -TimS > >Paul. > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From gmino at pcsltd.com Sat Feb 8 12:15:52 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Wed Feb 12 11:48:54 2003 Subject: [Spambayes] RE: More install stuff from python newbie In-Reply-To: <3261E796E368954CB22963F2B63E81052088DC@xmail.pcsltd.com> Message-ID: <3261E796E368954CB22963F2B63E8105137659@xmail.pcsltd.com> WHOOOOOOOOO WHOOOOOOOO.....YES!!!!! This has done it!!! Thanks all for your patience & putting up w/my newbie stuff. I'll be planning to roll this out using both the plugin config as well as proxy...will report back with results!!!! 
Gabriel -----Original Message----- From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] Sent: Wednesday, February 12, 2003 11:37 AM To: Gabriel Mino Cc: spambayes@python.org Subject: RE: [Spambayes] RE: More install stuff from python newbie From: Gabriel Mino [mailto:gmino@pcsltd.com] > exceptions.OSError: [Errno 2] No such file or directory: > 'D:\\spambayes\\Outlook2000\\default_bayes_database.pck' This is a known problem - I'm discussing it with Mark Hammond at the moment. For now, you can hack manager.py - in 2 places, there are os.unlink calls followed by an "except IOError, e" clause. Change "IOError" to "EnvironmentError" and you should be away. Longer term, Mark will probably be updating the CVS version... Paul. From skip at pobox.com Wed Feb 12 10:56:16 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 11:56:28 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: References: <20030212145559.GA9586@glacier.arctrix.com> Message-ID: <15946.31920.887523.201008@montanaro.dyndns.org> >>>>> "Tim" == Tim Stone <- Four Stones Expressions > writes: Tim> 2/12/2003 8:55:59 AM, Neil Schemenauer wrote: >> Rob W.W. Hooft wrote: >>> But that means that if we wan't to be able to use the clues in >>> spambayes, we either have to make a token base64-encoding-missing or >>> we have to decode it to get the clues from the body. >> >> Generating a clue sounds best, assuming SB doesn't nail it already. Tim> I doubt that the tokenizer would generate any meaningful tokens Tim> from this message. Generating a token would be the right way to do Tim> it, any ideas how? Sure, generate a "no explicit content-transfer-encoding" token. Alas, most mail messages are written with Content-Type: text/plain; charset="us-ascii" and don't contain a Content-Transfer-Encoding header, so all by itself it probably wouldn't be a very useful clue. 
The tokenizer does have access to the entire message though, so it could conceivably guess at encodings if no CTE header was given and the first line of the message body was long (suggesting base-64) or looked like the start of a uuencode block. Skip From tim at fourstonesExpressions.com Wed Feb 12 11:00:13 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 12:00:21 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15946.31920.887523.201008@montanaro.dyndns.org> Message-ID: 2/12/2003 10:56:16 AM, Skip Montanaro wrote: >>>>>> "Tim" == Tim Stone <- Four Stones Expressions > writes: > > Tim> 2/12/2003 8:55:59 AM, Neil Schemenauer wrote: > >> Rob W.W. Hooft wrote: > >>> But that means that if we wan't to be able to use the clues in > >>> spambayes, we either have to make a token base64-encoding-missing or > >>> we have to decode it to get the clues from the body. > >> > >> Generating a clue sounds best, assuming SB doesn't nail it already. > > Tim> I doubt that the tokenizer would generate any meaningful tokens > Tim> from this message. Generating a token would be the right way to do > Tim> it, any ideas how? > >Sure, generate a "no explicit content-transfer-encoding" token. Alas, most >mail messages are written with > > Content-Type: text/plain; charset="us-ascii" > >and don't contain a Content-Transfer-Encoding header, Right. > so all by itself it >probably wouldn't be a very useful clue. The tokenizer does have access to >the entire message though, so it could conceivably guess at encodings if no >CTE header was given and the first line of the message body was long >(suggesting base-64) or looked like the start of a uuencode block. I suppose we could have a 'first_line_max_length' option that would trigger a base-64 decode of the first line, followed by a check for 'printable' characters in the result... seem reasonable? 
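The heuristic Skip and Tim outline here could be sketched roughly as follows, in current Python. The function name, the parameter name min_first_line_length, its default of 60 and the use of string.printable are all assumptions made for illustration; none of this is the actual spambayes tokenizer.

```python
import base64
import binascii
import email
import string

# Byte values regarded as "printable" for the purpose of this sketch.
PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_bare_base64(raw_message, min_first_line_length=60):
    """Guess whether a message body is base64 with no CTE header.

    Hypothetical helper: returns True when there is no
    Content-Transfer-Encoding header, the first body line is long,
    and that line decodes as base64 into printable characters.
    """
    msg = email.message_from_string(raw_message)
    if msg.is_multipart() or msg["Content-Transfer-Encoding"] is not None:
        return False
    lines = msg.get_payload().strip().splitlines()
    if not lines:
        return False
    first = lines[0].strip()
    if len(first) < min_first_line_length:
        return False
    try:
        decoded = base64.b64decode(first, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 (stray characters, bad padding, non-ASCII).
        return False
    return all(byte in PRINTABLE for byte in decoded)
```

Only the first line is decoded, which keeps the check cheap; a positive result could then be turned into a token or used to trigger a full decode of the body.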
- TimS
>
>Skip
>
>
c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From nas at python.ca Wed Feb 12 09:21:29 2003
From: nas at python.ca (Neil Schemenauer)
Date: Wed Feb 12 12:12:52 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To:
References: <20030212145559.GA9586@glacier.arctrix.com>
Message-ID: <20030212172129.GA9928@glacier.arctrix.com>

Tim Stone - Four Stones Expressions wrote:
> I doubt that the tokenizer would generate any meaningful tokens from this
> message. Generating a token would be the right way to do it, any ideas how?

Something like:

    import string
    import re
    BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
    valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match

    def tokenize_word(...):
        ...
        elif 60 <= n <= 76 and valid_base64(word):
            yield 'bare base64'

I don't know if 60 is reasonable as a lower bound. Does someone want to test Outlook? Maybe it only magically detects base-64 if the line is exactly 76 characters long.

Neil

From tim at fourstonesExpressions.com Wed Feb 12 11:22:47 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Feb 12 12:22:56 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To: <20030212172129.GA9928@glacier.arctrix.com>
Message-ID: <73TOSOA8XVIE3Z5Y736TO95GCSQML82.3e4a82e7@myst>

2/12/2003 11:21:29 AM, Neil Schemenauer wrote:

>Tim Stone - Four Stones Expressions wrote:
>> I doubt that the tokenizer would generate any meaningful tokens from this
>> message. Generating a token would be the right way to do it, any ideas how?
>
>Something like:
>
>    import string
>    import re
>    BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
>    valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match
>
>    def tokenize_word(...):
>        ...
>        elif 60 <= n <= 76 and valid_base64(word):
>            yield 'bare base64'

That's the idea. Do we have any of these kinds of messages in any of our test corpora? If not, we need to find some...
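A runnable variant of the word-level check from this thread might look like the following, in current Python. The generator name bare_base64_tokens and the whitespace split are assumptions; the other branches of the real tokenize_word are elided in the post and are not reproduced here. The pattern uses "+" with fullmatch so that the entire word has to consist of base64 characters.

```python
import re
import string

BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
# fullmatch: every character of the word must be in the base64 alphabet.
valid_base64 = re.compile("[%s]+" % re.escape(BASE64_CHARSET)).fullmatch


def bare_base64_tokens(text, min_len=60, max_len=76):
    """Yield a 'bare base64' token for each word that looks like a
    line of base64 text (hypothetical helper; bounds as in the post).
    """
    for word in text.split():
        if min_len <= len(word) <= max_len and valid_base64(word):
            yield "bare base64"
```

The 60 and 76 bounds are the ones floated in the thread; whether 60 is a sensible lower bound is left open there, so both are parameters here.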
-TimS > >I don't know if 60 is reasonable as a lower bound. Does someone want to >test Outlook? Maybe it only magically detects base-64 if the line is >exactly 76 characters long. > > Neil > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Feb 12 11:47:02 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 12:47:15 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <73TOSOA8XVIE3Z5Y736TO95GCSQML82.3e4a82e7@myst> References: <20030212172129.GA9928@glacier.arctrix.com> <73TOSOA8XVIE3Z5Y736TO95GCSQML82.3e4a82e7@myst> Message-ID: <15946.34966.849517.395442@montanaro.dyndns.org> Tim> That's the idea. Do we have any of these kind of messages in any Tim> of our test corpora? If not, we need to find some... -TimS Easily constructed. I believe the attached should do the trick. Skip -------------- next part -------------- An embedded message was scrubbed... From: noreply@zwallet.com Subject: Wow! I was in profit in 2 hours [payment notices inside] Date: Fri, 7 Feb 2003 09:03:58 -0600 Size: 2960 Url: http://mail.python.org/pipermail/spambayes/attachments/20030212/b5cf6f6d/attachment.eml From neale at woozle.org Wed Feb 12 09:51:45 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Feb 12 12:52:16 2003 Subject: [Spambayes] Cron In-Reply-To: <20030212143412.GA6988@shampoo.ca> ("Justin F. Knotzke"'s message of "Wed, 12 Feb 2003 09:34:12 -0500") References: <20030212143412.GA6988@shampoo.ca> Message-ID: "Justin F. Knotzke" writes: > 2) Near the bottom is: Trained 1 out of 1181 messages > I had trained on this folder once before and there was one new > message added. Does this mean that mboxtrain only adds messages to the > database which it has not seen before? 
That's right, it only trains on messages that haven't already been trained, or have changed classification. So if you keep mail around for a while, and your mail program is capable of running external filters (e.g. mutt or gnus), you'll get better performance using the -t option to hammiefilter and never running mboxtrain.

Of course, if you run mboxtrain out of cron early in the morning when you're asleep, you probably don't care much about performance :)

Neale

From tim at fourstonesExpressions.com Wed Feb 12 13:28:01 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Feb 12 14:28:11 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To: <20030212172129.GA9928@glacier.arctrix.com>
Message-ID:

2/12/2003 11:21:29 AM, Neil Schemenauer wrote:

>Tim Stone - Four Stones Expressions wrote:
>> I doubt that the tokenizer would generate any meaningful tokens from this
>> message. Generating a token would be the right way to do it, any ideas how?
>
>Something like:
>
>    import string
>    import re
>    BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
>    valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match
>
>    def tokenize_word(...):
>        ...
>        elif 60 <= n <= 76 and valid_base64(word):
>            yield 'bare base64'

I'm thinking we don't care about this unless the content_transfer_encoding header is missing? Or do we care at any rate? - TimS

>
>I don't know if 60 is reasonable as a lower bound. Does someone want to
>test Outlook? Maybe it only magically detects base-64 if the line is
>exactly 76 characters long.
> > Neil > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 12 13:48:20 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 14:48:27 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15946.42148.367751.192349@montanaro.dyndns.org> Message-ID: 2/12/2003 1:46:44 PM, Skip Montanaro wrote: > > Tim> I'm thinking we don't care about this unless the > Tim> content_transfer_encoding header is missing? Or do we care at any > Tim> rate? - TimS > >If we're going to bother trying to decode base64 when no >Content-Transfer-Encoding header is present, we might as well decode it when >we do have one. I guess we should then tokenize it... - TimS > >Skip > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Feb 12 13:46:44 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 14:50:09 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: References: <20030212172129.GA9928@glacier.arctrix.com> Message-ID: <15946.42148.367751.192349@montanaro.dyndns.org> Tim> I'm thinking we don't care about this unless the Tim> content_transfer_encoding header is missing? Or do we care at any Tim> rate? - TimS If we're going to bother trying to decode base64 when no Content-Transfer-Encoding header is present, we might as well decode it when we do have one. Skip From skip at pobox.com Wed Feb 12 13:52:17 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 14:52:28 2003 Subject: [Spambayes] It gets funnier all the time.... 
In-Reply-To: References: <20030212172129.GA9928@glacier.arctrix.com> Message-ID: <15946.42481.616545.393375@montanaro.dyndns.org> If we're going to bother trying to decode base64 when no Content-Transfer-Encoding header is present, we might as well decode it when we do have one. Whoops... my bad. It *is* decoded now if it's text (using .get_payload()). It also tries to decode broken base64 text. Now all we need is to sniff out the real encoding if one wasn't given. Skip From tim at fourstonesExpressions.com Wed Feb 12 13:53:47 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 14:53:56 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15946.42481.616545.393375@montanaro.dyndns.org> Message-ID: <61IC2USM5YRLNFDKJ6FC31OGAVRA9.3e4aa64b@myst> 2/12/2003 1:52:17 PM, Skip Montanaro wrote: > > If we're going to bother trying to decode base64 when no > Content-Transfer-Encoding header is present, we might as well decode it > when we do have one. > >Whoops... my bad. It *is* decoded now if it's text (using .get_payload()). >It also tries to decode broken base64 text. Now all we need is to sniff out >the real encoding if one wasn't given. Ya, I just discovered that too... so back to the original mail that started this whole thing. Was that not base64? - TimS > >Skip > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Feb 12 14:08:35 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 15:08:44 2003 Subject: [Spambayes] It gets funnier all the time.... 
In-Reply-To: <61IC2USM5YRLNFDKJ6FC31OGAVRA9.3e4aa64b@myst> References: <15946.42481.616545.393375@montanaro.dyndns.org> <61IC2USM5YRLNFDKJ6FC31OGAVRA9.3e4aa64b@myst> Message-ID: <15946.43459.579379.181577@montanaro.dyndns.org> Tim> Ya, I just discovered that too... so back to the original mail that Tim> started this whole thing. Was that not base64? - TimS What about this? No c-t-e header, so the base64 crap will come back unchanged. If the first "word" of the decoded payload is longer than 60 characters, feed to the base64 fixer-upper: *** /tmp/skip/tokenizer.py.~1.4~ Wed Feb 12 14:06:51 2003 --- /tmp/skip/tokenizer.py Wed Feb 12 14:06:51 2003 *************** *** 1331,1336 **** --- 1331,1339 ---- # Decode, or take it as-is if decoding fails. try: text = part.get_payload(decode=True) + if len(text.split()[0]) > 60: + # just in case it's encoded but no c-t-e header was given + text = try_to_repair_damaged_base64(text) except: yield "control: couldn't decode" text = part.get_payload(decode=False) Skip From brian at bstpierre.org Wed Feb 12 12:03:40 2003 From: brian at bstpierre.org (Brian St. Pierre) Date: Wed Feb 12 16:16:03 2003 Subject: [Spambayes] successful install, thanks! Message-ID: <20030212170340.GA216@BOBO> Hi - I just wanted to report that my install (I grabbed a cvs snapshot) went very smoothly: $ uname -a CYGWIN_NT-5.1 BOBO 1.3.17(0.67/3/2) 2002-11-27 18:54 i686 unknown $ python Python 2.2.2 (#1, Nov 15 2002, 07:49:04) [GCC 2.95.3-5 (cygwin special)] on cygwin -- Brian St. Pierre brian @ bstpierre.org http://bstpierre.org/ From tim at fourstonesExpressions.com Wed Feb 12 15:38:24 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 16:38:33 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15946.43459.579379.181577@montanaro.dyndns.org> Message-ID: No cte header is probably a clue... 
Problem here is that this happens before html stripping, and so the first word could conceivably be > 60, if it's an anchor tag with a long url or something... I think I like your original idea of checking for base64 character match... - TimS 2/12/2003 2:08:35 PM, Skip Montanaro wrote: > > Tim> Ya, I just discovered that too... so back to the original mail that > Tim> started this whole thing. Was that not base64? - TimS > >What about this? No c-t-e header, so the base64 crap will come back >unchanged. If the first "word" of the decoded payload is longer than 60 >characters, feed to the base64 fixer-upper: > >*** /tmp/skip/tokenizer.py.~1.4~ Wed Feb 12 14:06:51 2003 >--- /tmp/skip/tokenizer.py Wed Feb 12 14:06:51 2003 >*************** >*** 1331,1336 **** >--- 1331,1339 ---- > # Decode, or take it as-is if decoding fails. > try: > text = part.get_payload(decode=True) >+ if len(text.split()[0]) > 60: >+ # just in case it's encoded but no c-t-e header was given >+ yield "control: no cte header" >+ text = try_to_repair_damaged_base64(text) > except: > yield "control: couldn't decode" > text = part.get_payload(decode=False) > >Skip > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 12 15:56:30 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 16:56:39 2003 Subject: [Spambayes] spampot and black holes Message-ID: <82ID05E0DEBMKKFA1TMG1W5484A9ML.3e4ac30e@myst> I'm wondering if it might not be a reasonable idea to run some very easy to find spampots, which (of course) don't actually relay the spam. Might make life sucky (at least temporarily) for some spammer somewhere... We could even make it so it's particularly slow in response time, so their spam activity takes forever. We could even return an smtp error response at random that says something like "We're out here, and we will stop you." 
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From wsy at merl.com Wed Feb 12 17:31:12 2003 From: wsy at merl.com (Bill Yerazunis) Date: Wed Feb 12 17:31:55 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: (message from Tim Stone - Four Stones Expressions on Wed, 12 Feb 2003 10:17:15 -0600) References: Message-ID: <200302122231.h1CMVCp30252@localhost.localdomain> From: Tim Stone - Four Stones Expressions >> But that means that if we want to be able to use the clues in >> spambayes, we either have to make a token base64-encoding-missing or >> we have to decode it to get the clues from the body. > >Generating a clue sounds best, assuming SB doesn't nail it already. I doubt that the tokenizer would generate any meaningful tokens from this message. Generating a token would be the right way to do it, any ideas how? Tim: The problem in detecting an un-marked base64 is that the base64 itself is pretty much indistinguishable from one-word-per-line text. The regex for base64's that CRM114 uses is \n\n(([a-zA-Z0-9+=\/]\{55,80\}\n)\{4,200\}.\{0,80\}\n) (where \n has the usual C-ish meaning of an embedded newline) The problem is that this regex may misfire on one-word-per-line text; that's why it requires at least four such lines, uninterrupted, with each line of at least 55 characters, and at least two leading blank lines. You can also use the matched text as input to the 'mimencode -u' shell command to actually un-encode the base64 and work against the text inside, which is what CRM114 itself does (the presence of base64 without headers marking it as such is a no-op). Anyway, give it a try and see if it works for you. -Bill Yerazunis (mostly CRM114, but it's good to cross-pollinate) From skip at pobox.com Wed Feb 12 17:00:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 18:00:30 2003 Subject: [Spambayes] It gets funnier all the time....
In-Reply-To: <200302122231.h1CMVCp30252@localhost.localdomain> References: <200302122231.h1CMVCp30252@localhost.localdomain> Message-ID: <15946.53758.685215.359552@montanaro.dyndns.org> Bill> The problem in detecting an un-marked base64 is that the base64 Bill> itself is pretty much indistingushable from one-word-per-line Bill> text. True enough, and in that case, the try-to-fix-broken-base64 function just returns the input text. No harm, no foul. The performance penalty should be reasonable. One split() and length check for each part in the common case. The try-to-fix function is only called if it's long enough, which should be rare. Skip From tim at fourstonesExpressions.com Wed Feb 12 17:18:32 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 18:18:42 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15946.53758.685215.359552@montanaro.dyndns.org> Message-ID: 2/12/2003 5:00:14 PM, Skip Montanaro wrote: > > Bill> The problem in detecting an un-marked base64 is that the base64 > Bill> itself is pretty much indistingushable from one-word-per-line > Bill> text. > >True enough, and in that case, the try-to-fix-broken-base64 function just >returns the input text. No harm, no foul. The performance penalty should >be reasonable. One split() and length check for each part in the common >case. The try-to-fix function is only called if it's long enough, which >should be rare. Are you testing this, Skip? > >Skip > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From whisper at oz.net Wed Feb 12 15:36:26 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 12 18:46:14 2003 Subject: [Spambayes] Problem installing Spammie on Outlook 2000 Message-ID: Win 2000 Pro Outlook 2000 CVS Spambayes from 2/10/2003 PythonWin Build 150 email 2.4.3 I had previously installed a plugin test script that puts a button "Python" on the toolbar. 
When I ran addin.py I got: J:\Apps\Spammie>addin J:\Python22\lib\site-packages\win32com\universal.py:15: UserWarning: win32com.universal argument passing support is incomplete - only types covered in win32com.servers.test_pycomtest are supported warnings.warn(msg) Registered: SpamBayes.OutlookAddin Now, when I click on the "Python" button in Outlook, I get "Hello from Python" and that's all. How do I fix this? TIA, David LeBlanc Seattle, WA USA From T.A.Meyer at massey.ac.nz Thu Feb 13 12:56:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 18:57:28 2003 Subject: [Spambayes] Problem installing Spammie on Outlook 2000 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD52@its-xchg4.massey.ac.nz> > When I ran addin.py I got: > J:\Apps\Spammie>addin > J:\Python22\lib\site-packages\win32com\universal.py:15: UserWarning: > win32com.universal argument passing support is incomplete - only types > covered in win32com.servers.test_pycomtest are supported > warnings.warn(msg) > Registered: SpamBayes.OutlookAddin I *think* I used to get this too, but I don't anymore. Mark, do you know why this appears? Is it something to do with the version of the win32com stuff? > I had previously installed a plugin test script that puts a > button "Python" on the toolbar.> Now, when I click on the "Python" button in Outlook, I get "Hello from > Python" and that's all. > How do I fix this? What's broken? To remove the demo "Python" button, just run the same demo script again with the parameter "--unregister". You should (also) have an "Anti-Spam" button on the toolbar. This is where you can find all the spambayes config. If you don't have this button, then if you could send a trace* to the list that would probably be easiest. =Tony Meyer * Open PythonWin, and choose TOOLS, then TRACE COLLECTOR DEBUGGING TOOL. Then open Outlook and copy everything in the trace window from the start of the error. 
From T.A.Meyer at massey.ac.nz Thu Feb 13 13:00:19 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 19:00:57 2003 Subject: [Spambayes] Problem installing Spammie on Outlook 2000 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4B7@its-xchg4.massey.ac.nz> > CVS Spambayes from 2/10/2003 One other thing is that you should get today's CVS (13/02/03) - a fix was just recently checked in, without which you'll run into problems once you do get things working. In fact, this might even be the cause (the trace would tell...) =Tony Meyer From whisper at oz.net Wed Feb 12 16:19:46 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 12 19:17:45 2003 Subject: [Spambayes] Problem installing Spammie on Outlook 2000 In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD52@its-xchg4.massey.ac.nz> Message-ID: David LeBlanc Seattle, WA USA > What's broken? To remove the demo "Python" button, just run the > same demo script again with the parameter "--unregister". I wish I remembered the name of that demo script!!! > You should (also) have an "Anti-Spam" button on the toolbar. > This is where you can find all the spambayes config. If you > don't have this button, then if you could send a trace* to the > list that would probably be easiest. > > =Tony Meyer > > * Open PythonWin, and choose TOOLS, then TRACE COLLECTOR > DEBUGGING TOOL. Then open Outlook and copy everything in the > trace window from the start of the error. No Anti-Spam button. 
Trace Collector Debugging Tool returned: File "J:\Python22\lib\site-packages\win32com\universal.py", line 150, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None) File "J:\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "J:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "J:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "J:\Apps\Spammie\addin.py", line 615, in OnConnection self.manager = manager.GetManager(application) File "J:\Apps\Spammie\manager.py", line 419, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "J:\Apps\Spammie\manager.py", line 158, in __init__ self.LoadBayes() File "J:\Apps\Spammie\manager.py", line 265, in LoadBayes self.InitNewBayes() File "J:\Apps\Spammie\manager.py", line 314, in InitNewBayes self.bayes = self.db_manager.new_bayes() File "J:\Apps\Spammie\manager.py", line 87, in new_bayes os.unlink(self.bayes_filename) exceptions.OSError: [Errno 2] No such file or directory: 'J:\\Apps\\Spammie\\default_bayes_database.pck' OnStartupComplete None Don't recall anything about setting up any kind of db support or mention of any particular package needed for it. Dave LeBlanc Seattle, WA USA From T.A.Meyer at massey.ac.nz Thu Feb 13 13:20:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 19:21:02 2003 Subject: [Spambayes] Problem installing Spammie on Outlook 2000 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4B8@its-xchg4.massey.ac.nz> > > What's broken? To remove the demo "Python" button, just run the > > same demo script again with the parameter "--unregister". > I wish I remembered the name of that demo script!!! 
It's (I presume) [pythondir]/Lib/site-packages/win32com/demos/outlookAddin.py > File "J:\Apps\Spammie\manager.py", line 87, in new_bayes > os.unlink(self.bayes_filename) > exceptions.OSError: [Errno 2] No such file or directory: > 'J:\\Apps\\Spammie\\default_bayes_database.pck' > OnStartupComplete None Ah, this is the thing I mentioned in my second email - the fix for this was only checked into CVS a few hours ago. If you update from CVS this will be fixed. =Tony Meyer From whisper at oz.net Wed Feb 12 16:56:38 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 12 19:54:37 2003 Subject: [Spambayes] Spammie on Outlook 2000 - questions Message-ID: Ok, got it to work after d/ling the 2/13 CVS! HUrah... sort of. Thanks to Tony Meyer for his help! So, some questions: I created "Spam" and "Spam Maybe" folders and told spammie to use those. Good so far. I also copied 77 deleted folder items into Spam and trained on that vs. inbox. Still good. Some of those messages might have been "in the wrong place" messages (gotta LOVE OL's filtering moosh) and that's semi-troubling for the future. Now, do I have to save the Spam folder contents? I found one message in Spam Maybe that ended up being good. I did "Spam Recover" and got a rule error from OL about unable to move message to (as it happens) python-list. Closed message box and hit Spam recover again - no errors and message was moved to inbox. I notice OL now running even slower than it's previously slothful self... David LeBlanc Seattle, WA USA From skip at pobox.com Wed Feb 12 18:59:36 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 19:59:45 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: References: <15946.53758.685215.359552@montanaro.dyndns.org> Message-ID: <15946.60920.928977.174316@montanaro.dyndns.org> >> True enough, and in that case, the try-to-fix-broken-base64 function >> just returns the input text. No harm, no foul.... Tim> Are you testing this, Skip? 
This evening, yes, I will. Skip From noreply at sourceforge.net Wed Feb 12 14:13:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 12 20:26:03 2003 Subject: [Spambayes] [ spambayes-Bugs-680158 ] Outlook addin cannot create new database Message-ID: Bugs item #680158, was opened at 2003-02-04 23:53 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=680158&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Duncan Booth (duncanb) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook addin cannot create new database Initial Comment: Outlook2000\manager.py, revision 1.42 contains this code: def new_bayes(self): # Just delete the file and do an "open" try: os.unlink(self.bayes_filename) except IOError, e: if e.errno != errno.ENOENT: raise return self.open_bayes() Python 2.2 under windows raises OSError when os.unlink fails, so this code should be: def new_bayes(self): # Just delete the file and do an "open" try: os.unlink(self.bayes_filename) except (OSError,IOError), e: if e.errno != errno.ENOENT: raise return self.open_bayes() ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-13 09:13 Message: Logged In: YES user_id=14198 Thanks! Checking in manager.py; /cvsroot/spambayes/spambayes/Outlook2000/manager.py,v <-- manager.py new revision: 1.46; previous revision: 1.45 done ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-09 12:28 Message: Logged In: YES user_id=552329 I get this too, and the fix works for me. Here's a complete trace from running addin.py (fresh CVS) to the error when launching Outlook. 
Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Created new configuration file 'D:\spambayes\Outlook2000 \default_configuration.pck' Loaded bayes database from 'D:\spambayes\Outlook2000 \default_bayes_database.pck' Either bayes database or message database is missing - creating new Traceback (most recent call last): File "D:\Python22\lib\site-packages\win32com\universal.py", line 150, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None) File "D:\Python22\lib\site- packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "D:\Python22\lib\site- packages\win32com\server\policy.py", line 562, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "D:\Python22\lib\site- packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\spambayes\Outlook2000\addin.py", line 615, in OnConnection self.manager = manager.GetManager(application) File "D:\spambayes\Outlook2000\manager.py", line 419, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "D:\spambayes\Outlook2000\manager.py", line 158, in __init__ self.LoadBayes() File "D:\spambayes\Outlook2000\manager.py", line 265, in LoadBayes self.InitNewBayes() File "D:\spambayes\Outlook2000\manager.py", line 314, in InitNewBayes self.bayes = self.db_manager.new_bayes() File "D:\spambayes\Outlook2000\manager.py", line 87, in new_bayes os.unlink(self.bayes_filename) exceptions.OSError: [Errno 2] No such file or directory: 'D:\spambayes\Outlook2000 \default_bayes_database.pck' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=680158&group_id=61702 From noreply at sourceforge.net Wed Feb 12 14:10:02 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed 
Feb 12 20:26:06 2003 Subject: [Spambayes] [ spambayes-Bugs-683250 ] Error in outlook/readme Message-ID: Bugs item #683250, was opened at 2003-02-09 15:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Error in outlook/readme Initial Comment: Pretty minor, but in the readme.txt in the outlook folder, under known bugs this is listed: * Filtering an Exchange Server public store appears to not work (is this still true?) This is *not* still true :) so this line could go away. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-13 09:10 Message: Logged In: YES user_id=14198 Checking in README.txt; /cvsroot/spambayes/spambayes/Outlook2000/README.txt,v <-- README.txt new revision: 1.9; previous revision: 1.8 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=683250&group_id=61702 From tim_one at email.msn.com Wed Feb 12 20:44:54 2003 From: tim_one at email.msn.com (Tim Peters) Date: Wed Feb 12 20:45:45 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: Message-ID: [Skip] > The performance penalty should be reasonable. One split() and length > check for each part in the common case. The try-to-fix function is only > called if it's long enough, which should be rare. I'd hate to see the code bloat with gimmicks that don't prove themselves via testing (as TESTING.txt warned from the start, such stuff at best gets in the way later of testing things that might truly help a little, by adding a quasi-random component to testing results). Has this form of spam been an actual problem for anyone here? 
Note that in Rob's original post, the classification line was X-Spambayes-Classification: unsure; 0.98 Adding code that moves this example's score from 0.98 to, say, 0.995, doesn't appear worth any effort. From T.A.Meyer at massey.ac.nz Thu Feb 13 15:19:35 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 21:20:11 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4B9@its-xchg4.massey.ac.nz> Something (not spambayes!) crashed my system today and now Outlook won't load with the plugin installed (but will without). For some reason nothing appears in the trace on launching Outlook (apart from the loading message), but if I execute manager.py, I get the following (along with the same windows error dialog saying that errors have been caused and the program will be closed). Traceback (most recent call last): File "D:\PROGRA~1\Python22\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "D:\CVS_Modules\spambayes\Outlook2000\manager.py", line 603, in ? 
sys.exit(main(verbose)) File "D:\CVS_Modules\spambayes\Outlook2000\manager.py", line 578, in main mgr = GetManager(verbose=verbose_level) File "D:\CVS_Modules\spambayes\Outlook2000\manager.py", line 566, in GetManager outlook = win32com.client.Dispatch("Outlook.Application") File "D:\PROGRA~1\Python22\lib\site-packages\win32com\client\__init__.py", line 95, in Dispatch dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,clsctx) File "D:\PROGRA~1\Python22\lib\site-packages\win32com\client\dynamic.py", line 84, in _GetGoodDispatchAndUserName return (_GetGoodDispatch(IDispatch, clsctx), userName) File "D:\PROGRA~1\Python22\lib\site-packages\win32com\client\dynamic.py", line 72, in _GetGoodDispatch IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.IID_IDispatch) com_error: (-2146959355, 'Server execution failed', None, None) It seems that the problem is with the win32com.client.Dispatch call. (Or, I suspect, my Outlook). I've dumped the .ini, the message stores and the config pickle in case the problem was with any of them, but that hasn't helped. Any suggestions about what to do? Thanks, Tony Meyer From skip at pobox.com Wed Feb 12 21:21:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 22:21:18 2003 Subject: [Spambayes] ini file fumbling broke Message-ID: <15947.3882.109560.817917@montanaro.dyndns.org> Someone recently decreed that all files mentioned in BAYESCUSTOMIZE must end in ".ini" and modified Options.py (I named my customize file ~/hammie.opt). Was this related to the embedded-spaces-in-paths problem? Sumthin's gotta give I think. If spaces are common in filenames, we need to pick a better separator. (Or allow the separator to be platform-specific.) On Unix systems, ":" is a good path separator (but would be bad on MacOS < X systems). I think ";" is more common on Windows. I don't think forcing customize files to end in ".ini" is right. 
Even one of the default files searched for in Options.py is "~/.spambayesrc". Thoughts? Skip From skip at pobox.com Wed Feb 12 21:26:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 12 22:26:52 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: References: Message-ID: <15947.4207.366335.199709@montanaro.dyndns.org> >> The performance penalty should be reasonable. One split() and length >> check for each part in the common case. The try-to-fix function is >> only called if it's long enough, which should be rare. Tim> I'd hate to see the code bloat with gimmicks that don't prove Tim> themselves via testing I haven't checked anything in. People asked about decoding stuff that was encoded but didn't have a Content-Transfer-Encoding header. I suggested the diff I posted. That's as far as it's gone at this point. Skip From T.A.Meyer at massey.ac.nz Thu Feb 13 16:33:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 22:34:11 2003 Subject: [Spambayes] ini file fumbling broke Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD53@its-xchg4.massey.ac.nz> > Someone recently decreed that all files mentioned in > BAYESCUSTOMIZE must end > in ".ini" and modified Options.py (I named my customize file > ~/hammie.opt). > Was this related to the embedded-spaces-in-paths problem? Yes. TimS used the magic of regex to 'fix' this problem. To be fair, it did fix that problem... ;) > If spaces are common in filenames, we need to pick a better > separator. [...] > Thoughts? (I suggested this prior to the regex fix, but it became obsolete). What about tokenising the separator character? i.e. a function SafeifyFilename (with a better name, obviously) takes a file+path (string) and returns a string with the separator character (whatever it is) replaced by a special character string, and all instances of the special character string by another one. 
For example if the separator was a space, then spaces would be replaced by (eg) '/s', and '/' would be replaced by '//'. Another function, UnsafeifyFilename, would take that string and replace all the '/s' substrings by a space, and all the '//' substrings by '/'. The separator character and the '/' character could be some global or option somewhere, it doesn't really matter. I'd suggest it in code, but I imagine that regex is the nicest way to do this, and I'm not so good at regex. Isn't this the 'usual' way of handling this sort of problem? =Tony Meyer From mhammond at skippinet.com.au Thu Feb 13 14:49:55 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 12 22:50:55 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4B9@its-xchg4.massey.ac.nz> Message-ID: <016a01c2d312$f9447340$530f8490@eden> > Something (not spambayes!) crashed my system today and now > Outlook won't load with the plugin installed (but will > without). For some reason nothing appears in the trace on > launching Outlook (apart from the loading message), I can only suggest adding a few more print statements - particularly in "setupui" etc. It is probable that we are making a call which is crashing outlook. I've seen it before :( > execute manager.py, I get the following (along with the same > windows error dialog saying that errors have been caused and > the program will be closed). ... > IDispatch = pythoncom.CoCreateInstance(IDispatch, None, > clsctx, pythoncom.IID_IDispatch) > com_error: (-2146959355, 'Server execution failed', None, None) This exception generally just means "the server crashed". ie, Python is trying to start outlook, and outlook itself loads and uses Python, which calls Outlook, which crashes it. This exception is from the "outer-most" Python, which is in its own process, complaining it couldn't establish a connection to the outlook process it just started.
Simple really :) > Any suggestions about what to do? Start outlook without the addin, reset the toolbars, and try again. There is something suspect in that toolbar code - I've already worked around a few other crashes in that area. Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2348 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030213/b1781499/winmail-0001.bin From tim at fourstonesExpressions.com Wed Feb 12 22:11:19 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 23:11:31 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD53@its-xchg4.massey.ac.nz> Message-ID: <2ZSNWQA504KEVQVRB85Z04YWC3RMYX.3e4b1ae7@myst> 2/12/2003 9:33:36 PM, "Meyer, Tony" wrote: >> Someone recently decreed that all files mentioned in >> BAYESCUSTOMIZE must end >> in ".ini" and modified Options.py (I named my customize file >> ~/hammie.opt). >> Was this related to the embedded-spaces-in-paths problem? > >Yes. TimS used the magic of regex to 'fix' this problem. To be fair, it did fix that problem... ;) Guilty as charged. .ini seemed to be the agreed upon extension. If we can agree upon some set of extensions, the solution still works. If not, then we'll have to comma separate or something like that. The wind'ohs way of handling this is to double quote the filename if spaces are embedded. Doesn't matter to me, wasn't my itch in the first place... -TimS > >> If spaces are common in filenames, we need to pick a better >> separator. >[...] >> Thoughts? > >(I suggested this prior to the regex fix, but it became obsolete). > >What about tokenising the separator character? > >i.e. 
a function SafeifyFilename (with a better name, obviously) takes a file+path (string) and returns a string with the separator character (whatever it is) replaced by a special character string, and all instances of the special character string by another one. For example if the separator was a space, then spaces would be replaced by (eg) '/s', and '/' would be replaced by '//'. Another function, UnsafeifyFilename, would take that string and replace all the '/s' substrings by a space, and all the '//' substrings by '/'. The separator character and the '/' character could be some global or option somewhere, it doesn't really mattter. > >I'd suggest in code, but I imagine that regex is the nicest way to do this, and I'm not so good at regex. > >Isn't this the 'usual' way of handling this sort of problem? > >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 12 22:13:01 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 23:13:09 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15947.4207.366335.199709@montanaro.dyndns.org> Message-ID: 2/12/2003 9:26:39 PM, Skip Montanaro wrote: > >> The performance penalty should be reasonable. One split() and length > >> check for each part in the common case. The try-to-fix function is > >> only called if it's long enough, which should be rare. > > Tim> I'd hate to see the code bloat with gimmicks that don't prove > Tim> themselves via testing > >I haven't checked anything in. People asked about decoding stuff that was >encoded but didn't have a Content-Transfer-Encoding header. I suggested the >diff I posted. That's as far as it's gone at this point. Apparently our test corpora didn't include any mail with this problem. 
Maybe it was an anti-selection thing, or maybe spam is evolving. I don't think anyone has proposed checking this in yet, let's see if it becomes a problem. If it does, we'll have the solution. - TimS > >Skip > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From frank.horowitz at csiro.au Thu Feb 13 12:36:35 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Wed Feb 12 23:36:43 2003 Subject: [Spambayes] Forged header? Message-ID: <1045110995.20731.18.camel@bonzo.ned.dem.csiro.au> Folks, It occurs to me that for a spammer to get past the entire filtering process, they simply need to include the header. Even if the classifier runs, it's still 50-50 whether the further downstream processing (e.g. procmail) matches the "real" header or the bogus one. While pop3proxy.py has a "remove any X-Spambayes-Classification headers in the incoming mail" item in the TODO list, is there some equivalent in hammie/outlook land? Frank From T.A.Meyer at massey.ac.nz Thu Feb 13 17:39:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 23:40:07 2003 Subject: [Spambayes] Forged header? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4BC@its-xchg4.massey.ac.nz> > It occurs to me that for a spammer to get past the entire filtering > process, they simply need to include the > header. > > Even if the classifier runs, it's still 50-50 whether the further > downstream processing (e.g. procmail) matches the "real" header or the > bogus one. While pop3proxy.py has a "remove any > X-Spambayes-Classification headers in the incoming mail" item in the > TODO list, is there some equivalent in hammie/outlook land? I don't know about hammie, but the Outlook plugin doesn't use the header. The plugin sets an Outlook user-property field to the spam 'probability'. A spammer couldn't get access to that without running code on the end-system, in which case there are more serious problems afoot! 
=Tony Meyer From T.A.Meyer at massey.ac.nz Thu Feb 13 17:39:50 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 23:40:25 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD54@its-xchg4.massey.ac.nz> > I can only suggest adding a few more print statement - > particularly in "setupui" etc. It is probably that we are > making a call which is crashing outlook. I've seen it before :( I was doing this, just hoping for a quicker fix. I eventually found it - it was *your* fault! ;) Nothing to do with my crash at all (just a coincidence that that was the first time I restarted Outlook since updating). The fix you uploaded to check for bsddb3 is what broke it. I auto-changed to use bsddb3, and that was what caused the problem. Setting use_db to False fixed it. I'll leave you to tell me what this means ;) > Start outlook without the addin, reset the toolbars, and try > again. There is something suspect in that toolbar code - > I've already worked around a few other crashes in that area. I did have to reset the toolbars for the buttons to reappear (the addin loaded, but no buttons appeared). Odd. The sooner those buttons are nice permanent ones, the better ;) Thanks, Tony Meyer From tim at fourstonesExpressions.com Wed Feb 12 22:43:52 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 23:44:01 2003 Subject: [Spambayes] Forged header? In-Reply-To: <1045110995.20731.18.camel@bonzo.ned.dem.csiro.au> Message-ID: 2/12/2003 10:36:35 PM, Frank Horowitz wrote: >Folks, > >It occurs to me that for a spammer to get past the entire filtering >process, they simply need to include the > header. > >Even if the classifier runs, it's still 50-50 whether the further >downstream processing (e.g. procmail) matches the "real" header or the >bogus one. 
While pop3proxy.py has a "remove any >X-Spambayes-Classification headers in the incoming mail" item in the >TODO list, is there some equivalent in hammie/outlook land? The tokenizer will ignore most of the headers in an email, including that one. This is not only for the reason you state, but also that they add no value to the classification. The classification is extremely accurate, and most all of the tweaking/twiddling/scheming around such things that was done during the research phase proved to either have no effect on the outcome, or to add expense to it in terms of performance and/or false positive/negative. What we are now watching closely is how spam will evolve. Certainly spammers will try to come up with schemes to defeat bayesian filtering. Let the real war commence! - TimS > > Frank > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 12 22:45:18 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 23:45:26 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD54@its-xchg4.massey.ac.nz> Message-ID: <72QMTNKFERPXT2X2XH64QP07RLB7ML.3e4b22de@myst> 2/12/2003 10:39:50 PM, "Meyer, Tony" wrote: >> I can only suggest adding a few more print statement - >> particularly in "setupui" etc. It is probably that we are >> making a call which is crashing outlook. I've seen it before :( >I was doing this, just hoping for a quicker fix. I eventually found it - it was *your* fault! ;) Nothing to do with my crash at all (just a coincidence that that was the first time I restarted Outlook since updating). > >The fix you uploaded to check for bsddb3 is what broke it. I auto-changed to use bsddb3, and that was what caused the problem. 
Setting use_db to False fixed it. I'll leave you to tell me what this means ;) use_db = False means that it uses a pickle database, rather than dbm. Personally, I still find pickle to be the better solution, but I'm a minority on that one... - TimS > >> Start outlook without the addin, reset the toolbars, and try >> again. There is something suspect in that toolbar code - >> I've already worked around a few other crashes in that area. >I did have to reset the toolbars for the buttons to reappear (the addin loaded, but no buttons appeared). Odd. The sooner those buttons are nice permanent ones, the better ;) > >Thanks, >Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Thu Feb 13 17:47:28 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 12 23:48:03 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4BE@its-xchg4.massey.ac.nz> > use_db = False means that it uses a pickle database, rather > than dbm. That's what I figured. Does that mean that the "bsddb3 is definately not broken" comment is not correct? Or is it the fault of *my* bsddb3 install? > Personally, I still find pickle to be the better solution, > but I'm a minority on that one... - TimS :) Well, pickle works for me, so, at least the moment, I'm in your minority. =Tony Meyer From tim at fourstonesExpressions.com Wed Feb 12 22:51:36 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 12 23:51:47 2003 Subject: [Spambayes] Outlook Plugin Crashing Outlook In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4BE@its-xchg4.massey.ac.nz> Message-ID: 2/12/2003 10:47:28 PM, "Meyer, Tony" wrote: >> use_db = False means that it uses a pickle database, rather >> than dbm. 
> >That's what I figured. Does that mean that the "bsddb3 is definately not broken" comment is not correct? Or is it the fault of *my* bsddb3 install? Well, all I can say to that is this: There's been more angst about dbm implementations in this project than there has been relative to just about anything else. I hope they get it right in the next release of python. > >> Personally, I still find pickle to be the better solution, >> but I'm a minority on that one... - TimS > >:) Well, pickle works for me, so, at least the moment, I'm in your minority. > >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From frank.horowitz at csiro.au Thu Feb 13 13:06:11 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Thu Feb 13 00:06:19 2003 Subject: [Spambayes] Forged header? In-Reply-To: References: Message-ID: <1045112770.21221.31.camel@bonzo.ned.dem.csiro.au> On Thu, 2003-02-13 at 12:43, Tim Stone - Four Stones Expressions wrote: > 2/12/2003 10:36:35 PM, Frank Horowitz wrote: > > >Folks, > > > >It occurs to me that for a spammer to get past the entire filtering > >process, they simply need to include the > > header. > > > >Even if the classifier runs, it's still 50-50 whether the further > >downstream processing (e.g. procmail) matches the "real" header or the > >bogus one. While pop3proxy.py has a "remove any > >X-Spambayes-Classification headers in the incoming mail" item in the > >TODO list, is there some equivalent in hammie/outlook land? > > The tokenizer will ignore most of the headers in an email, including that one. > This is not only for the reason you state, but also that they add no value to > the classification. 
The classification is extremely accurate, and most all of > the tweaking/twiddling/scheming around such things that was done during the > research phase proved to either have no effect on the outcome, or to add > expense to it in terms of performance and/or false positive/negative. Umm, that's not quite what I meant (perhaps I was unclear). I understand that the classifier does its job irrespective of any (potential) bogus headers. I also (now) understand from Tony Meyer's separate reply that the Outlook plugin is not vulnerable to the trivial spoofing that I suggested. Further, pop3proxy seems to have plans to incorporate a protection against such a spoof. I guess what my question now boils down to is whether or not hammiefilter *overwrites* any X-Spambayes-Classification header or merely "appends" such a header to a notional list of headers. If it's the former, all *should be* cool against this spoof. If it's the latter, hammiefilter is vulnerable. Not true??? > > What we are now watching closely is how spam will evolve. Certainly spammers > will try to come up with schemes to defeat bayesian filtering. Let the real > war commence! - TimS Agreed. And I was pointing out what I perceived to be a slight chink in the armor! Cheers, Frank From tim at fourstonesExpressions.com Wed Feb 12 23:13:28 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 00:13:40 2003 Subject: [Spambayes] Forged header? In-Reply-To: <1045112770.21221.31.camel@bonzo.ned.dem.csiro.au> Message-ID: 2/12/2003 11:06:11 PM, Frank Horowitz wrote: >On Thu, 2003-02-13 at 12:43, Tim Stone - Four Stones Expressions wrote: >> 2/12/2003 10:36:35 PM, Frank Horowitz wrote: >> >> >Folks, >> > >> >It occurs to me that for a spammer to get past the entire filtering >> >process, they simply need to include the >> > header. >> > >> >Even if the classifier runs, it's still 50-50 whether the further >> >downstream processing (e.g. 
procmail) matches the "real" header or the
>> >bogus one. While pop3proxy.py has a "remove any
>> >X-Spambayes-Classification headers in the incoming mail" item in the
>> >TODO list, is there some equivalent in hammie/outlook land?
>>
>> The tokenizer will ignore most of the headers in an email, including that one.
>> This is not only for the reason you state, but also that they add no value to
>> the classification. The classification is extremely accurate, and most all of
>> the tweaking/twiddling/scheming around such things that was done during the
>> research phase proved to either have no effect on the outcome, or to add
>> expense to it in terms of performance and/or false positive/negative.
>
>Umm, that's not quite what I meant (perhaps I was unclear).
>
>I understand that the classifier does its job irrespective of any
>(potential) bogus headers. I also (now) understand from Tony Meyer's
>separate reply that the Outlook plugin is not vulnerable to the trivial
>spoofing that I suggested. Further, pop3proxy seems to have plans to
>incorporate a protection against such a spoof.
>
>I guess what my question now boils down to is whether or not
>hammiefilter *overwrites* any X-Spambayes-Classification header or
>merely "appends" such a header to a notional list of headers. If it's
>the former, all *should be* cool against this spoof. If it's the latter,
>hammiefilter is vulnerable. Not true???

from hammie.py:

    def filter(self, msg, header=None, spam_cutoff=None,
               ham_cutoff=None, debugheader=None,
               debug=None, train=None):
        .....
        if header == None:
            header = options.hammie_header_name
        .....
        del msg[header]
        msg.add_header(header, disp)
        .....

>
>>
>> What we are now watching closely is how spam will evolve. Certainly spammers
>> will try to come up with schemes to defeat bayesian filtering. Let the real
>> war commence! - TimS
>
>Agreed. And I was pointing out what I perceived to be a slight chink in
>the armor!
>
>       Cheers,
>               Frank
>
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From frank.horowitz at csiro.au  Thu Feb 13 13:43:25 2003
From: frank.horowitz at csiro.au (Frank Horowitz)
Date: Thu Feb 13 00:43:37 2003
Subject: [Spambayes] Forged header?
In-Reply-To:
References:
Message-ID: <1045115005.19817.39.camel@bonzo.ned.dem.csiro.au>

On Thu, 2003-02-13 at 13:13, Tim Stone - Four Stones Expressions wrote:
> 2/12/2003 11:06:11 PM, Frank Horowitz wrote:
>
> >I guess what my question now boils down to is whether or not
> >hammiefilter *overwrites* any X-Spambayes-Classification header or
> >merely "appends" such a header to a notional list of headers. If it's
> >the former, all *should be* cool against this spoof. If it's the latter,
> >hammiefilter is vulnerable. Not true???
>
> from hammie.py:
>
>     def filter(self, msg, header=None, spam_cutoff=None,
>                ham_cutoff=None, debugheader=None,
>                debug=None, train=None):
>         .....
>         if header == None:
>             header = options.hammie_header_name
>         .....
>         del msg[header]
>         msg.add_header(header, disp)
>         .....

OK, thanks. I should have "Used the source, Luke!" (but am not particularly familiar with where things are located yet).

I'll shut up now ;-)

        Frank

From T.A.Meyer at massey.ac.nz  Thu Feb 13 19:32:59 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Feb 13 01:33:37 2003
Subject: [Spambayes] ini file fumbling broke
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz>

> If not, then
> we'll have to comma separate or something like that. The
> wind'ohs way of
> handling this is to double quote the filename if spaces are
> embedded. Doesn't matter to me, wasn't my itch in the first
> place... -TimS

It was my itch (because my .ini is now stored in the 'correct' place on a Win2k system...)
So what about implementing the spambayes/platform.py file that was talked about a while back?

This would include a filename-separator string, presumably defaulting to ":" (covering *nix and OSX), with ";" for Win* - maybe those that have implemented spambayes on their Amiga can tell us what characters are illegal in filenames there... :) (Did the pre-X MacOS have any characters that were illegal in a filename apart from ":"? I can't recall. If not, then maybe "::" for these?)

Anyway, this is really a job for a developer, since a new file has to be created and not much else. For the moment, even if platform-specific tests are spread throughout the code (like the stack size increment), it doesn't matter; they could be gradually moved to platform.py.

My 2c.

=Tony Meyer

From T.A.Meyer at massey.ac.nz  Thu Feb 13 19:46:26 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Feb 13 01:46:59 2003
Subject: [Spambayes] Spammie on Outlook 2000 - questions
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4C2@its-xchg4.massey.ac.nz>

> I created "Spam" and "Spam Maybe" folders and told spammie to
> use those. Good so far. I also copied 77 deleted folder items
> into Spam and trained on that vs. inbox. Still good.
> Some of those messages might have been "in the
> wrong place" messages (gotta LOVE OL's filtering moosh) and that's
> semi-troubling for the future. Now, do I have to save the Spam folder
> contents?

Those messages are only necessary if you want to retrain from scratch. In normal operation, you shouldn't need to do this. However, since we're using alpha software, you might want to keep them so you can retrain when a new version breaks the old one, or mashes up your training data.

> I found one message in Spam Maybe that ended up being good. I
> did "Spam Recover" and got a rule error from OL about unable to
> move message to (as it happens) python-list. Closed message box
> and hit Spam recover again - no errors and message was moved
> to inbox.

Odd.
I've tried to reproduce that error and can't. Even if I delete the original folder it finds its way back to the inbox without an error. Make sure you let the list know if this is a regular occurrence.

> I notice OL now running even slower than it's previously
> slothful self...

It shouldn't be that much slower. The addin doesn't do a lot apart from when messages arrive and on startup/shutdown. The startup/shutdown times are much slower, but I don't notice any slowdown apart from that. Does anything in particular seem slow?

=Tony Meyer

From anthony at interlink.com.au  Thu Feb 13 17:47:12 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Feb 13 01:49:04 2003
Subject: [Spambayes] Forged header?
In-Reply-To: <1045110995.20731.18.camel@bonzo.ned.dem.csiro.au>
Message-ID: <200302130647.h1D6lCR13900@bonanza.off.ekorp.com>

>>> Frank Horowitz wrote
> It occurs to me that for a spammer to get past the entire filtering
> process, they simply need to include the
> header.

Note that if you're using procmail, and you have your procmailrc set up something like:

    :0 fw:hamlock
    | /usr/local/bin/hammie.py -f -d

    :0:
    * ^X-Spambayes-Classification: spam
    | $RCVSTORE +spam

    :0:
    * ^X-Spambayes-Classification: unsure
    | $RCVSTORE +unsure

    ... other message handling ...

Then the duplicate header won't matter worth a damn. procmail will still see the 'spam' header, and punt the message into the spam folder.

From piersh at friskit.com  Thu Feb 13 02:38:24 2003
From: piersh at friskit.com (Piers Haken)
Date: Thu Feb 13 05:38:29 2003
Subject: [Spambayes] icons
Message-ID: <9891913C5BFE87429D71E37F08210CB92C74FA@zeus.sfhq.friskit.com>

Hey Mark,

I just want to mention that the icons for the outlook plugin should have transparent backgrounds. As they are they show up strange if you're not using the standard grey on grey windows color scheme (on win2k).

No biggie. But if we're getting ready to ship it might be time to break out pbrush.exe ;-)

Piers.
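
A minimal sketch of the delete-then-add behaviour discussed in the "Forged header?" thread above, using only Python's standard email package. The hammie.py excerpt quoted earlier in the thread deletes the header before adding it back; with the standard Message class, deleting a header by name removes every occurrence, so a forged copy should not survive. The sample message, the helper name stamp_classification and the 'spam' verdict are illustrative assumptions, not SpamBayes code.

```python
from email import message_from_string

HEADER = "X-Spambayes-Classification"  # header name taken from the thread


def stamp_classification(raw_text, verdict):
    """Hypothetical helper: drop any existing classification header, then add ours."""
    msg = message_from_string(raw_text)
    del msg[HEADER]        # removes every occurrence; no error if the header is absent
    msg[HEADER] = verdict  # appends exactly one fresh header
    return msg


# A made-up message carrying a forged "ham" verdict.
forged = (
    "From: spammer@example.com\n"
    "X-Spambayes-Classification: ham\n"
    "Subject: Buy now\n"
    "\n"
    "body text\n"
)

stamped = stamp_classification(forged, "spam")
print(stamped.get_all(HEADER))
```

Assigning the header without the preceding del would append a second copy and leave the forged one in place, which is the case Frank was worried about.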
From noreply at sourceforge.net Wed Feb 12 21:40:45 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Feb 13 06:43:04 2003 Subject: [Spambayes] [ spambayes-Feature Requests-685746 ] Outlook plugin folder list sorted alphabetically Message-ID: Feature Requests item #685746, was opened at 2003-02-13 18:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Nobody/Anonymous (nobody) Summary: Outlook plugin folder list sorted alphabetically Initial Comment: It would be nice (but purely cosmetic) if the folder list that is generated by _BuildFoldersMAPI (in FolderSelector.py) was sorted alphabetically (within each folder). This is the view that Outlook provides in it's folder list, and it's a little confusing finding folders in a different order. If this is particularly non-trivial, then it might not be worth doing. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702 From noreply at sourceforge.net Wed Feb 12 21:40:43 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Feb 13 06:43:07 2003 Subject: [Spambayes] [ spambayes-Feature Requests-685746 ] Outlook plugin folder list sorted alphabetically Message-ID: Feature Requests item #685746, was opened at 2003-02-13 18:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin folder list sorted alphabetically Initial Comment: It would be nice (but purely cosmetic) if the folder list that is generated by _BuildFoldersMAPI (in FolderSelector.py) was sorted alphabetically (within each folder). 
This is the view that Outlook provides in its folder list, and it's a little confusing finding folders in a different order.

If this is particularly non-trivial, then it might not be worth doing.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702

From skip at pobox.com  Thu Feb 13 06:16:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Feb 13 07:16:56 2003
Subject: [Spambayes] It gets funnier all the time....
In-Reply-To:
References: <15947.4207.366335.199709@montanaro.dyndns.org>
Message-ID: <15947.36017.652722.917668@montanaro.dyndns.org>

    TimP> I'd hate to see the code bloat with gimmicks that don't prove
    TimP> themselves via testing

    Skip> People asked about decoding stuff that was encoded but didn't have
    Skip> a Content-Transfer-Encoding header. I suggested the diff I
    Skip> posted. That's as far as it's gone at this point.

    TimS> Apparently our test corpora didn't include any mail with this
    TimS> problem.

Au contraire. Using my untouched-since-December ham/spam collections I ran a 10-fold cross-validation last night. The summary results are

    filename:       base       cte
    ham:spam:  2000:2000 2000:2000
    fp total:          9         9
    fp %:           0.45      0.45
    fn total:         17        14
    fn %:           0.85      0.70
    unsure t:         94       100
    unsure %:       2.35      2.50
    real cost:   $125.80   $124.00
    best cost:    $76.20    $77.60
    h mean:         1.50      1.56
    h sdev:         9.59      9.80
    s mean:        98.03     98.14
    s sdev:        10.91     10.62
    mean diff:     96.53     96.58
    k:              4.71      4.73

"base" is an empty ini file. "cte" is

    [Tokenizer]
    assume_missing_cte: True

so in this case at least the false negatives got slightly better and the unsures a bit worse. I suspect this is typical of what we'll see with most changes at this stage of the game - somewhat inconclusive results. Whether or not to add it is going to be a judgement call.

A patch which implements this change is attached for anyone who wants to run the test.
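
As a quick cross-check of the summary above: the "real cost" row appears consistent with charging $10 per false positive, $1 per false negative and $0.20 per unsure. Those weights are an assumption here (the message does not state them), but they reproduce both reported costs, and the percentage rows follow from 2000 ham, 2000 spam and 4000 messages in total. A small sketch of that arithmetic:

```python
# Assumed weights (not stated in the message): $10 per fp, $1 per fn, $0.20 per unsure.
FP_WEIGHT, FN_WEIGHT, UNSURE_WEIGHT = 10.0, 1.0, 0.2

N_HAM, N_SPAM = 2000, 2000  # from the ham:spam row


def real_cost(fp, fn, unsure):
    """Weighted cost of one run under the assumed weights."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


def rate(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


# (fp total, fn total, unsure total) copied from the table.
runs = {"base": (9, 17, 94), "cte": (9, 14, 100)}

for name, (fp, fn, unsure) in runs.items():
    print(
        f"{name}: cost=${real_cost(fp, fn, unsure):.2f} "
        f"fp%={rate(fp, N_HAM):.2f} "
        f"fn%={rate(fn, N_SPAM):.2f} "
        f"unsure%={rate(unsure, N_HAM + N_SPAM):.2f}"
    )
```

Under these assumptions the difference between the two runs comes down to three fewer false negatives set against six more unsures.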
Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: sb.diff Type: application/octet-stream Size: 2078 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030213/2dfdb1d1/sb.obj From tim at fourstonesExpressions.com Thu Feb 13 07:48:21 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 08:48:31 2003 Subject: [Spambayes] It gets funnier all the time.... In-Reply-To: <15947.36017.652722.917668@montanaro.dyndns.org> Message-ID: <93RO443WRVP3YIEJE62STQRMFXU3V.3e4ba225@myst> 2/13/2003 6:16:49 AM, Skip Montanaro wrote: > TimP> I'd hate to see the code bloat with gimmicks that don't prove > TimP> themselves via testing I think the point here is that we're using the email module, which makes quite a few assumptions about the "well-formedness" of the mail. When spammers figure this out, they'll unleash a whole lot of crap that most mailers will display, but is so badly formed that bayesian filters can't find enough about it to place it in the spam category. That's all they're after. They don't care if it's classified as ham or unsure, just NOT spam. So... it behooves us to begin to think like a spammer: How can I break this thing? They'll be looking for all the tricks. Let's find 'em first. > > Skip> People asked about decoding stuff that was encoded but didn't have > Skip> a Content-Transfer-Encoding header. I suggested the diff I > Skip> posted. That's as far as it's gone at this point. > > TimS> Apparently our test corpora didn't include any mail with this > TimS> problem. > >Au contraire. Using my untouched-since-December ham/spam collections I ran >a 10-fold cross-validation last night. 
The summary results are
>
>    filename:       base       cte
>    ham:spam:  2000:2000 2000:2000
>    fp total:          9         9
>    fp %:           0.45      0.45
>    fn total:         17        14
>    fn %:           0.85      0.70
>    unsure t:         94       100
>    unsure %:       2.35      2.50
>    real cost:   $125.80   $124.00
>    best cost:    $76.20    $77.60
>    h mean:         1.50      1.56
>    h sdev:         9.59      9.80
>    s mean:        98.03     98.14
>    s sdev:        10.91     10.62
>    mean diff:     96.53     96.58
>    k:              4.71      4.73
>
>"base" is an empty ini file. "cte" is
>
>    [Tokenizer]
>    assume_missing_cte: True
>
>so in this case at least the false negatives got slightly better and the
>unsures a bit worse. I suspect this is typical of what we'll see with most
>changes at this stage of the game - somewhat inconclusive results. Whether
>or not to add it is going to be a judgement call.
>
>A patch which implements this change is attached for anyone who wants to run
>the test.
>
>Skip
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From neale at woozle.org  Thu Feb 13 08:27:36 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Feb 13 11:28:06 2003
Subject: [Spambayes] spampot and black holes
In-Reply-To: <82ID05E0DEBMKKFA1TMG1W5484A9ML.3e4ac30e@myst> (Tim Stone - Four Stones Expressions's message of "Wed, 12 Feb 2003 15:56:30 -0600")
References: <82ID05E0DEBMKKFA1TMG1W5484A9ML.3e4ac30e@myst>
Message-ID:

Tim Stone - Four Stones Expressions writes:

> I'm wondering if it might not be a reasonable idea to run some very easy to
> find spampots, which (of course) don't actually relay the spam. Might make
> life sucky (at least temporarily) for some spammer somewhere... We could even
> make it so it's particularly slow in response time, so their spam activity
> takes forever. We could even return an smtp error response at random that
> says something like "We're out here, and we will stop you."

Well, spampot has been fun to run, but I can't say that it's really catching a lot of spam. I mean, it is getting a lot, but in droves of like 20,000 copies of just one message.
Matt Sargeant's hunch appears to have been correct so far. Maybe if someone had the bandwidth to host one on a really fast link, they'd get different results. I don't know what these open-relay probes are probing for, exactly. They may do bandwidth testing and round-trip time. (Note that my spampot is currently off-line while I move to a new apt.) Neale From tim at fourstonesExpressions.com Thu Feb 13 10:31:14 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 11:31:26 2003 Subject: [Spambayes] spampot and black holes In-Reply-To: Message-ID: 2/13/2003 10:27:36 AM, Neale Pickett wrote: >Tim Stone - Four Stones Expressions writes: > >> I'm wondering if it might not be a reasonable idea to run some very easy to >> find spampots, which (of course) don't actually relay the spam. Might make >> life sucky (at least temporarily) for some spammer somewhere... We could even >> make it so it's particularly slow in response time, so their spam activity >> takes forever. We could even return an smtp error response at random that >> says something like "We're out here, and we will stop you." > >Well, spampot has been fun to run, but I can't say that it's really >catching a lot of spam. I mean, it is getting a lot, but in droves of >like 20,000 copies of just one message. Matt Sargeant's hunch appears >to have been correct so far. > >Maybe if someone had the bandwidth to host one on a really fast link, >they'd get different results. I don't know what these open-relay probes >are probing for, exactly. They may do bandwidth testing and round-trip >time. I've got some decent bandwidth, but I don't normally keep a linux box up all the time. Maybe I'll have to change that. I tried to mod spampot to run on wind'ohs, but keep getting a socket error 10049, can't assign address. Completely cryptic, no info on what causes it anywhere... typical winduhz. - TimS > >(Note that my spampot is currently off-line while I move to a new apt.) 
> >Neale > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Thu Feb 13 09:00:51 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 13 12:00:55 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: Message from "Meyer, Tony" <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> Message-ID: <20030213170051.E25832DED0@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: >This would include a filename-seperator string, presumably defaulting to = >":" (covering *nix and OSX), with ";" for Win* - maybe those that have = >implemented spambayes on their Amiga can tell us what characters are = >illegal in filenames there... :) Just to throw another monkey wrench in to the mix, I'll point out that ':' is legal in *nix filenames, too. People just tend not to use it because it would break too much stuff. The gulf between 'can' and 'should' is huge. ;-) On the other hand, I support the idea of using : and ; for file list separators, as suggested above. - Alex From whisper at oz.net Thu Feb 13 10:57:45 2003 From: whisper at oz.net (David LeBlanc) Date: Thu Feb 13 13:57:48 2003 Subject: [Spambayes] Spammie on Outlook 2000 - questions In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4C2@its-xchg4.massey.ac.nz> Message-ID: > > I found one message in Spam Maybe that ended up being good. I > > did "Spam Recover" and got a rule error from OL about unable to > > move message to (as it happens) python-list. Closed message box > > and hit Spam recover again - no errors and message was moved > > to inbox. > > Odd. I've tried to reproduce that error and can't. 
Even if I > delete the original folder it finds it's way back to the inbox > without an error. Make sure you let the list know if this is a > regular occurance. > Consider this notice: frequent rule errors upon mail fetch. These are of the form "couldn't move message to " and there are a couple of folders involved. I am also getting _many_ python-list messages (31 just now) into Spam Maybe and some (17) just stayed in the inbox - generally most python-list messages go to the python-list folder. Looking over my Spam corpus (copied from deleted messages folder), I noticed that there are a lot of otherwise legit messages that just got deleted. I've removed those and retrained, building the database from scratch. We'll see if that makes any difference. Currently, I am getting few spam and many spam maybe - is this characteristic of a young spammie? > =Tony Meyer > Dave LeBlanc Seattle, WA USA From neale at woozle.org Thu Feb 13 11:02:16 2003 From: neale at woozle.org (Neale Pickett) Date: Thu Feb 13 14:02:48 2003 Subject: [Spambayes] spampot and black holes In-Reply-To: (Tim Stone - Four Stones Expressions's message of "Thu, 13 Feb 2003 10:31:14 -0600") References: Message-ID: Tim Stone - Four Stones Expressions writes: > I've got some decent bandwidth Anyone who wants to run spampot, please feel free to grab the code at . You'll need spampot.py and maildir.py. Well, and Python :) Neale From nick_holden at ntlworld.com Thu Feb 13 20:21:37 2003 From: nick_holden at ntlworld.com (Nick Holden) Date: Thu Feb 13 15:50:43 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? Message-ID: <1045167697.5666.764.camel@adamselene.home> I'm intrigued by the spambayes project, and wondered whether you'd put any thought to apps to fit spambayes into the sendmail / qmail end of the mail process rather than as an end-user app? 
Running a mail server that predominantly forwards email for various non-profit groups who advertise their email addresses widely on the web, we're drowning in spam and viruses which I want to cut out at the mail server, rather than advising them to insist that their various ISPs install end-user spam filtering. But while there are lots of qmail or sendmail virus filters, there are very few spam filters that can be called by qmail. Spam Assassin will work with qmail-scanner, apparently, but qmail-scanner won't work with vpopmail virtual domains, so that route doesn't help me much either. Your thoughts would be appreciated, Nick From noreply at sourceforge.net Thu Feb 13 13:02:36 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Feb 13 16:04:28 2003 Subject: [Spambayes] [ spambayes-Patches-639122 ] hammie: ignore emails older than n days Message-ID: Patches item #639122, was opened at 2002-11-15 15:47 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 Category: None Group: None Status: Open Resolution: Later Priority: 5 Submitted By: Jason Hildebrand (jdhildeb) Assigned to: Neale Pickett (npickett) Summary: hammie: ignore emails older than n days Initial Comment: Since your documentation stresses the importance of training using only relatively recent emails, I thought a good way to do this would be to have hammie do it for me. So I added a new configuration option: [Hammie] # when training, hammie will ignore messages older than this number of days. # i.e. set to 365 to ignore messages older than one year. # Set to 0 to disable any filtering by date. ignore_old_messages: 0 The patch also modifies Hammie to output the number of messages it read/ignored for each mail file it processes. This option might also prove useful for doing incremental training (i.e. set up cron to train once a week, and set ignore_old_messages to 7). 
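A minimal sketch of the date filter described in the initial comment, assuming the cutoff is judged from each message's Date header (the patch itself is not shown in this thread and may work differently). The name `is_too_old` is hypothetical; `max_age_days` stands in for `ignore_old_messages`, with 0 disabling the filter as the option's comment describes. Keeping messages whose date is missing or unparseable is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def is_too_old(msg, max_age_days, now=None):
    """Return True if msg's Date header is more than max_age_days old.

    Hypothetical helper, not the patch's code. max_age_days <= 0 disables
    the filter; messages without a usable Date header are kept.
    """
    if max_age_days <= 0:
        return False
    date_header = msg["Date"]
    if date_header is None:
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - sent > timedelta(days=max_age_days)
```

For weekly incremental training as suggested above, a caller would skip every message for which `is_too_old(msg, 7)` is true.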
---------------------------------------------------------------------- >Comment By: Jason Hildebrand (jdhildeb) Date: 2003-02-13 15:02 Message: Logged In: YES user_id=173690 I just updated to the latest spambayes release (1.0a2) and took a look at mboxtrain.py. It has the ability to remember which messages it has already seen. I would still like to have the 'ignore_old_messages' feature, though, for the initial training. I have lots of folders which contain lots of old messages (years old) and a few new ones, and I think I get better results if the really old messages in each folder are ignored. ---------------------------------------------------------------------- Comment By: Jason Hildebrand (jdhildeb) Date: 2003-01-24 16:29 Message: Logged In: YES user_id=173690 Unfortunately, I haven't had time to update to a more recent spambayes; I'm still using a version from last november. Since this version is working well for me, I'm not terribly interested in messing with it -- since I know things have changed considerably in CVS since then. So I'm in a poor position to judge whether the functionality mboxtrain.py offers is "good enough" -- I'll have to leave it up to others to comment on. ---------------------------------------------------------------------- Comment By: T. Alexander Popiel (popiel) Date: 2003-01-23 12:42 Message: Logged In: YES user_id=632302 Parsing the topmost received header for the date is a very valuable tool for maintaining limited database size. It's a key feature of my bulkgraph.py script (over and above dealing with my non-standard everything vs. spam folders). Count this as another vote to include such filtering... even though my peculiar folder setup precludes me from using mboxtrain. 
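One way to read the date from the topmost Received header, sketched under the assumption that the header has the usual form with the timestamp after the last semicolon. `received_date` is a hypothetical helper; it is not taken from bulkgraph.py or mboxtrain.py, neither of which is shown here.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_date(msg):
    """Return the datetime from the topmost Received header, or None.

    Hypothetical helper: assumes the timestamp follows the last ';'
    in the header, which is the conventional layout.
    """
    received = msg.get_all("Received") or []
    if not received:
        return None
    # Headers are returned in message order, so index 0 is the topmost one.
    _, sep, stamp = received[0].rpartition(";")
    if not sep:
        return None
    try:
        return parsedate_to_datetime(stamp.strip())
    except (TypeError, ValueError):
        return None
```

The topmost Received header is added by the receiving server, so it is harder for a sender to forge than the Date header.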
---------------------------------------------------------------------- Comment By: Neale Pickett (npickett) Date: 2003-01-22 23:01 Message: Logged In: YES user_id=619391 Jason, does the current mboxtrain.py script do enough of this functionality for you, or would you still like to see us work by the Received header? I suspect it might be good enough... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702 From python-spambayes at discworld.dyndns.org Thu Feb 13 15:10:48 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Thu Feb 13 16:08:19 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? In-Reply-To: <1045167697.5666.764.camel@adamselene.home>; from nick_holden@ntlworld.com on Thu, Feb 13, 2003 at 08:21:37PM +0000 References: <1045167697.5666.764.camel@adamselene.home> Message-ID: <20030213151048.B8711@discworld.dyndns.org> Nick Holden wrote: > > But while there are lots of qmail or sendmail virus filters, there are > very few spam filters that can be called by qmail. Actually, just about any kind of filtering is trivially integrated into qmail as a qmail-queue wrapper. This is such a common approach that Bruce Guenter wrote the QMAILQUEUE patch to make it modular and flexible, and his qmail-qfilter package makes writing your filters and chaining them together child's play. > Spam Assassin will work with qmail-scanner, apparently, but qmail-scanner > won't work with vpopmail virtual domains, so that route doesn't help me much > either. vpopmail is best avoided. But we're dangerously offtopic.
Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From tim at fourstonesExpressions.com Thu Feb 13 15:14:28 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 16:14:39 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? In-Reply-To: <1045167697.5666.764.camel@adamselene.home> Message-ID: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> 2/13/2003 2:21:37 PM, Nick Holden wrote: >I'm intrigued by the spambayes project, and wondered whether you'd put >any thought to apps to fit spambayes into the sendmail / qmail end of >the mail process rather than as an end-user app? The problem with a server side spam filter is that it is completely dependent on finding a common definition of spam. One man's spam is another man's subscribed mail. The spambayes team chose to deliver a client side tool so that individuals could enforce their own spam definition, not have that definition imposed upon them by the set of smtp servers (and server admins) that a particular piece of mail happened to be routed through. While spambayes is no doubt integratable with server side processes, we've done little work to make it so. If you search back through the archives, you can find some work done with mailman integration, and lots of discussion on your particular topic, which will further explain what I've said here. - TimS > >Running a mail server that predominantly forwards email for various >non-profit groups who advertise their email addresses widely on the web, >we're drowning in spam and viruses which I want to cut out at the mail >server, rather than advising them to insist that their various ISPs >install end-user spam filtering. 
> >But while there are lots of qmail or sendmail virus filters, there are >very few spam filters that can be called by qmail. Spam Assassin will >work with qmail-scanner, apparently, but qmail-scanner won't work with >vpopmail virtual domains, so that route doesn't help me much either. > >Your thoughts would be appreciated, > >Nick > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From richie at entrian.com Thu Feb 13 21:15:37 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Feb 13 16:16:50 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? In-Reply-To: References: Message-ID: Hi Jon, > So I've got pop3proxy working with Eudora beautifully. Thanks again > everyone for helping me out with that. The problem I'm seeing now > relates to the review process. When I go to train messages, sometimes > a message will show up with "(none)" for a subject as well as for a > recipient. That means that either there's no Subject/To/etc header, or the email was broken in some way so the software couldn't find the relevant header. If you find a message that looks valid but shows up with "(none)", please send it to me (click on the subject to view the message, look for the corresponding file in the pop3proxy-xxx-cache directory, and zip up the file - that guarantees that nothing mangles the message in transit). 
Thanks, -- Richie Hindle richie@entrian.com From richie at entrian.com Thu Feb 13 21:19:15 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Feb 13 16:20:27 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> Message-ID: <6r2o4vsglnthko16902tsg9qsiau2gj903@4ax.com> > This would include a filename-seperator string, presumably defaulting > to ":" (covering *nix and OSX), with ";" for Win* It sounds like "os.pathsep" is what you're looking for. -- Richie Hindle richie@entrian.com From tim at fourstonesExpressions.com Thu Feb 13 15:24:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 16:24:16 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: <6r2o4vsglnthko16902tsg9qsiau2gj903@4ax.com> Message-ID: 2/13/2003 3:19:15 PM, Richie Hindle wrote: > >> This would include a filename-seperator string, presumably defaulting >> to ":" (covering *nix and OSX), with ";" for Win* > >It sounds like "os.pathsep" is what you're looking for. Nope. Pathsep is forwardslash or backslash. We can't delimit multiple filenames on that... ;) - TimS > >-- >Richie Hindle >richie@entrian.com > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Thu Feb 13 13:31:22 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 13 16:31:25 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? 
In-Reply-To: Message from Richie Hindle References: Message-ID: <20030213213122.B74DE2DED0@cashew.wolfskeep.com> In message: Richie Hindle writes: > >That means that either there's no Subject/To/etc header, or the email was >broken in some way so the software couldn't find the relevant header. One thing that I noticed recently going through mboxutils.py is that if a message fails to parse for some reason, spambayes then strips off _ALL_ headers and tries again. This could obviously be quite damaging... One mail that I had such troubles with was a MIME-encoded message built with mime-construct, without any body text before the first MIME section (the text that usually says "This is a mime-encoded message. If you're reading this, please upgrade to a vulnerable mailreader that we can subvert."... or something along those lines). Unfortunately, I no longer have a pristine copy of that mail... spambayes chomped it. If it's still an open question when I finish with my graph stuff, then I may try to regenerate similarly confusing messages... - Alex From richie at entrian.com Thu Feb 13 21:31:18 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Feb 13 16:32:31 2003 Subject: [Spambayes] Alpha2 Pre-release In-Reply-To: <41598437.1044974810@[10.0.18.7]> References: <41598437.1044974810@[10.0.18.7]> Message-ID: <573o4v477f4fvc93cue14ohn3l2qj54brj@4ax.com> [Eric] > error in spambayes setup command: invalid distribution option 'classifiers' This looks as though the 2.3 distutils doesn't understand 'classifiers'... setup.py says: # patch distutils if it can't cope with the "classifiers" keyword. # this just makes it ignore it. if sys.version < '2.2.3': from distutils.dist import DistributionMetadata DistributionMetadata.classifiers = None so is this wrong? 
I'm no expert with distutils, but I can't see at a quick glance that it knows about 'classifiers': >>> import sys >>> sys.version '2.3a1 (#38, Dec 31 2002, 17:53:59) [MSC v.1200 32 bit (Intel)]' >>> from distutils.dist import DistributionMetadata >>> DistributionMetadata.classifiers Traceback (most recent call last): File "", line 1, in ? DistributionMetadata.classifiers AttributeError: class DistributionMetadata has no attribute 'classifiers' and: C:\Python23\Lib\distutils>grep -i classifiers *.py C:\Python23\Lib\distutils>cd command C:\Python23\Lib\distutils\command>grep -i classifiers *.py C:\Python23\Lib\distutils\command> Maybe our setup code should say something like [untested]: # patch distutils if it can't cope with the "classifiers" keyword. # this just makes it ignore it. from distutils.dist import DistributionMetadata if not hasattr(DistributionMetadata, 'classifiers'): DistributionMetadata.classifiers = None What do people think? Is there a distutils expert in the house? > Any insight would be appreciated You can work around the problem by removing the "classifiers = [ ... ]" piece of setup.py. Hope that helps, -- Richie Hindle richie@entrian.com From skip at pobox.com Thu Feb 13 15:33:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 13 16:33:45 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: <6r2o4vsglnthko16902tsg9qsiau2gj903@4ax.com> References: <1ED4ECF91CDED24C8D012BCF2B034F1318CD55@its-xchg4.massey.ac.nz> <6r2o4vsglnthko16902tsg9qsiau2gj903@4ax.com> Message-ID: <15948.3873.206619.827333@montanaro.dyndns.org> >> This would include a filename-seperator string, presumably defaulting >> to ":" (covering *nix and OSX), with ";" for Win* Richie> It sounds like "os.pathsep" is what you're looking for. Yes it does. os.sep and os.pathsep rarely occur to me because neither is actually in os.path. I'll check in a fix so everyone can scream at me... 
;-) Skip From popiel at wolfskeep.com Thu Feb 13 13:09:21 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 13 16:34:49 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? In-Reply-To: Message from Nick Holden of "13 Feb 2003 20:21:37 GMT." <1045167697.5666.764.camel@adamselene.home> References: <1045167697.5666.764.camel@adamselene.home> Message-ID: <20030213210921.E5FD82DED0@cashew.wolfskeep.com> In message: <1045167697.5666.764.camel@adamselene.home> Nick Holden writes: >I'm intrigued by the spambayes project, and wondered whether you'd put >any thought to apps to fit spambayes into the sendmail / qmail end of >the mail process rather than as an end-user app? We've discussed similar from time to time, and I think that the general consensus is that spambayes is poorly suited for such use. The basic problem is one of defining spam. In spambayes, the definition that we use is 'spam is whatever the user says is spam'. This requires direct user involvement in the training process, for obvious reason. Unfortunately, trying to incorporate spambayes into an MTA upstream of the users presents three problems: 1. An upstream MTA is likely to have multiple users. If the users disagree on what constitutes spam, then the training will be inconsistent and spambayes will get confused. 2. An upstream MTA is likely to not have any good facilities for users to train through. About the best that could be hoped for would be a web interface similar to pop3proxy... but that's rather far outside the scope of an MTA and is unlikely to be set up in a way that the users could access it and the spammers couldn't. 3. Most of us want to be able to peruse all our mail, even that marked as spam, looking for false positives (which frequently occur for a short while after starting a new business relationship with some vendor on the web). If an upstream MTA is tossing stuff spambayes thinks is spam, then users cannot do this perusal. 
If the upstream MTA is merely marking the email without tossing it, then the MTA hasn't gained anything, but has instead spent a bunch of cycles on a judgement that could be forged by a spammer. In the end, spambayes makes for a very poor tool to be used by paternalistic ISPs, precisely because of the user-centric spam definition that makes it a wonderful tool for individual user installations. - Alex From richie at entrian.com Thu Feb 13 21:35:06 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Feb 13 16:36:20 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: References: <6r2o4vsglnthko16902tsg9qsiau2gj903@4ax.com> Message-ID: [Richie] > It sounds like "os.pathsep" is what you're looking for. [Tim] > Nope. Pathsep is forwardslash or backslash. We can't delimit multiple > filenames on that... ;) - TimS Er: >>> sys.platform 'win32' >>> os.pathsep ';' and from the documentation: pathsep The character conventionally used by the operating system to separate search path components (as in PATH), such as ":" for POSIX or ";" for Windows. -- Richie Hindle richie@entrian.com From nick_holden at ntlworld.com Thu Feb 13 21:28:38 2003 From: nick_holden at ntlworld.com (Nick Holden) Date: Thu Feb 13 16:38:52 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? In-Reply-To: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> References: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> Message-ID: <1045171718.5666.784.camel@adamselene.home> I appreciate the rapid responses! I understand the arguments that you've put forward as to why a spam filter should, ordinarily, be a client-side application. I think I'm trying to set up a very particular situation, which might explain why I want to do something different. I'm setting up a mail domain for a non-profit organisation, which advertises a whole host of email addresses for contact points through the web and email: london@domain, mexico@domain and so on.
None of these are pop3 clients - they all forward on to people who have agreed to receive contacts from people interested in the issue on which this particular organisation campaigns. Currently, they are receiving three or four more spams for each genuine contact. This is clearly unacceptable, and I don't particularly want to be forwarding them this rubbish, nor saying that it's their responsibility to weed it out. I know they haven't asked for it, so I want to drop it on my server before the mail gets forwarded. I'm not arguing with your general assumptions, and I appreciate that they're shared by almost all the anti-spam project groups I've come across. I think my solution will end up being some adaptation of amavis, calling a spam filter, but I'm still trying to find a Bayesian filter that will work with amavis: so far Spam Assassin is all I get told to use! Best regards, and thanks for the feedback. Nick From tim.one at comcast.net Thu Feb 13 16:37:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Feb 13 16:39:07 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: Message-ID: >>> This would include a filename-seperator string, presumably defaulting >>> to ":" (covering *nix and OSX), with ";" for Win* >> It sounds like "os.pathsep" is what you're looking for. > Nope. Pathsep is forwardslash or backslash. We can't delimit multiple > filenames on that... ;) - TimS Nope. Here on Windows: >>> import os >>> os.pathsep ';' >>> os.sep '\\' >>> pathsep is named in honor of its use in PATH-like expressions, which contain multiple directory paths. From richie at entrian.com Thu Feb 13 21:39:16 2003 From: richie at entrian.com (Richie Hindle) Date: Thu Feb 13 16:40:35 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? In-Reply-To: <20030213213122.B74DE2DED0@cashew.wolfskeep.com> References: of "Thu, 13 Feb 2003 21:15:37 GMT."
<20030213213122.B74DE2DED0@cashew.wolfskeep.com> Message-ID: [Alex] > One thing that I noticed recently going through mboxutils.py is that > if a message fails to parse for some reason, spambayes then strips off > _ALL_ headers and tries again. This could obviously be quite damaging... That's very bad if it can ever happen to 'live' email. The POP3 proxy will never do this - when modifying messages it uses a deliberately naive 'parser' (which is "so simple that it obviously has no flaws") that will either work perfectly or fail cleanly (notwithstanding protocol errors, with apologies to François 8-). This business of failing to parse headers only affects the way messages are displayed on the Review page. -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Fri Feb 14 08:59:11 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 13 17:00:12 2003 Subject: [Spambayes] Spammie on Outlook 2000 - questions In-Reply-To: Message-ID: <005101c2d3ab$24a5a8e0$530f8490@eden> > > > > I found one message in Spam Maybe that ended up being good. I > > > did "Spam Recover" and got a rule error from OL about unable to > > > move message to (as it happens) python-list. Closed message box > > > and hit Spam recover again - no errors and message was moved > > > to inbox. > > > > Odd. I've tried to reproduce that error and can't. Even if I > > delete the original folder it finds it's way back to the inbox > > without an error. Make sure you let the list know if this is a > > regular occurance. > > > Consider this notice: frequent rule errors upon mail fetch. Can you clarify exactly what error? This is the Outlook rule engine displaying the error in a message box, not spambayes? > These are of the > form "couldn't move message to " and there are a > couple of folders > involved. I have never seen this, but *do* see a number of messages fail to filter.
In general, this happens as Outlook is shutting down or starting up, but occasionally it seems to happen for no good reason. A "problem" is that I recently reconfigured my Outlook mail for "corporate" mode (rather than "Internet Only"). It is possible to do this even if you only use internet mail. Corporate mode changes Outlook *radically*. One large change is that the mail spooler now runs in its own process (mapisp32.exe), and Outlook spools from that. If you kill Outlook but leave mapisp32 running, you often end up with messages in your inbox that the Outlook rule engine missed (as did spambayes etc - but we catch what we missed next startup). In "corporate" mode, "Send and Receive" works completely differently (no way to see the total number of messages you are pulling down) and all the "Preferences" dialogs change. If anyone wishes to see this for themselves, simply hit the "reconfigure mail support" button in your "Accounts" setup. You don't lose your accounts or mail, and you can re-configure later - but stuff changes all over the place. So, I see a little rule strangeness too, but I am not sure how much of this is related to spambayes - I know not all of it is. > I am also getting _many_ python-list messages (31 > just now) into > Spam Maybe and some (17) just stayed in the inbox - generally most > python-list messages go to the python-list folder. On the occasion I have seen this (Outlook failing to filter *any* message), I had to manually kill Outlook and this new mapisp32.exe. Then it came good. Oh, I forgot to mention that a client insists I run the Outlook PGP plugin, and this often causes that error too - it appears to have a memory leak, and Outlook displays an "Out of resources" dialog, before all filters stop working (and you can no longer even read mails, etc). My point is that Outlook is complex, and extensible, so your problem could possibly lie in a few places.
From what I can tell, most people have generally successful Outlook rules working alongside spambayes. > Looking over my Spam corpus (copied from deleted messages > folder), I noticed > that there are a lot of otherwise legit messages that just > got deleted. I'm not sure what you mean here. You created the spam corpus by hand from your "deleted items" folder, so I am not sure what you mean by "I noticed that there are a lot of otherwise legit messages that just got deleted." Deleted by whom? Certainly if you trained spambayes incorrectly, it will give incorrect results. > Currently, I am getting few spam and many spam maybe - is this > characteristic of a young spammie? Not really, but possibly characteristic of poor training. This really is spambayes' biggest drawback IMO - not knowing when it is trained well enough to do a good job. Well, to be fair, it is too much to expect spambayes to know this, when the people behind it don't even know the answer to that yet. Anecdotally, it seems people get good results with a fairly balanced, well-filtered corpus. A huge imbalance in spam:ham ratios may behave strangely, even though there is some special weighting in that case. And don't forget about "Filter Now" - you can pick random folders, and have spambayes score them *without* doing a filter. This can help you see how well your existing mail would have scored with your new training data. There is no reason you can not "filter now" your "deleted items" to see how it would go. Particularly, do refilter your "ham" and "spam" folders, checking that you haven't accidentally misclassified any messages. Hope this helps. Let us know if it doesn't. Mark. From skip at pobox.com Thu Feb 13 16:01:15 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 13 17:01:33 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail?
In-Reply-To: <1045171718.5666.784.camel@adamselene.home> References: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> <1045171718.5666.784.camel@adamselene.home> Message-ID: <15948.5547.664621.961364@montanaro.dyndns.org> Nick> I'm setting up a mail domain for a non-profit organisation, which Nick> advertises a whole host of email addresses for contact points Nick> through the web and email: london@domain, mexico@domain and so Nick> on. None of these are pop3 clients - they all forward on to people Nick> who have agreed to receive contacts from people interested in the Nick> issue on which this particular organisation campaigns. Currently, Nick> they are receiving three or four more spams for each genuine Nick> contact. This is clearly unacceptable, and I don't particularly Nick> want to be forwarding them this rubbish, nor saying that it's Nick> their responsibility to weed it out. I know they haven't asked for Nick> it, so I want to drop it on my server before the mail gets Nick> forwarded. It seems you have a more homogeneous user population than your typical ISP. You can probably get away with something like this: 1. Gather a representative (for that group) set of ham and spam. 2. Train on the above and insinuate spambayes into your qmail front-end. 3. Somewhere between qmail and your users (maybe just another qmail filter downstream from the spambayes filter), extract messages marked as spam (and possibly unsure), dropping them into one or two mailboxes so you can scan the spam (and push false positives along to their rightful recipients) and unsures (use them to refine the training, again pushing any ham along). 4. Ask your users to forward to you (with full headers) any spam they receive which leaks through. Use that for further training. I do something similar on a smaller scale on my mail server. My wife's online interests are essentially a proper subset of mine, so I use the same training set for both of us.
I have her procmail setup direct marked-as-spam messages to me. She gets everything else. I've heard no complaints from her so far. In fact, she doesn't even know I have things set up this way. ;-) She just gets a lot less spam. Skip From popiel at wolfskeep.com Thu Feb 13 14:06:41 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 13 17:06:46 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? In-Reply-To: Message from Richie Hindle References: of "Thu, 13 Feb 2003 21:15:37 GMT." <20030213213122.B74DE2DED0@cashew.wolfskeep.com> Message-ID: <20030213220641.F0FC92DED0@cashew.wolfskeep.com> In message: Richie Hindle writes: > >[Alex] >> One thing that I noticed recently going through mboxutils.py is that >> if a message fails to parse for some reason, spambayes then strips off >> _ALL_ headers and tries again. This could obviously be quite damaging... > >That's very bad if it can ever happen to 'live' email. The POP3 proxy will >never do this - when modifying messages it uses a deliberately naive >'parser' (which is "so simple that it obviously has no flaws") that will >either work perfectly or fail cleanly (notwithstanding protocol errors, >with apologies to François 8-). This business of failing to parse headers >only affects the way messages are displayed on the Review page. Well, hammie managed to munch one of my mails by nuking all preexisting headers, so I think that this (or something like it) is in the live stream for hammie (or was at the time when it happened - it may have been fixed). I'll poke in a little more detail in a few minutes. 
- Alex From tim at fourstonesExpressions.com Thu Feb 13 16:07:33 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 17:07:44 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: Message-ID: <84HEA94ZBLI52TPA8ZD9LGOKPJXRJH.3e4c1725@myst> 2/13/2003 3:35:06 PM, Richie Hindle wrote: > >[Richie] >> It sounds like "os.pathsep" is what you're looking for. > >[Tim] >> Nope. Pathsep is forwardslash or backslash. We can't delimit multiple >> filenames on that... ;) - TimS > >Er: > > >>> sys.platform > 'win32' > >>> os.pathsep > ';' Er: never mind. I've been a stoopidhead lately... > >and from the documentation: > > pathsep > The character conventionally used by the operating system to separate > search path components (as in PATH), such as ":" for POSIX or ";" for > Windows. > >-- >Richie Hindle >richie@entrian.com > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From francois.granger at free.fr Thu Feb 13 23:09:09 2003 From: francois.granger at free.fr (Francois Granger) Date: Thu Feb 13 17:09:16 2003 Subject: [Spambayes] pop3proxy: what are (none) messages? In-Reply-To: References: of "Thu, 13 Feb 2003 21:15:37 GMT." <20030213213122.B74DE2DED0@cashew.wolfskeep.com> Message-ID: At 21:39 +0000 13/02/2003, in message Re: [Spambayes] pop3proxy: what are (none) messages?, Richie Hindle wrote: >(notwithstanding protocol errors, >with apologies to François 8-) My apologies for finding issues ;-) But that was a long time ago..... My current version at home is from Jan 31 (my birthday). And it has been working like a charm. I get this message in the terminal, but it still works: Starting Spambayes shell script with 2.3 Password: Loading database... Done.
Listener on port 127.0.0.1:110 is proxying pop.nerim.net:110 Listener on port 127.0.0.2:110 is proxying pop.free.fr:110 Listener on port 127.0.0.3:110 is proxying altern.org:110 Listener on port 127.0.0.4:110 is proxying pop.laposte.net:110 User interface url is http://localhost:8880/ error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. From popiel at wolfskeep.com Thu Feb 13 14:18:01 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 13 17:18:05 2003 Subject: [Spambayes] Hammie can munch headers... In-Reply-To: Message from "T. 
Alexander Popiel" <20030213220641.F0FC92DED0@cashew.wolfskeep.com> References: "Thu, 13 Feb 2003 21:15:37 GMT." <20030213213122.B74DE2DED0@cashew.wolfskeep.com> <20030213220641.F0FC92DED0@cashew.wolfskeep.com> Message-ID: <20030213221801.82ADE2DED0@cashew.wolfskeep.com> In message: <20030213220641.F0FC92DED0@cashew.wolfskeep.com> "T. Alexander Popiel" writes: >In message: > Richie Hindle writes: >> >>[Alex] >>> One thing that I noticed recently going through mboxutils.py is that >>> if a message fails to parse for some reason, spambayes then strips off >>> _ALL_ headers and tries again. This could obviously be quite damaging... >> >>That's very bad if it can ever happen to 'live' email. > >Well, hammie managed to munch one of my mails Confirmed: hammiefilter is still using mboxutils.get_message to read the message, and then spits out a reconstituted mail based on that. This includes the code to trash all headers if parsing goes belly-up. This is not directly relevant to pop3proxy, but is something to be fixed... - Alex From tim at fourstonesExpressions.com Thu Feb 13 16:18:23 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 13 17:18:34 2003 Subject: [Spambayes] ini file fumbling broke In-Reply-To: <15948.3873.206619.827333@montanaro.dyndns.org> Message-ID: 2/13/2003 3:33:21 PM, Skip Montanaro wrote: > > >> This would include a filename-seperator string, presumably defaulting > >> to ":" (covering *nix and OSX), with ";" for Win* > > Richie> It sounds like "os.pathsep" is what you're looking for. > >Yes it does. os.sep and os.pathsep rarely occur to me because neither is >actually in os.path. I'll check in a fix so everyone can scream at >me... ;-) You certainly have my permission to unfix my fix... 
> >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From whisper at oz.net Thu Feb 13 14:33:03 2003 From: whisper at oz.net (David LeBlanc) Date: Thu Feb 13 17:33:48 2003 Subject: [Spambayes] Update on Spammie on Outlook. Message-ID: FWIW, I'm running OL 2000 build 9.0.0.2711 - internet mail only. After culling the "good but deleted" messages out of my spams, retraining from scratch and _rebooting_ (even on Win 2000 pro!), OL is acting much more docile. It (OL) is filtering and "foldering" as usual (poorly, but better than by hand) and spammie itself is doing very little false spamming and only a little false "spam maybe" (where false here is defined as good messages in the spam maybe folder). OL's rule error dialogs have stopped. I suspect that Mark was right about it being something to do with some sort of backend spooling or something. I noticed that in spite of having exited OL-the-GUI, OL.exe itself was still running in the task manager and consuming 26mb of space. I think some of my problems arose last evening from there somehow being two of these "backend" OL.exes running (both at about 26mb). I think part of the reason I had to reboot was that killing the OL.exe process in the task manager leaves some other things hanging. Curiously and coincidentally with the installation of spammie, the constant spate of mortage, viargra, genital enlargement, weight loss and unbeatable deals from Nigeria have abruptly fallen to a trickle. They're not showing up in the spam folder either. David LeBlanc Seattle, WA USA From whisper at oz.net Thu Feb 13 14:50:05 2003 From: whisper at oz.net (David LeBlanc) Date: Thu Feb 13 17:50:21 2003 Subject: [Spambayes] Dumping the db Message-ID: Is there a convenient way to dump the db used by spammie? 
I'm also curious as to the mechanics of "recover from spam" - how effective
is it at unlearning? Does the unlearning make a difference based on whether
it's in "spam" or "spam maybe"?

David LeBlanc
Seattle, WA USA

From tim at fourstonesExpressions.com  Thu Feb 13 16:52:16 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Feb 13 17:52:29 2003
Subject: [Spambayes] Dumping the db
In-Reply-To:
Message-ID:

2/13/2003 4:50:05 PM, "David LeBlanc" wrote:

>Is there a convenient way to dump the db used by spammie?

dbExpImp.py

>David LeBlanc
>Seattle, WA USA

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From neale at woozle.org  Thu Feb 13 14:58:00 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Feb 13 17:58:32 2003
Subject: [Spambayes] Any prospect of spambayes working with qmail?
In-Reply-To: <15948.5547.664621.961364@montanaro.dyndns.org> (Skip Montanaro's message of "Thu, 13 Feb 2003 16:01:15 -0600")
References: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> <1045171718.5666.784.camel@adamselene.home> <15948.5547.664621.961364@montanaro.dyndns.org>
Message-ID:

Skip Montanaro writes:

> I do something similar on a smaller scale on my mail server. My
> wife's online interests are essentially a proper subset of mine, so I
> use the same training set for both of us. I have her procmail setup
> direct marked-as-spam messages to me. She gets everything else. I've
> heard no complaints from her so far. In fact, she doesn't even know I
> have things set up this way. ;-) She just gets a lot less spam.

What Skip is describing here is essentially what we're planning to
implement at $FIRM. You buy $FIRM's firewall appliance, put it on your
network, and give it an address to send suspected spam.
That person (the spammaster) goes through and weeds out any false
positives, re-sending them to a special address which then retrains on
its mistake and mails the original message to the original recipient.

The problem with this setup is that if it gets a false-negative, the
end-user must forward that message back to the classifier to be trained
as spam. This is a really big problem, since by the time Outlook gets
its grubby hands on it, the original message is irreparably damaged.
You can still get a lot of useful information out of it in this mangled
state, but whether or not it's enough information remains to be seen.

You can probably set up a pretty good wordlist by training on a week's
worth of collected ham and spam--less if you're a bigger site. But
unless you constantly retrain it, your accuracy will gradually degrade.
You have to keep retraining the classifier as your spam and ham change
in nature.

It's hard to make a learning classifier work for a big site, since by
its very nature it must get feedback on how it's doing, and most people
don't have the patience to train a mail filter--they just want to read
their email and get on with their lives.

So far, this ease-of-use question has been answered by trying to
integrate the filter into the client. A user probably won't mind (in
fact, most of them probably relish) hitting a scowling yellow face for
"delete as spam". A user will probably be more reluctant to take the
time to forward all their spam to a special address.

This is why our focus has been on the client and not the server, and
this is why everyone keeps telling you to use SpamAssassin (which only
requires feedback in the form of installing the latest version*).

But feel free to point out why I'm wrong. I *want* to be wrong on this
one :)

Neale

* Okay, so SA has learning aspects to it, but you don't *have* to use
  them to get good results. With SpamBayes, if you don't use the
  learning stuff, you get a useless filter.
From skip at pobox.com Thu Feb 13 16:59:04 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 13 17:59:13 2003 Subject: [Spambayes] Dumping the db In-Reply-To: References: Message-ID: <15948.9016.996035.878141@montanaro.dyndns.org> David> Is there a convenient way to dump the db used by spammie? Yes, I recently checked in a script to the Tools/scripts directory of the main Python distribution. Browse there for db2pickle.py and pickle2db.py. You could also write db2csv.py and csv2db.py once I get the csv module accepted and checked in. ;-) David> I'm also curious as to the mechanics of "recover from spam" - how David> effective it it at unlearning? If I understand correctly, all it does is reduce the relevant counters in the database. David> Does the unlearning make a difference based on whether it's in David> "spam" or "spam maybe"? Not that I'm aware of. Skip From mhammond at skippinet.com.au Fri Feb 14 10:06:48 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 13 18:07:48 2003 Subject: [Spambayes] Dumping the db In-Reply-To: Message-ID: <006201c2d3b4$968be600$530f8490@eden> > Is there a convenient way to dump the db used by spammie? Nope, but Outlook2000/tester.py effectively does this so it can pull out the most and least spammy words, and create a mail with nothing but them (couldn't think of a better way to auto-generate a certain spam guaranteed to be spam regardless of your training data :) It is often more interesting to show the "spam clues" - this is a dump of the database, but only of the words that contributed to the score for the particular mail. You may be dissapointed at what a database dump looks like - a list of words with 2 counts. > I'm also curious as to the mechanics of "recover from spam" - > how effective > it it at unlearning? It should be "perfect" - however, I expect "perfect" to the engine is different to what it means to you. 
After doing a "recover from spam", the database should be left in the
same state as if that particular message had originally been trained as
"ham". Specifically, all that code does is to train on that single
message as ham, then move the message back to where we first got it
from. (There is some extra logic in place that ensures that if the
message had previously been trained as spam, that will be undone - but
this is a rare case. In most cases, a message has just been *filtered*
when you recover it, not actually *trained*.)

Note that all this does is update the training data. You may have
noticed that some messages in your "Inbox" still have a high(ish) spam
rating, even though they have been trained as good. It can be
disconcerting to train the system that a mail is good, just to see that
the system still treats the message as suspect. As you get more similar
messages and continue to train (by recovering), it gets better.

I have found it useful to create a folder of "Good, spam-looking mail".
This is mail that I have seen spambayes occasionally mis-classify, and
that I would normally have just deleted. I keep them around so that
when I need to start from scratch, I have some examples of good stuff
to help it out, and avoid that initial stage. I've actually been
thinking that it may be useful for spambayes to automatically keep
copies of such messages, should a full retrain ever be necessary "in
the wild".

> Does the unlearning make a difference based on whether it's
> in "spam" or
> "spam maybe"?

Nope. As I mentioned above, in the vast majority of cases, no
"unlearning" is done - only learning that it is good (or bad, depending
on the case). Unlearning is only necessary if it has previously been
trained incorrectly, and this is rare.

Mark.

From skip at pobox.com  Thu Feb 13 17:16:40 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Feb 13 18:16:51 2003
Subject: [Spambayes] Any prospect of spambayes working with qmail?
In-Reply-To: References: <0CAXUON08BA52TPJE1T98TN07ZW6ZUR.3e4c0ab4@myst> <1045171718.5666.784.camel@adamselene.home> <15948.5547.664621.961364@montanaro.dyndns.org> Message-ID: <15948.10072.947508.669949@montanaro.dyndns.org> Neale> The problem with this setup is that if it gets a false-negative, Neale> the end-user must forward that message back to the classifier to Neale> be traned as spam. This is a really big problem, since by the Neale> time Outlook gets its grubby hands on it, the original message is Neale> irreparably damaged. You can still get a lot of useful Neale> information out of it in this mangled state, but whether or not Neale> it's enough information remains to be seen. Some people would say, "Then don't use Outlook." ;-) How about you cache all messages on the server for a short period of time (a week or so), indexed by message-id? If a user bounces back a false-negative, use the message-id to retrieve the real message and only use the Outlook-mangled message if that item has already expired from the cache. Skip From mhammond at skippinet.com.au Fri Feb 14 10:20:10 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 13 18:21:10 2003 Subject: [Spambayes] Dumping the db In-Reply-To: <006201c2d3b4$968be600$530f8490@eden> Message-ID: <006501c2d3b6$74d3c760$530f8490@eden> > > Is there a convenient way to dump the db used by spammie? > > Nope, but ... OK - make that "Yep, and ... Mark. 
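
Skip's suggestion in this thread (cache every message on the server for
about a week, indexed by message-id, and fall back to the Outlook-mangled
copy only once the cached one has expired) could be sketched roughly as
below. This is only an illustration under assumptions: MessageCache and
text_to_train_on are hypothetical names, not part of spambayes, and a real
server would keep the cache on disk rather than in a dict.

```python
import time


class MessageCache:
    """Hypothetical short-lived store of raw messages keyed by Message-ID."""

    def __init__(self, max_age_seconds=7 * 24 * 3600, clock=time.time):
        self.max_age = max_age_seconds   # "a week or so", per the thread
        self.clock = clock               # injectable so expiry can be tested
        self._store = {}                 # message-id -> (time stored, raw text)

    def add(self, message_id, raw_text):
        """Remember the message exactly as it arrived at the server."""
        self._store[message_id] = (self.clock(), raw_text)

    def expire(self):
        """Drop every entry older than max_age."""
        cutoff = self.clock() - self.max_age
        stale = [mid for mid, (stored, _) in self._store.items() if stored < cutoff]
        for mid in stale:
            del self._store[mid]

    def get(self, message_id):
        """Return the cached raw text, or None if unknown or expired."""
        self.expire()
        entry = self._store.get(message_id)
        return entry[1] if entry else None


def text_to_train_on(cache, message_id, mangled_text):
    """Prefer the cached original; otherwise use the client-mangled copy."""
    original = cache.get(message_id)
    return original if original is not None else mangled_text
```

When a user bounces back a false negative, the server would look up its
message-id with text_to_train_on() and hand the result to whatever training
step it already uses.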
From noreply at sourceforge.net Thu Feb 13 15:45:29 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Feb 13 20:24:29 2003 Subject: [Spambayes] [ spambayes-Feature Requests-685746 ] Outlook plugin folder list sorted alphabetically Message-ID: Feature Requests item #685746, was opened at 2003-02-13 16:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702 Category: None Group: None >Status: Closed Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin folder list sorted alphabetically Initial Comment: It would be nice (but purely cosmetic) if the folder list that is generated by _BuildFoldersMAPI (in FolderSelector.py) was sorted alphabetically (within each folder). This is the view that Outlook provides in it's folder list, and it's a little confusing finding folders in a different order. If this is particularly non-trivial, then it might not be worth doing. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-14 10:45 Message: Logged In: YES user_id=14198 Checking in FolderSelector.py; /cvsroot/spambayes/spambayes/Outlook2000/dialogs/FolderSelector.py,v <-- FolderSelector.py new revision: 1.13; previous revision: 1.12 done ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=685746&group_id=61702 From jwilliam at xmission.com Thu Feb 13 18:54:56 2003 From: jwilliam at xmission.com (Jerry Williams) Date: Thu Feb 13 20:55:00 2003 Subject: [Spambayes] Another bug? Outlook 2000 win2k Message-ID: I think that I noticed another possible bug today. On my windows 2k laptop using an exchange server. I have read messages from my inbox and then come back and found them marked as unread. It has happened a couple of times. 
I am trying to pay more attention to the problem to see if I can figure out more details. Has this been seen before? Thanks! From T.A.Meyer at massey.ac.nz Fri Feb 14 15:03:52 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Feb 13 21:04:29 2003 Subject: [Spambayes] Another bug? Outlook 2000 win2k Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4CD@its-xchg4.massey.ac.nz> > I think that I noticed another possible bug today. On my > windows 2k laptop using an exchange server. I have read > messages from my inbox and then come back and found them > marked as unread. It has happened a couple of times. I > am trying to pay more attention to the problem to see if I > can figure out more details. > Has this been seen before? I think I mentioned something about this on the list a while back. If it's the same behaviour as I was seeing, then it's caused by messages not being scored/filtered immediately (presumably the hook doesn't get called). Meanwhile the message is marked as read. Then the hook finally gets called, the message is scored, and left as unread. Does this match what you are seeing? I presume that a fix for this would be to record the unread/read status of the message before scoring, and set it to the same after scoring is complete. Mark might do this at some point, although he seems busy with a lot of other fixes :) at the moment. =Tony Meyer From skip at pobox.com Thu Feb 13 20:23:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 13 21:23:17 2003 Subject: [Spambayes] working around missing Content-Transfer-Encoding wrt base64 Message-ID: <15948.21266.588532.944417@montanaro.dyndns.org> I'm going out of town (completely away from computers) for a week early Saturday. Would somebody else please test the attached patch against their email collection and see if it makes sense to check in? 
Summary of my tests are at http://mail.python.org/pipermail/spambayes/2003-February/003312.html Thanks, Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: sb.diff Type: application/octet-stream Size: 2078 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030213/fe5a1f11/sb.obj From skip at pobox.com Thu Feb 13 20:42:01 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 13 21:42:06 2003 Subject: [Spambayes] Danger, Will Robinson - BAYESCUSTOMIZE change! Message-ID: <15948.22393.399601.67469@montanaro.dyndns.org> I just checked in the os.pathsep fix to Options.py. Note that this will change all uses of that environment variable. No longer will you use a space to separate multiple filenames. Unix-y systems will use ":". Windows will use ";", Mac pre-OSX will use "\n", and RiscOS will use ",". I also added a note to README.txt. I didn't see any other necessary changes in the distribution code. Skip From whisper at oz.net Fri Feb 14 00:04:40 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Feb 14 03:04:43 2003 Subject: [Spambayes] Another bug? Outlook 2000 win2k In-Reply-To: Message-ID: > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org]On Behalf Of Jerry Williams > Sent: Thursday, February 13, 2003 17:55 > To: SpamBayes > Subject: [Spambayes] Another bug? Outlook 2000 win2k > > > I think that I noticed another possible bug today. On my windows > 2k laptop > using an exchange server. I have read messages from my inbox and > then come > back and found them marked as unread. It has happened a couple of > times. I > am trying to pay more attention to the problem to see if I can figure out > more details. > Has this been seen before? > Thanks! > When messages are read in my "spam maybe" folder and then deleted as spam, they go to unread status in the spam folder. Means I have to manually mark them as read. 
David LeBlanc Seattle, WA USA From Paul.Moore at atosorigin.com Fri Feb 14 10:16:03 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Feb 14 05:16:37 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? Message-ID: <16E1010E4581B049ABC51D4975CEDB886199E5@UKDCX001.uk.int.atosorigin.com> From: Neale Pickett [mailto:neale@woozle.org] > You can probably set up a pretty good wordlist by training on a week's > worth of collected ham and spam--less if you're a bigger site. But unless > you constantly retrain it, your accuracy will gradually degrade. You have > to keep retraining the classifier as your spam and ham change in nature. Rambling philosophical post ahead This is still my biggest worry with ongoing use of Spambayes, from an end user point of view. I have 2 setups, one at work using Outlook with the plugin, and one at home using pop3proxy. Both work fine at the moment, but I'm starting to suspect there's an increase in the number of unsures I'm getting (still no significant FPs or FNs, so to some extent this is all in the noise...) [Feature request: Would it be possible or useful for pop3proxy to maintain stats on numbers of unsures/ham/spam per day or whatever? Could be useful for ongoing review...] My ham:spam ratio at work is probably around 1:1, so that's not too bad. At home, though, ham:spam is something ridiculous like 1:25 (I filter out mailing list traffic into a local newsserver before spambayes gets a look in). At work, I train using the natural Outlook plugin approach, which is basically training on unsures only. My DB at work has about 8000 ham and 6000 spam. At home, I train on everything (basically, I regularly go through the pop3proxy web interface and train on all the outstanding messages (I never mark anything as "discard"). I don't have the DB size figures from home with me, but I think the training DB is very spam-heavy. 
Both databases have Tim's "experimental spam/ham imbalance flag" set to
the default (true, I believe). I don't know whether that's going to
matter, but I worry it might start devaluing spam clues at home, where
I have so few ham to compare with.

I dunno, this really isn't much more than rambling. I have no stats to
prove anything, and no real complaints. I'm just spoiled - things are
so much better nowadays that reviewing 3 or 4 unsures is a great
chore...

I know there have been some experiments in the past done on training
methods, and they were basically inconclusive (IIRC). I guess what I'm
wondering is whether there's anything new to say on the matter now that
people have been running spambayes "for real" for a decent time.

One possibility I'd thought of is to do intermittent training - start
with an empty database (or maybe one preseeded with a small
representative message base), then train for a week or two (which will
tune the DB a bit). Then stop training for a while (a couple of months)
and then train on everything for a week. Repeat the stop/train cycle.
The idea being that this would catch new spam techniques, without
needing too much ongoing training. The downside is that I can see no
way of testing this approach.

Hey - if I wrote up a small document on the various possible training
methods (there aren't that many that I can think of) would that be of
any use for the documentation?

Any thoughts?

Paul.

From mhammond at skippinet.com.au  Fri Feb 14 23:41:57 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Feb 14 07:42:58 2003
Subject: [Spambayes] Another bug? Outlook 2000 win2k
In-Reply-To:
Message-ID: <005c01c2d426$76efada0$530f8490@eden>

> When messages are read in my "spam maybe" folder and then
> deleted as spam,
> they go to unread status in the spam folder. Means I have to
> manually mark them as read.

Note that spambayes does not (directly) touch the "unread" status of a
message.
I say "directly", as there may be some side-effect that we don't know about, but as far as I know, the read/unread status of the message is purely up to outlook. There is a bug/feature request at source-forge on this - clarification of exactly what we want would be good :) Mark. From whisper at oz.net Fri Feb 14 05:24:19 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Feb 14 08:24:23 2003 Subject: [Spambayes] Another bug? Outlook 2000 win2k In-Reply-To: <005c01c2d426$76efada0$530f8490@eden> Message-ID: > -----Original Message----- > From: Mark Hammond [mailto:mhammond@skippinet.com.au] > Sent: Friday, February 14, 2003 4:42 > To: 'David LeBlanc'; 'SpamBayes' > Subject: RE: [Spambayes] Another bug? Outlook 2000 win2k > > > > When messages are read in my "spam maybe" folder and then > > deleted as spam, > > they go to unread status in the spam folder. Means I have to > > manually mark them as read. > > Note that spambayes does not (directly) touch the "unread" status of a > message. I say "directly", as there may be some side-effect that we don't > know about, but as far as I know, the read/unread status of the message is > purely up to outlook. > > There is a bug/feature request at source-forge on this - clarification of > exactly what we want would be good :) > > Mark. > "Read once - preserve status" would be good! If I've read it once, it was either "spam maybe" or got past spammie. Once (re)classified as spam, it's a PITA to have to go and mark it as read again. OTOH, if spammie puts it directly into the spam folder, then I might want to read it to verify that it really is spam. It's also true that if you "unread" a msg when you put it into spam, it's hard to tell those from the "never reads" that spammie put there itself. Dave LeBlanc Seattle, WA USA From mhammond at skippinet.com.au Sat Feb 15 00:50:23 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 14 08:51:15 2003 Subject: [Spambayes] Another bug? 
Outlook 2000 win2k In-Reply-To: Message-ID: <006901c2d430$071105b0$530f8490@eden> > "Read once - preserve status" would be good! If I've read it > once, it was > either "spam maybe" or got past spammie. Once (re)classified > as spam, it's a > PITA to have to go and mark it as read again. OTOH, if spammie puts it > directly into the spam folder, then I might want to read it > to verify that > it really is spam. It's also true that if you "unread" a msg > when you put it > into spam, it's hard to tell those from the "never reads" > that spammie put > there itself. I'm afraid I am still not sure specifically what you are suggesting we should change. MArk. From popiel at wolfskeep.com Fri Feb 14 06:21:46 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Feb 14 09:21:52 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? In-Reply-To: Message from "Moore, Paul" <16E1010E4581B049ABC51D4975CEDB886199E5@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB886199E5@UKDCX001.uk.int.atosorigin.com> Message-ID: <20030214142147.07CF42DE8C@cashew.wolfskeep.com> In message: <16E1010E4581B049ABC51D4975CEDB886199E5@UKDCX001.uk.int.atosorigin.com> "Moore, Paul" writes: > >I know there have been some experiments in the past done on training >methods, and they were basically inconclusive (IIRC). I guess what I'm >wondering is whether there's anything new to say on the matter now that >people have been running spambayes "for real" for a decent time. I'm working on a bunch of graphs of accuracy over different types of training. It's just taking me a bit longer than expected (I'm having to deal with cracked machines at the moment). >One possibility I'd thought of is to do intermittent training - start >with an empty database (or maybe one preseeded with a small representative >message base), then train for a week or two (which will tune the DB a >bit. 
Then stop training for a while (a couple of months) and then train >on everything for a week. Repeat the stop/train cycle. The idea being >that this would catch new spam techniques, without needing too much >ongoing training. The downside is that I can see no way of testing this >approach. For building the graphs, I already have a testing harness which sorts and processes the messages in a linear fashion, with various rules for whether to train on a message or not. This sounds like just another rule for it. >Hey - if I wrote up a small document on the various possible training >methods (there aren't that many that I can think of) would that be of >any use for the documentation? Yes. It'd also be a great source for rules for my testing harness. If you make the doc, I may be able to provide graphs of accuracy to go with it... - Alex From root at shopip.com Thu Feb 13 21:44:58 2003 From: root at shopip.com (Charlie ROOT) Date: Fri Feb 14 09:39:25 2003 Subject: [Spambayes] BUG: Minor, but obvious, typo Message-ID: spambayes-1.0a2/spambayes/resources/ui.html line 245: "incuding" -> "including" -Charlie *God has left the building* From tim at fourstonesExpressions.com Fri Feb 14 08:41:04 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 14 09:41:17 2003 Subject: [Spambayes] BUG: Minor, but obvious, typo In-Reply-To: Message-ID: <21FBSPPLVQVUFEOKUPPLHCVR95F4HB.3e4d0000@myst> Wish they were all that easy to fix . Done. 
-TimS

2/13/2003 11:44:58 PM, Charlie ROOT wrote:

>spambayes-1.0a2/spambayes/resources/ui.html line 245:
>  "incuding" -> "including"
>
>-Charlie
>*God has left the building*

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From jwilliam at xmission.com  Fri Feb 14 10:56:22 2003
From: jwilliam at xmission.com (Jerry Williams)
Date: Fri Feb 14 12:56:28 2003
Subject: [Spambayes] Another bug? Outlook 2000 win2k
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4CD@its-xchg4.massey.ac.nz>
Message-ID:

Well it isn't quite like Tony said. I have the SPAM column showing and
it did say 0% on the message before I opened it. But when I look at the
trace log it says:

Processing 41 missed spam in folder 'Inbox' took 116781ms
Message 'subject line' had a Spam classification of 'No'

So it is like it is processed or at the very end of processing and then
I read the message and then it changes it to unread. I had also replied
to the message and that status was lost also. I think that part of the
problem may be the fact that reading mail from the exchange server is a
lot slower than if you use pop.

-----Original Message-----
From: spambayes-bounces@python.org
[mailto:spambayes-bounces@python.org]On Behalf Of Meyer, Tony
Sent: Thursday, February 13, 2003 7:04 PM
To: SpamBayes
Subject: RE: [Spambayes] Another bug? Outlook 2000 win2k

> I think that I noticed another possible bug today. On my
> windows 2k laptop using an exchange server. I have read
> messages from my inbox and then come back and found them
> marked as unread. It has happened a couple of times. I
> am trying to pay more attention to the problem to see if I
> can figure out more details.
> Has this been seen before?

I think I mentioned something about this on the list a while back.
If it's the same behaviour as I was seeing, then it's caused by messages not being scored/filtered immediately (presumably the hook doesn't get called). Meanwhile the message is marked as read. Then the hook finally gets called, the message is scored, and left as unread. Does this match what you are seeing? I presume that a fix for this would be to record the unread/read status of the message before scoring, and set it to the same after scoring is complete. Mark might do this at some point, although he seems busy with a lot of other fixes :) at the moment. =Tony Meyer _______________________________________________ Spambayes mailing list Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes From whisper at oz.net Fri Feb 14 13:10:51 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Feb 14 16:10:53 2003 Subject: [Spambayes] Another bug? Outlook 2000 win2k In-Reply-To: <006901c2d430$071105b0$530f8490@eden> Message-ID: > > "Read once - preserve status" would be good! If I've read it > > once, it was > > either "spam maybe" or got past spammie. Once (re)classified > > as spam, it's a > > PITA to have to go and mark it as read again. OTOH, if spammie puts it > > directly into the spam folder, then I might want to read it > > to verify that > > it really is spam. It's also true that if you "unread" a msg > > when you put it > > into spam, it's hard to tell those from the "never reads" > > that spammie put > > there itself. > > I'm afraid I am still not sure specifically what you are suggesting we > should change. > > MArk. > There are two situations: spam and possible spam. When spammie identifies spam and puts it into the spam folder, it's status is unread - this is good. It marks spam that one might want to review. In the case of possible spam (my spam maybe folder), two courses of action are possible: delete as spam or recover as spam. 
I look at the possible spam to choose which button to hit and the msg's status will be read because I looked at it. However, when spammie either recovers or deletes the message I've already read, it moves the msg's status back to unread when it moves it into the appropriate destination folder! This means that I'm either going to have to reread it in the spam folder or at the very least manually mark it as read. As I'm writing this, it strikes me that this is Outlook doing this. In the case of recovered spam, "Recover As Spam" causes the recovered msgs to go to the folder they would have gone to if OL had sorted them, so maybe it's OL that's setting the read status. Dave LeBlanc Seattle, WA USA From rob at hooft.net Sun Feb 16 11:20:51 2003 From: rob at hooft.net (Rob Hooft) Date: Sun Feb 16 05:27:30 2003 Subject: [Spambayes] Are they learning? Message-ID: <3E4F6603.3090901@hooft.net> Just received this via a python.org mailinglist; spam is evolving strongly to avoid automatic detection by bayesian techniques. ================================================================== Here is your chance to skip the embarrassing, time-consuming task of going to d.octor's appointments to try and get the p.rescription you want. We're ready to Fedex your online v_i a g r_a p.rescription orders right now!!! No p.rescription necessary!!! Just visit the link below and have your v_i a g r_a on its way to you today!!! http://www.TropicalPills.com/main2.php?rx=17283 ================================================================== -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From tim at fourstonesExpressions.com Sun Feb 16 08:08:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sun Feb 16 09:08:48 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <3E4F6603.3090901@hooft.net> Message-ID: <4YKGSMYEDJDJFV3WVIETHG2WEA06.3e4f9b46@myst> Did sb classify this one correctly? 
- TimS 2/16/2003 4:20:51 AM, Rob Hooft wrote: >Just received this via a python.org mailinglist; spam is evolving >strongly to avoid automatic detection by bayesian techniques. > >================================================================== >Here is your chance to skip the embarrassing, time-consuming task of >going to d.octor's appointments to try and get the p.rescription you want. >We're ready to Fedex your online v_i a g r_a p.rescription orders >right now!!! > >No p.rescription necessary!!! Just visit the link below and have your >v_i a g r_a on its way to you today!!! >http://www.TropicalPills.com/main2.php?rx=17283 >================================================================== > > >-- >Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From wsy at merl.com Sun Feb 16 09:50:24 2003 From: wsy at merl.com (Bill Yerazunis) Date: Sun Feb 16 09:51:04 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <3E4F6603.3090901@hooft.net> (message from Rob Hooft on Sun, 16 Feb 2003 11:20:51 +0100) References: <3E4F6603.3090901@hooft.net> Message-ID: <200302161450.h1GEoOd19757@localhost.localdomain> Rob: Actually, I wouldn't worry about it. That format still has 98% or so of the tokens intact, and the phrasing is still heavily spam. Only "doctor", "prescription" and "viagra" were actually cloaked at all, which implies to me it's only trying to dodge those few keywords. Can you forward the full spam, headers intact (to me, at least)? I want to see if the phrasing is enough for CRM114 to detect it successfully. -Bill Y. From nas at python.ca Sun Feb 16 07:12:51 2003 From: nas at python.ca (Neil Schemenauer) Date: Sun Feb 16 10:04:06 2003 Subject: [Spambayes] Are they learning? 
In-Reply-To: <200302161450.h1GEoOd19757@localhost.localdomain> References: <3E4F6603.3090901@hooft.net> <200302161450.h1GEoOd19757@localhost.localdomain> Message-ID: <20030216151251.GA16765@glacier.arctrix.com> Bill Yerazunis wrote: > Actually, I wouldn't worry about it. That format still has 98% or so > of the tokens intact, and the phrasing is still heavily spam. Spambayes with my DB still finds lots of clues: '*H*' 0.100221799828 '*S*' 0.954253123262 'skip' 0.0256108900427 'task' 0.0364772269246 'header:Message-id:1' 0.0602125241541 'content-type:text/plain' 0.254441488712 'chance' 0.277902899943 'want.' 0.309419424496 'going' 0.371615649759 'try' 0.384359911306 'skip:e 10' 0.63826549096 'link' 0.642099485564 'proto:http' 0.662211682949 'url:www' 0.68175741844 'online' 0.725029235286 'visit' 0.726863723599 'ready' 0.738706256842 'url:com' 0.752211285796 'here' 0.771157119995 'below' 0.819844718374 'now!!!' 0.844827586207 'url:tropicalpills' 0.844827586207 'appointments' 0.860483393027 'fedex' 0.895310019134 'orders' 0.936008158023 'url:main2' 0.949438202247 'url:rx' 0.949438202247 'url:php' 0.974186213206 The spammers are starting to squirm. :-) Neil From rob at hooft.net Sun Feb 16 16:21:24 2003 From: rob at hooft.net (Rob Hooft) Date: Sun Feb 16 10:28:07 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <4YKGSMYEDJDJFV3WVIETHG2WEA06.3e4f9b46@myst> References: <4YKGSMYEDJDJFV3WVIETHG2WEA06.3e4f9b46@myst> Message-ID: <3E4FAC74.2040706@hooft.net> Tim Stone - Four Stones Expressions wrote: > Did sb classify this one correctly? - TimS Not incorrectly. With the python.org ham clues it was classified as Unsure, 0.46. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From wsy at merl.com Sun Feb 16 11:11:47 2003 From: wsy at merl.com (Bill Yerazunis) Date: Sun Feb 16 11:13:06 2003 Subject: [Spambayes] Are they learning? 
In-Reply-To: <3E4FAC74.2040706@hooft.net> (message from Rob Hooft on Sun, 16 Feb 2003 16:21:24 +0100) References: <4YKGSMYEDJDJFV3WVIETHG2WEA06.3e4f9b46@myst> <3E4FAC74.2040706@hooft.net> Message-ID: <200302161611.h1GGBlf20007@localhost.localdomain> Re: the cloaked Viagra spam: I clipped off all of the headers with any reference to SpamBayes mailing list, and fed the remainder to CRM114. As suspected, the phrasing was a dead giveaway even if the keywords are cloaked. The actual result was: **CRM114 FAIL SBPH/BCR TEST** Probabalistic match quality: 1.000000 P(succ): 1.000000e-00, P(fail):2.070405e-11 S hits : 36578, F hits : 33524 ("pass" and "fail" here are inverted due to an unfortunate historical accident of programming - "pass" in the P-stats means "matches spam better", while "fail" in the capital letters at the top _also_ means "matches spam better". Don't let it worry you.) Note that even a simple event counter would have gotten this one right, as the number of corpus text hits was about 10% higher for the correct categorization than for the incorrect categorization. -Bill Y. From rob at hooft.net Sun Feb 16 17:27:52 2003 From: rob at hooft.net (Rob Hooft) Date: Sun Feb 16 11:34:39 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <20030216151251.GA16765@glacier.arctrix.com> References: <3E4F6603.3090901@hooft.net> <200302161450.h1GEoOd19757@localhost.localdomain> <20030216151251.GA16765@glacier.arctrix.com> Message-ID: <3E4FBC08.6050602@hooft.net> Neil Schemenauer wrote: > Bill Yerazunis wrote: > >>Actually, I wouldn't worry about it. That format still has 98% or so >>of the tokens intact, and the phrasing is still heavily spam. > > > Spambayes with my DB still finds lots of clues: [...] > 'url:tropicalpills' 0.844827586207 [...] > The spammers are starting to squirm. :-) Lot less positive here. You're missing all the python.org mailinglist clues, because I only sent the message body! And apparently you have seen "tropicalpills" already before... 
Running it manually now, I get: X-Spambayes-Classification: unsure; 0.49 X-Hammie-Debug: '*H*': 0.90; '*S*': 0.88; 'subject:] ': 0.00; 'task': 0.01; 'header:Errors-To:1': 0.01; 'subject:[': 0.01; 'header:Received:7': 0.03; 'cc:2**1': 0.03; 'skip': 0.03; 'subject:SIG': 0.06; 'subject:Image': 0.06; 'try': 0.19; 'subject:-': 0.20; 'subject:from': 0.22; 'want.': 0.26; 'way': 0.28; 'going': 0.29; 'content-type:text/plain': 0.32; "we're": 0.33; 'its': 0.34; 'header:Message-ID:1': 0.35; 'skip:p 10': 0.39; 'right': 0.39; 'header:Importance:1': 0.61; 'url:com': 0.63; 'subject:, ': 0.66; 'your': 0.68; 'header:Reply-To:1': 0.68; 'link': 0.72; 'content-type:multipart/mixed': 0.73; 'ready': 0.73; 'skip:9 10': 0.73; 'url:www': 0.73; 'subject: ': 0.79; 'url:php': 0.82; 'below': 0.83; 'subject:home': 0.83; 'url:main2': 0.84; 'url:rx': 0.84; 'here': 0.86; 'online': 0.87; 'visit': 0.90; 'appointments': 0.91; 'subject:!\n\t': 0.91; 'subject:Order': 0.91; 'now!!!': 0.93; 'orders': 0.96; 'fedex': 0.97; 'header:MiME-Version:1': 1.00 -- Rob W.W. 
Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ -------------- next part -------------- >From image-sig-admin@python.org Thu Feb 6 15:38:11 2003 Return-Path: Delivered-To: hooft@temoleh.chem.uu.nl Received: from mail.python.org (mail.python.org [12.155.117.29]) by temoleh.chem.uu.nl (Postfix) with ESMTP id 6D33276E70 for ; Thu, 6 Feb 2003 15:38:10 +0100 (CET) Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.05) id 18gnAE-0005DB-00; Thu, 06 Feb 2003 09:38:06 -0500 Received: from [218.76.246.37] (helo=nycmail.com) by mail.python.org with smtp (Exim 4.05) id 18gn9Y-0004dU-00; Thu, 06 Feb 2003 09:37:28 -0500 Received: from unknown (HELO q4.quickslow.com) (190.179.93.119) by smtp-server1.cflrr.com with NNFMP; 07 Feb 2003 09:36:49 -1000 Received: from unknown (HELO mta85.snfc21.pibi.net) (132.150.232.55) by web.mail.halfeye.com with smtp; 06 Feb 2003 23:29:53 -0400 Received: from unknown (HELO symail.kustanai.co.kr) (206.125.224.252) by smtp-server.tampabayr.com with QMQP; Thu, 06 Feb 2003 19:22:57 -0100 Received: from mailout2-eri1.midmouth.com ([188.67.25.193]) by smtp-server1.cflrr.com with QMQP; 06 Feb 2003 18:16:01 -0400 Reply-To: "Peter Lasader Phd6" <6PLasadercb416@nycmail.com> Message-ID: <005b44e33a2a$7533a1b2$3ab87cb4@rdtwyx> From: "Peter Lasader Phd6" <6PLasadercb416@nycmail.com> To: Cc: , MiME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_00B3_03E22B0C.D3024A21" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-warning: 218.76.246.37 in blacklist at list.dsbl.org (http://dsbl.org/listing.php?218.76.246.37) Subject: [Image-SIG] Order v_i a g r_a from home, no doctors! 
6470Mmzw5-869qv-14 Sender: image-sig-admin@python.org Errors-To: image-sig-admin@python.org X-BeenThere: image-sig@python.org X-Mailman-Version: 2.0.13 (101270) Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Image Processing with Python SIG List-Unsubscribe: , List-Archive: Date: Thu, 06 Feb 2003 19:30:05 -0500 X-Spambayes-Classification: unsure; 0.46 Status: RO ------=_NextPart_000_00B3_03E22B0C.D3024A21 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: base64 SGVyZSBpcyB5b3VyIGNoYW5jZSB0byBza2lwIHRoZSBlbWJhcnJhc3Npbmcs IHRpbWUtY29uc3VtaW5nIHRhc2sgb2YgDQpnb2luZyB0byBkLm9jdG9yJ3Mg YXBwb2ludG1lbnRzIHRvIHRyeSBhbmQgZ2V0IHRoZSBwLnJlc2NyaXB0aW9u IHlvdSB3YW50LiANCldlJ3JlIHJlYWR5IHRvIEZlZGV4IHlvdXIgb25saW5l ICAgdl9pIGEgZyByX2EgICBwLnJlc2NyaXB0aW9uIG9yZGVycyByaWdodCBu b3chISENCg0KTm8gcC5yZXNjcmlwdGlvbiBuZWNlc3NhcnkhISEgSnVzdCB2 aXNpdCB0aGUgbGluayBiZWxvdyBhbmQgaGF2ZSB5b3VyIA0Kdl9pIGEgZyBy X2Egb24gaXRzIHdheSB0byB5b3UgdG9kYXkhISEgIA0KaHR0cDovL3d3dy5U cm9waWNhbFBpbGxzLmNvbS9tYWluMi5waHA/cng9MTcyODMNCg0KDQo5NjU3 QXZNbTctNTkwTGwxMw== _______________________________________________ Image-SIG maillist - Image-SIG@python.org http://mail.python.org/mailman/listinfo/image-sig ------=_NextPart_000_00B3_03E22B0C.D3024A21-- From whisper at oz.net Sun Feb 16 17:04:40 2003 From: whisper at oz.net (David LeBlanc) Date: Sun Feb 16 20:05:09 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <3E4F6603.3090901@hooft.net> Message-ID: I just received this one, which got through the python-list spammie!: ---------------------------------------------------------------------------- ------- From: python-list-admin@python.org; on behalf of; hbpython-list4@lycos.com Subject: strasnou 463-3157 Pythonlist c-u-t rates again 463-3157 Body: Enterprises Allvaldi setofusariae 406715577112054347 Could you use some money right now? Refinance now, and take advantage of record low interest rates. Start Saving Money Today. 
moorsman CSJKNATJNYFCRDA KKBXFQRVLWDICBYPOUG 1832709900914304 389475610937440 odmerily zdumiewajaca replantions K"ltewellen odmerily L C Enterprises -- http://mail.python.org/mailman/listinfo/python-list -------------------------------------------------------------------------- Everything except the english text was invisble: white text/background, including the python-list url so it was either RTF or HTML. I think it first showed up in my "spam maybe" folder. The headers where: ---------------------------------------------------------------------------- --------------- Return-Path: Delivered-To: alias-oznet-whisper@oz.net Received: (qmail 7753 invoked from network); 16 Feb 2003 05:13:08 -0000 Received: from mail.python.org (12.155.117.29) by smtp4.sea.theriver.com with SMTP; 16 Feb 2003 05:13:08 -0000 Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org) by mail.python.org with esmtp (Exim 4.05) id 18kH6h-0004Lh-00; Sun, 16 Feb 2003 00:12:51 -0500 Received: from bdsl.66.13.126.2.gte.net ([66.13.126.2] helo=mx1.yahoo.com) by mail.python.org with smtp (Exim 4.05) id 18kH57-0003KB-00 for python-list@python.org; Sun, 16 Feb 2003 00:11:14 -0500 From: hbpython-list4@lycos.com Received: from mx1.yahoo.com by 6x2v8pai3i5.mx1.yahoo.com with SMTP for python-list@python.org; Sun, 16 Feb 2003 00:06:46 -0500 Content-Transfer-Encoding: 7bit Subject: strasnou 463-3157 Pythonlist c-u-t rates again 463-3157 X-Priority: 3 (Normal) X-Mailer: Mozilla 4.77 (Macintosh; I; PPC) To: python-list@python.org Message-Id: Content-Type: text/html; charset=iso-8859-1 Importance: Normal X-warning: 66.13.126.2 in blacklist at list.dsbl.org (http://dsbl.org/listing?66.13.126.2) X-Spam-Status: No, hits=3.5 required=5.0 tests=BODY_PYTHON_ZOPE,CTYPE_JUST_HTML,HTML_50_70,LOW_INTEREST,NO_REAL_NAME, RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,SPAM_PHRASE_00_01,SUBJ_PYTHON_ZOPE X-Spam-Level: *** Sender: python-list-admin@python.org Errors-To: python-list-admin@python.org X-BeenThere: 
python-list@python.org X-Mailman-Version: 2.0.13 (101270) Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: Date: Sun, 16 Feb 2003 00:06:46 -0500 ---------------------------------------------------------------------------- ------------------ Dave LeBlanc Seattle, WA USA From vanhorn at whidbey.com Mon Feb 17 01:02:43 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Feb 17 04:04:42 2003 Subject: [Spambayes] Are they learning? References: Message-ID: <3E50A533.175982E4@whidbey.com> I trust you trained on it? Van David LeBlanc wrote: > I just received this one, which got through the python-list spammie!: > > ---------------------------------------------------------------------------- > ------- > From: python-list-admin@python.org; on behalf of; hbpython-list4@lycos.com > > Subject: strasnou 463-3157 Pythonlist c-u-t rates again 463-3157 > > Body: > Enterprises Allvaldi setofusariae 406715577112054347 > > Could you use some money right now? > > Refinance now, and take advantage of record low interest rates. > > Start Saving Money Today. > > moorsman > > CSJKNATJNYFCRDA > > KKBXFQRVLWDICBYPOUG > 1832709900914304 > 389475610937440 > > odmerily > > zdumiewajaca replantions K"ltewellen > > odmerily > > L C Enterprises > > -- http://mail.python.org/mailman/listinfo/python-list > > -------------------------------------------------------------------------- > > Everything except the english text was invisble: white text/background, > including the python-list url so it was either RTF or HTML. > > I think it first showed up in my "spam maybe" folder. > -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From N7DR at arrisi.com Mon Feb 17 08:30:50 2003 From: N7DR at arrisi.com (D. R. Evans) Date: Mon Feb 17 10:30:56 2003 Subject: [Spambayes] aging information Message-ID: <3E509DBA.30411.320099E2@localhost> Does spambayes have any concept that "the older information is, the less value it has"? I ask because one's notion of what constitutes spam could change over time, so it would seem reasonable to gradually (over a period of weeks/months) decrease the effect of old training. (Probably by some sort of exponential decay, so that, for example, every day the value of old material is shifted slightly toward 0.5 by multiplying the difference from 0.5 by a factor of 0.99, or some such algorithm.) Doc Evans PS I am a new user (following the "Linux Journal" article. So far, I am impressed at how well spambayes works. So far (after a few days) it has not classified any ham as spam, and it is now catching about three quarters of the spam. -------------------------------------------------------------- Phone: +1 303 494 0394 Mobile: +1 720 839 8462 Fax: +1 781 240 0527 -------------------------------------------------------------- From tim at fourstonesExpressions.com Mon Feb 17 09:38:34 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 17 10:38:43 2003 Subject: [Spambayes] aging information In-Reply-To: <3E509DBA.30411.320099E2@localhost> Message-ID: 2/17/2003 9:30:50 AM, "D. R. Evans" wrote: >Does spambayes have any concept that "the older information is, the >less value it has"? There was a huge discussion about this topic toward the end of the research phase of the project, maybe about october last year... At that time we decided not to implement this functionality, based on a whole bunch of reasons that I can't remember now... 
maybe some of the other guys have a better memory than me. But I think that it revolved around the idea that while the overall content and organization of spam certainly will evolve, the tokens (e.g. words) that are used in spam come from basically a finite set, and don't evolve in the same way that combinations of tokens (spam) evolve. Since spambayes is completely focused on tokens, aging was deemed to be unnecessary. This to the best of my recollection... We are beginning to see spammer attempts at altering their tokens to fool bayesian filters (a technology that they have nothing but fear for). This tells us that we're already having an effect. We'll see what ideas they come up with, and adjust our tokenizer to meet those challenges. - TimS > >I ask because one's notion of what constitutes spam could change over >time, so it would seem reasonable to gradually (over a period of >weeks/months) decrease the effect of old training. (Probably by some >sort of exponential decay, so that, for example, every day the value of >old material is shifted slightly toward 0.5 by multiplying the >difference from 0.5 by a factor of 0.99, or some such algorithm.) > > Doc Evans > >PS I am a new user (following the "Linux Journal" article. So far, I am >impressed at how well spambayes works. So far (after a few days) it has >not classified any ham as spam, and it is now catching about three >quarters of the spam. 
> >-------------------------------------------------------------- >Phone: +1 303 494 0394 >Mobile: +1 720 839 8462 >Fax: +1 781 240 0527 >-------------------------------------------------------------- > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Mon Feb 17 10:13:26 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 17 11:13:32 2003 Subject: [Spambayes] aging information Message-ID: 2/17/2003 9:57:23 AM, "D. R. Evans" wrote: >On 17 Feb 2003 at 9:38, Tim Stone - Four Stones Expressions wrote: > >> 2/17/2003 9:30:50 AM, "D. R. Evans" wrote: >> >> >Does spambayes have any concept that "the older information is, the >> >less value it has"? >> >> There was a huge discussion about this topic toward the end of the >> research phase of the project, maybe about october last year... At that > >Is this discussion easily retrievable from anywhere? Yes, the archive of this list is available at http://mail.python.org/mailman/listinfo/spambayes > >> guys have a better memory than me. But I think that it revolved around >> the idea that while the overall content and organization of spam >> certainly will evolve, the tokens (e.g. words) that are used in spam >> come from basically a finite set, and don't evolve in the same way that >> combinations of tokens (spam) evolve. Since spambayes is completely > >At first blush, that seems to me to fail to take into account the fact >that the end-user's notion of what constitutes spam might reasonably >change as a function of time. Yes, this is a 'side effect'. For example, my current training classifies this 'buy gold, beat the market' mail as spam. But now I've become interested in investing in gold, and I'd really like to see those mails. 
There are a couple of strategies for retraining your database. One is to be sure to train on all "mistakes," or mis-classifications. In other words, don't simply ignore your spam folder. Browse it every so often, and do training based on what's there, right or wrong. As you reclassify 'buy gold' mail in your spam folder, the database will learn your new view on this mail, rather quickly, most likely. The other strategy is to completely retrain your database from scratch, after reorganizing your saved spam and ham mails to reflect your current value system. This is a bit more work, but will yield immediate results. Aging is a very difficult problem, because spambayes simply keeps track of tokens and the number of times you've said that mail with each token is spam and ham. That's all the information we retain about tokens. We could do some aging stuff if we add 'date trained' as part of the token key, but that would result in a database size explosion, severely degrading performance, increasing the system's complexity, and making the footprint unacceptably huge. But without that information, a meaningful aging mechanism is not possible. So we've enabled 'retraining' a particular mail, or set of mails, which safely shifts the database's learning, while keeping the database manageable. Make sense? > I can see that I'm going to have to learn >python and then try to understand the spambayes code so that I can try >to add this myself, just to see if it really is useful. Python is a surprisingly easy language to learn, and even easier to read. :) > >Time, time, time... does anyone have any for sale? Lemme know if you find any... 
- TimS > > Doc Evans >-------------------------------------------------------------- >Phone: +1 303 494 0394 >Mobile: +1 720 839 8462 >Fax: +1 781 240 0527 >-------------------------------------------------------------- > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Mon Feb 17 10:58:10 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 17 11:58:15 2003 Subject: [Spambayes] aging information In-Reply-To: <3E50AE2E.10307.3240E1E4@localhost> Message-ID: <87UTDPLCA61OIUSKJOKJGBGA96ZGB.3e5114a2@myst> 2/17/2003 10:41:02 AM, "D. R. Evans" wrote: >On 17 Feb 2003 at 10:13, Tim Stone - Four Stones Expressions wrote: > >> Aging is a very difficult problem, because spambayes simply keeps track >> of tokens and the number of times you've said that mail with each token >> is spam and ham. That's all the information we retain about tokens. We > >You can still make it work. Every time you do a new train do something >like this: > >for each token in the databse >{ number of times this token has been in ham *= 0.99; > number of times this token has been in spam *= 0.99; >} It would be very simple for you to write a prog that iterates the database, performing the calculation you suggest. You can use runExport/runImport functions in dbExpImp.py as a jumping off point. In fact, you could simply implement another option on that module, -a option. I'm not sure what happens if spamcount and hamcount become floats... I (for one) would be interested to see the analysis of your results in terms of false positives and false negatives over time. You might think about implementing it as a 'half life' algorithm. This would allow users to determine their aging period, the spam evolutionary timescale. Some users may evolve their definition of spam very quickly and wish their database halflife to be quite short. Others might wish to use a very slow evolution. 
At any rate, we invariably measure the success of these kind of things in terms of the fp and fn rate. - TimS > >train as is currently done; > >Something like that ought to do the job, shouldn't it? That's what I >had in mind, anyway. > > Doc >-------------------------------------------------------------- >Phone: +1 303 494 0394 >Mobile: +1 720 839 8462 >Fax: +1 781 240 0527 >-------------------------------------------------------------- > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tdickenson at devmail.geminidataloggers.co.uk Mon Feb 17 17:08:23 2003 From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson) Date: Mon Feb 17 12:08:30 2003 Subject: [Spambayes] aging information In-Reply-To: <87UTDPLCA61OIUSKJOKJGBGA96ZGB.3e5114a2@myst> References: <87UTDPLCA61OIUSKJOKJGBGA96ZGB.3e5114a2@myst> Message-ID: <200302171708.23986.tdickenson@devmail.geminidataloggers.co.uk> On Monday 17 February 2003 4:58 pm, Tim Stone - Four Stones Expressions wrote: > >You can still make it work. Every time you do a new train do something > >like this: > > > >for each token in the databse > >{ number of times this token has been in ham *= 0.99; > > number of times this token has been in spam *= 0.99; > >} ....and remove the token from the database when the numbers are sufficiently close to zero. > At any rate, we invariably measure the success of these kind of things in > terms of the fp and fn rate. - TimS And database size. From gmino at pcsltd.com Mon Feb 17 12:17:22 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Mon Feb 17 12:28:35 2003 Subject: [Spambayes] Re: Oulook plugin Message-ID: <3261E796E368954CB22963F2B63E81051EE8BF@xmail.pcsltd.com> Ok....whattid u guys do now? 
I had the plugin installed ok...now using the latest CVS....the plugin installs w/o any errors: Outlook Spam Addin module loading Registered: SpamBayes.OutlookAddin Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Created new configuration file 'D:\spambayes\Outlook2000\default_configuration.pck' Then Outlook takes a dive south and crashes and will only work with the plugin disabled. Any thoughts? Gabriel From N7DR at arrisi.com Mon Feb 17 10:33:44 2003 From: N7DR at arrisi.com (D. R. Evans) Date: Mon Feb 17 12:33:53 2003 Subject: [Spambayes] aging information In-Reply-To: <200302171708.23986.tdickenson@devmail.geminidataloggers.co.uk> References: <87UTDPLCA61OIUSKJOKJGBGA96ZGB.3e5114a2@myst> Message-ID: <3E50BA88.940.327120DA@localhost> I will try to code this all up. Don't hold your collective breaths, though. The first time I ever saw a line of python code was about three days ago, when I had to hack part of CDDB-info.py. So before I do anything else I need to read and digest a couple of python books. Then I need to understand how the spambayes code works. So you can see that it will take a while. Thanks for the replies, though. This is an interesting game :-) Doc On 17 Feb 2003 at 17:08, Toby Dickenson wrote: > On Monday 17 February 2003 4:58 pm, Tim Stone - Four Stones Expressions > wrote: > > > >You can still make it work. Every time you do a new train do > > >something like this: > > > > > >for each token in the databse > > >{ number of times this token has been in ham *= 0.99; > > > number of times this token has been in spam *= 0.99; > > >} > > ....and remove the token from the database when the numbers are > sufficiently close to zero. > > > At any rate, we invariably measure the success of these kind of things > > in terms of the fp and fn rate. - TimS > > And database size.
> > -------------------------------------------------------------- Phone: +1 303 494 0394 Mobile: +1 720 839 8462 Fax: +1 781 240 0527 -------------------------------------------------------------- From tim at fourstonesExpressions.com Mon Feb 17 11:48:08 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 17 12:48:36 2003 Subject: [Spambayes] aging information In-Reply-To: <3E50BA88.940.327120DA@localhost> Message-ID: 2/17/2003 11:33:44 AM, "D. R. Evans" wrote: >I will try to code this all up. Don't hold your collective breaths, >though. The first time I ever saw a line of python code was about three >days ago, when I had to hack part of CDDB-info.py. > >So before I do anything else I need to read and digest a couple of >python books. Then I need to understand how the spambayes code works. >So you can see that it will take a while. First time I saw Python was in this project too... You'll come up to speed very quickly. I bought "Core Python Programming" by Chen. > >Thanks for the replies, though. This is an interesting game :-) The fight is on. Welcome to the swat team! - TimS > > Doc > >On 17 Feb 2003 at 17:08, Toby Dickenson wrote: > >> On Monday 17 February 2003 4:58 pm, Tim Stone - Four Stones Expressions >> wrote: >> >> > >You can still make it work. Every time you do a new train do >> > >something like this: >> > > >> > >for each token in the databse >> > >{ number of times this token has been in ham *= 0.99; >> > > number of times this token has been in spam *= 0.99; >> > >} >> >> ....and remove the token from the database when the numbers are >> sufficiently close to zero. >> >> > At any rate, we invariably measure the success of these kind of things >> > in terms of the fp and fn rate. - TimS >> >> And database size. 
>> >> > >-------------------------------------------------------------- >Phone: +1 303 494 0394 >Mobile: +1 720 839 8462 >Fax: +1 781 240 0527 >-------------------------------------------------------------- > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Mon Feb 17 11:27:10 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Feb 17 14:27:18 2003 Subject: [Spambayes] aging information In-Reply-To: Message from "D. R. Evans" of "Mon, 17 Feb 2003 08:30:50 MST." <3E509DBA.30411.320099E2@localhost> References: <3E509DBA.30411.320099E2@localhost> Message-ID: <20030217192710.74FF92DE8B@cashew.wolfskeep.com> In message: <3E509DBA.30411.320099E2@localhost> "D. R. Evans" writes: >Does spambayes have any concept that "the older information is, the >less value it has"? Not intrinsically. Some few of us who have slightly bizarre installations may have implemented such; for instance, I have a sliding 120-day window that I use for my nightly retrains, so for my purposes anything over 4 months old is forgotten. (The details of my setup are available in the contrib section in BULK.txt.) I have yet to measure the value of this aging process. >PS I am a new user (following the "Linux Journal" article. So far, I am >impressed at how well spambayes works. So far (after a few days) it has >not classified any ham as spam, and it is now catching about three >quarters of the spam. Glad to hear it. - Alex PS. Does anyone know the proper way to placate the computer gods? First it was a suspected cracking attempt which had to be investigated, then the 120G holding drive on the backup server spun off its bearings, and then yesterday one of the motherboards fried itself. What does one have to do to get some peace to finish coding in? 
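The sliding-window retraining Alex describes could be sketched roughly as follows. This is only an illustration under stated assumptions, not the setup documented in BULK.txt: the `(received, is_spam, text)` record layout, the function names, and the whitespace tokenizer are all hypothetical stand-ins, and the real spambayes tokenizer and storage are not used here.

```python
from datetime import datetime, timedelta


def messages_in_window(messages, now, days=120):
    """Keep only the (received, is_spam, text) records newer than `days` days."""
    cutoff = now - timedelta(days=days)
    return [m for m in messages if m[0] >= cutoff]


def retrain(messages, now, days=120):
    """Rebuild token counts from scratch using only the recent messages.

    Returns a dict mapping token -> (spamcount, hamcount). Anything older
    than the window simply never gets counted, so it is "forgotten".
    """
    counts = {}
    for received, is_spam, text in messages_in_window(messages, now, days):
        # Hypothetical tokenizer: lowercase whitespace split, one hit per message.
        for token in set(text.lower().split()):
            spam, ham = counts.get(token, (0, 0))
            if is_spam:
                counts[token] = (spam + 1, ham)
            else:
                counts[token] = (spam, ham + 1)
    return counts


if __name__ == "__main__":
    now = datetime(2003, 2, 17)
    saved = [
        (datetime(2002, 9, 1), True, "old pills"),       # outside the window
        (datetime(2003, 2, 1), True, "fedex orders"),
        (datetime(2003, 2, 10), False, "python orders"),
    ]
    print(retrain(saved, now))
```

Rebuilding from saved mail keeps the counts as plain integers and needs no per-token timestamps, at the cost of keeping the mail itself around for the length of the window.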
From tim at fourstonesExpressions.com Mon Feb 17 13:33:42 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 17 14:33:48 2003 Subject: [Spambayes] aging information In-Reply-To: <20030217192710.74FF92DE8B@cashew.wolfskeep.com> Message-ID: 2/17/2003 1:27:10 PM, "T. Alexander Popiel" wrote: >In message: <3E509DBA.30411.320099E2@localhost> > "D. R. Evans" writes: > >>Does spambayes have any concept that "the older information is, the >>less value it has"? > >Not intrinsically. Some few of us who have slightly bizarre >installations may have implemented such; for instance, I have >a sliding 120-day window that I use for my nightly retrains, >so for my purposes anything over 4 months old is forgotten. >(The details of my setup are available in the contrib section >in BULK.txt.) > >I have yet to measure the value of this aging process. This measurement is important. I doubt that it actually accomplishes much, but we could use some empirical data. If it really helps, then we should include that function somewhere.... > >>PS I am a new user (following the "Linux Journal" article. So far, I am >>impressed at how well spambayes works. So far (after a few days) it has >>not classified any ham as spam, and it is now catching about three >>quarters of the spam. > >Glad to hear it. > >- Alex > >PS. Does anyone know the proper way to placate the computer gods? > First it was a suspected cracking attempt which had to be > investigated, then the 120G holding drive on the backup > server spun off its bearings, and then yesterday one of the > motherboards fried itself. What does one have to do to get > some peace to finish coding in? Good grief. Have you tried charging a .5f capacitor and touching the leads to your tongue? I've heard it said that the gods look favorably on extreme penitence... 
- TimS > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From vanhorn at whidbey.com Mon Feb 17 12:09:49 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Mon Feb 17 15:09:53 2003 Subject: [Spambayes] aging information References: Message-ID: <3E51418D.8A8EF95F@whidbey.com> Tim Stone - Four Stones Expressions wrote: > 2/17/2003 1:27:10 PM, "T. Alexander Popiel" wrote: > > >In message: <3E509DBA.30411.320099E2@localhost> > > "D. R. Evans" writes: > > > >>Does spambayes have any concept that "the older information is, the > >>less value it has"? > > > >Not intrinsically. Some few of us who have slightly bizarre > >installations may have implemented such; for instance, I have > >a sliding 120-day window that I use for my nightly retrains, > >so for my purposes anything over 4 months old is forgotten. > >(The details of my setup are available in the contrib section > >in BULK.txt.) > > > >I have yet to measure the value of this aging process. > > This measurement is important. I doubt that it actually accomplishes much, > but we could use some empirical data. If it really helps, then we should > include that function somewhere.... I think it could remove a minor impediment to implementing the system, or perhaps provide a higher comfort level to those considering it, if the system responds to changes in interests and tastes. The idea that a training decision would affect the users' incoming mail "forever" might be intimidating to some. If there is a simple way to derate all token by 1% on every training session, that would obviously make the installation less of a risk to future communications. That's a marketing issue, which would be really hard to test in any objective way. 
It also seems that a simple aging system could allow a smaller database for a given level of accuracy, that is something that is actually testable. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Tue Feb 18 09:16:52 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Feb 17 15:17:56 2003 Subject: [Spambayes] Re: Oulook plugin Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4D7@its-xchg4.massey.ac.nz> > Ok....whattid u guys do now? I had the plugin installed > ok...now using the > latest CVS....the plugin installs w/o any errors: [...] > Created new configuration file > 'D:\spambayes\Outlook2000\default_configuration.pck' > Then Oulook takes a dive south and crashes and will only work with the > plugin disabled. Any thoughts? I imagine this is the same problem I had when Mark fixed the plugin to correctly check for bsddb3. I found the same thing - Outlook itself crashing with nothing in the trace. Change line 47 of manager.py to " use_db = False". 
If this fixes the problem (I had to reset my toolbar as well, but that might have been unrelated), then I guess it's not just my bsddb3 after all, and the "definately not broken" comment line might need to be rethought ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Feb 18 09:34:22 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Feb 17 15:34:55 2003 Subject: [Spambayes] Outlook Plugin Read/unread issues Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD5D@its-xchg4.massey.ac.nz> [Mark] > There is a bug/feature request at source-forge on this - > clarification of > exactly what we want would be good :) [David] > There are two situations: spam and possible spam. > > When spammie identifies spam and puts it into the spam > folder, it's status > is unread - this is good. It marks spam that one might want to review. > > In the case of possible spam (my spam maybe folder), two > courses of action are possible: delete as spam or recover > as spam. I look at the possible spam > to choose which button to hit and the msg's status will be > read because I looked at it. This is pretty similiar to the bug/feature request (which I made), except that I would also like it if there was an option to not just maintain unread status, but change messages to 'read' iff the delete as spam button is used (from anywhere) - for those who read via the preview pane without marking as read after a time interval. I think there are two issues here: 1. The feature request above, which no-one has objected to (as an option, not a default behaviour), as long as mail filtered to the spam folder is not marked as read. 2. The bug that some mail is marked as unread (from read) when it is scored/filtered/moved (note that I haven't seen this behaviour on movement like David has). [Mark] > Note that spambayes does not (directly) touch the "unread" status of a > message. 
I say "directly", as there may be some side-effect that we don't > know about, but as far as I know, the read/unread status of the message is > purely up to outlook. Does Outlook maybe reset the unread flag whenever the save() function is called? (except when the unread flag has been modified?) Are these things documented somewhere? =Tony Meyer From francois.granger at free.fr Mon Feb 17 21:39:30 2003 From: francois.granger at free.fr (Francois Granger) Date: Mon Feb 17 15:39:35 2003 Subject: [Spambayes] Bayesian filtering recognition Message-ID: You've probably heard the buzz about the Bayesian method of filtering spam, and how it attains a 99.95% accuracy rate, destroys the economic incentive to spam, and folds your laundry. But where oh where is the web-mail service that implements this wonder approach and makes it dead-simple to use? http://www.oddpost.com/two.html -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. From gmino at pcsltd.com Mon Feb 17 15:58:56 2003 From: gmino at pcsltd.com (Gabriel Mino) Date: Mon Feb 17 15:51:36 2003 Subject: [Spambayes] Outlook plugin "delete as spam" button not working Message-ID: <3261E796E368954CB22963F2B63E81051EE8C1@xmail.pcsltd.com> Per Tony's suggestion, I've changed line 47 of manager.py to "use_db = False" This is what I'm getting while trying to use the "delete as spam" button: pythoncom error: Python error invoking COM method. 
Traceback (most recent call last): File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\spambayes\Outlook2000\addin.py", line 305, in OnClick spam_folder = msgstore.GetFolder(spam_folder_id) File "D:\spambayes\Outlook2000\msgstore.py", line 232, in GetFolder folder_id = self.NormalizeID(folder_id) File "D:\spambayes\Outlook2000\msgstore.py", line 185, in NormalizeID assert type(item_id) in [type(''), type(u'')], "What kind of ID is '%r'?" % (item_id,) exceptions.AssertionError: What kind of ID is 'None'? TIA Gabriel From mhammond at skippinet.com.au Tue Feb 18 08:52:33 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 17 16:53:37 2003 Subject: [Spambayes] Outlook plugin "delete as spam" button not working In-Reply-To: <3261E796E368954CB22963F2B63E81051EE8C1@xmail.pcsltd.com> Message-ID: <002601c2d6ce$e112ade0$530f8490@eden> > Per Tony's suggestion, I've changed line 47 of manager.py to "use_db = > False" > > > > This is what I'm getting while trying to use the "delete as > spam" button: I am guessing that you are clicking the button before configuring your "Spam" and "Maybe" folders. I must fix the plugin so these buttons are grayed in that state. Mark. From mhammond at skippinet.com.au Tue Feb 18 08:56:31 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 17 16:57:31 2003 Subject: [Spambayes] Outlook Plugin Read/unread issues In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD5D@its-xchg4.massey.ac.nz> Message-ID: <002701c2d6cf$6f0d9290$530f8490@eden> > [Mark] > > Note that spambayes does not (directly) touch the "unread" > status of a > > message. 
I say "directly", as there may be some > side-effect that we don't > > know about, but as far as I know, the read/unread status of > the message is > > purely up to outlook. > > Does Outlook maybe reset the unread flag whenever the save() > function is called? (except when the unread flag has been > modified?) Are these things documented somewhere? The "unread" status is a bit (MSGFLAG_READ) set on an integer property (PR_MESSAGE_FLAGS). The docs for this bit say: """ MSGFLAG_READ The message is marked as having been read. This can occur as the result of a call at any time to IMessage::SetReadFlag or IMAPIFolder::SetReadFlags. Clients can also set this flag by calling a message's IMAPIProp::SetProps method before the message has been saved for the first time. This flag is ignored if the MSGFLAG_ASSOCIATED flag is set. """ So no clues here. Dig around a little deeper for these constants if you like. Mark. From todd at osogrande.com Mon Feb 17 18:49:09 2003 From: todd at osogrande.com (Todd Underwood) Date: Mon Feb 17 21:24:22 2003 Subject: [Spambayes] problems with distutils Message-ID: folx, love your implementation. i've been very interested in doing a server-side (rather than client-specific) bayesian spam implementation for a while at our site. glad to see it's finally possible. unfortunately, due to requirements at our site, we must install everything via rpm. right now i'm getting what i'm sure is a trivial bug in the bdist_rpm build that i can't seem to fix on my own (i'm very new to distutils and just don't know anything about how it works).
i do: [todd@240 spambayes-1.0a2]$ python setup.py build and i get: running bdist_rpm creating build creating build/bdist.linux-i686 creating build/bdist.linux-i686/rpm creating build/bdist.linux-i686/rpm/SOURCES creating build/bdist.linux-i686/rpm/SPECS creating build/bdist.linux-i686/rpm/BUILD creating build/bdist.linux-i686/rpm/RPMS creating build/bdist.linux-i686/rpm/SRPMS writing 'build/bdist.linux-i686/rpm/SPECS/spambayes.spec' running sdist warning: sdist: manifest template 'MANIFEST.in' does not exist (using default file list) writing manifest file 'MANIFEST' creating spambayes-1.0a2 creating spambayes-1.0a2/spambayes creating spambayes-1.0a2/spambayes/resources making hard links in spambayes-1.0a2... hard linking README.txt -> spambayes-1.0a2 hard linking setup.py -> spambayes-1.0a2 hard linking spambayes/Corpus.py -> spambayes-1.0a2/spambayes hard linking spambayes/CostCounter.py -> spambayes-1.0a2/spambayes hard linking spambayes/Dibbler.py -> spambayes-1.0a2/spambayes hard linking spambayes/FileCorpus.py -> spambayes-1.0a2/spambayes hard linking spambayes/Histogram.py -> spambayes-1.0a2/spambayes hard linking spambayes/OptionConfig.py -> spambayes-1.0a2/spambayes hard linking spambayes/Options.py -> spambayes-1.0a2/spambayes hard linking spambayes/PyMeldLite.py -> spambayes-1.0a2/spambayes hard linking spambayes/TestDriver.py -> spambayes-1.0a2/spambayes hard linking spambayes/Tester.py -> spambayes-1.0a2/spambayes hard linking spambayes/__init__.py -> spambayes-1.0a2/spambayes hard linking spambayes/cdb.py -> spambayes-1.0a2/spambayes hard linking spambayes/cdb_classifier.py -> spambayes-1.0a2/spambayes hard linking spambayes/chi2.py -> spambayes-1.0a2/spambayes hard linking spambayes/classifier.py -> spambayes-1.0a2/spambayes hard linking spambayes/compatheapq.py -> spambayes-1.0a2/spambayes hard linking spambayes/compatsets.py -> spambayes-1.0a2/spambayes hard linking spambayes/dbmstorage.py -> spambayes-1.0a2/spambayes hard linking 
spambayes/hammie.py -> spambayes-1.0a2/spambayes hard linking spambayes/hammiebulk.py -> spambayes-1.0a2/spambayes hard linking spambayes/mboxutils.py -> spambayes-1.0a2/spambayes hard linking spambayes/msgs.py -> spambayes-1.0a2/spambayes hard linking spambayes/optimize.py -> spambayes-1.0a2/spambayes hard linking spambayes/storage.py -> spambayes-1.0a2/spambayes hard linking spambayes/tokenizer.py -> spambayes-1.0a2/spambayes hard linking spambayes/resources/__init__.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/classify_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/config_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/helmet_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/message_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/query_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/scanning__init__.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/status_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/train_gif.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/ui_html.py -> spambayes-1.0a2/spambayes/resources hard linking spambayes/resources/ui_psp.py -> spambayes-1.0a2/spambayes/resources creating dist tar -cf dist/spambayes-1.0a2.tar spambayes-1.0a2 gzip -f9 dist/spambayes-1.0a2.tar removing 'spambayes-1.0a2' (and everything under it) copying dist/spambayes-1.0a2.tar.gz -> build/bdist.linux-i686/rpm/SOURCES building RPMs rpm -ba --define _topdir /home/todd/temp/spamrpms/spambayes-1.0a2/build/bdist.linux-i686/rpm --clean build/bdist.linux-i686/rpm/SPECS/spambayes.spec Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.7785 + umask 022 + cd /home/todd/temp/spamrpms/spambayes-1.0a2/build/bdist.linux-i686/rpm/BUILD + cd /home/todd/temp/spamrpms/spambayes-1.0a2/build/bdist.linux-i686/rpm/BUILD + rm -rf 
spambayes-1.0a2 + /usr/bin/gzip -dc /home/todd/temp/spamrpms/spambayes-1.0a2/build/bdist.linux-i686/rpm/SOURCES/spambayes-1.0a2.tar.gz + tar -xvvf - drwxr-xr-x todd/staff 0 2003-02-17 18:48:28 spambayes-1.0a2/ drwxr-xr-x todd/staff 0 2003-02-17 18:48:28 spambayes-1.0a2/spambayes/ drwxr-xr-x todd/staff 0 2003-02-17 18:48:28 spambayes-1.0a2/spambayes/resources/ -rw-r--r-- todd/staff 62 2003-01-17 13:21:14 spambayes-1.0a2/spambayes/resources/__init__.py -rw-r--r-- todd/staff 4461 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/classify_gif.py -rw-r--r-- todd/staff 3783 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/config_gif.py -rw-r--r-- todd/staff 4730 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/helmet_gif.py -rw-r--r-- todd/staff 3782 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/message_gif.py -rw-r--r-- todd/staff 4410 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/query_gif.py -rw-r--r-- todd/staff 3825 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/scanning__init__.py -rw-r--r-- todd/staff 4431 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/status_gif.py -rw-r--r-- todd/staff 4915 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/train_gif.py -rw-r--r-- todd/staff 12751 2003-01-24 16:56:28 spambayes-1.0a2/spambayes/resources/ui_html.py -rw-r--r-- todd/staff 27623 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/resources/ui_psp.py -rwxr-xr-x todd/staff 13186 2003-01-28 00:39:32 spambayes-1.0a2/spambayes/Corpus.py -rw-r--r-- todd/staff 5400 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/CostCounter.py -rw-r--r-- todd/staff 23179 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/Dibbler.py -rwxr-xr-x todd/staff 21923 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/FileCorpus.py -rwxr-xr-x todd/staff 6350 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/Histogram.py -rw-r--r-- todd/staff 14315 2003-01-24 16:59:22 spambayes-1.0a2/spambayes/OptionConfig.py -rw-r--r-- todd/staff 23685 2003-02-03 01:07:46 
spambayes-1.0a2/spambayes/Options.py -rw-r--r-- todd/staff 41631 2003-01-31 11:32:28 spambayes-1.0a2/spambayes/PyMeldLite.py -rw-r--r-- todd/staff 11448 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/TestDriver.py -rw-r--r-- todd/staff 7077 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/Tester.py -rw-r--r-- todd/staff 41 2003-01-31 12:59:52 spambayes-1.0a2/spambayes/__init__.py -rwxr-xr-x todd/staff 5621 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/cdb.py -rw-r--r-- todd/staff 897 2003-01-19 20:14:32 spambayes-1.0a2/spambayes/cdb_classifier.py -rw-r--r-- todd/staff 5403 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/chi2.py -rwxr-xr-x todd/staff 17101 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/classifier.py -rw-r--r-- todd/staff 11181 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/compatheapq.py -rw-r--r-- todd/staff 16267 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/compatsets.py -rw-r--r-- todd/staff 1264 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/dbmstorage.py -rwxr-xr-x todd/staff 8275 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/hammie.py -rwxr-xr-x todd/staff 6130 2003-01-28 20:23:34 spambayes-1.0a2/spambayes/hammiebulk.py -rwxr-xr-x todd/staff 5262 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/mboxutils.py -rw-r--r-- todd/staff 2994 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/msgs.py -rw-r--r-- todd/staff 2299 2003-01-13 22:38:20 spambayes-1.0a2/spambayes/optimize.py -rwxr-xr-x todd/staff 8520 2003-01-28 00:39:34 spambayes-1.0a2/spambayes/storage.py -rwxr-xr-x todd/staff 51957 2003-01-28 20:23:36 spambayes-1.0a2/spambayes/tokenizer.py -rw-r--r-- todd/staff 11662 2002-12-01 21:43:38 spambayes-1.0a2/README.txt -rwxr-xr-x todd/staff 1947 2003-02-03 00:54:14 spambayes-1.0a2/setup.py -rw-r--r-- todd/staff 254 2003-02-17 18:48:28 spambayes-1.0a2/PKG-INFO + STATUS=0 + '[' 0 -ne 0 ']' + cd spambayes-1.0a2 ++ /usr/bin/id -u + '[' 500 = 0 ']' ++ /usr/bin/id -u + '[' 500 = 0 ']' + /bin/chmod -Rf a+rX,g-w,o-w . 
+ exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.7785 + umask 022 + cd /home/todd/temp/spamrpms/spambayes-1.0a2/build/bdist.linux-i686/rpm/BUILD + cd spambayes-1.0a2 + python setup.py build running build running build_py creating build creating build/lib creating build/lib/spambayes copying spambayes/Corpus.py -> build/lib/spambayes copying spambayes/CostCounter.py -> build/lib/spambayes copying spambayes/Dibbler.py -> build/lib/spambayes copying spambayes/FileCorpus.py -> build/lib/spambayes copying spambayes/Histogram.py -> build/lib/spambayes copying spambayes/OptionConfig.py -> build/lib/spambayes copying spambayes/Options.py -> build/lib/spambayes copying spambayes/PyMeldLite.py -> build/lib/spambayes copying spambayes/TestDriver.py -> build/lib/spambayes copying spambayes/Tester.py -> build/lib/spambayes copying spambayes/__init__.py -> build/lib/spambayes copying spambayes/cdb.py -> build/lib/spambayes copying spambayes/cdb_classifier.py -> build/lib/spambayes copying spambayes/chi2.py -> build/lib/spambayes copying spambayes/classifier.py -> build/lib/spambayes copying spambayes/compatheapq.py -> build/lib/spambayes copying spambayes/compatsets.py -> build/lib/spambayes copying spambayes/dbmstorage.py -> build/lib/spambayes copying spambayes/hammie.py -> build/lib/spambayes copying spambayes/hammiebulk.py -> build/lib/spambayes copying spambayes/mboxutils.py -> build/lib/spambayes copying spambayes/msgs.py -> build/lib/spambayes copying spambayes/optimize.py -> build/lib/spambayes copying spambayes/storage.py -> build/lib/spambayes copying spambayes/tokenizer.py -> build/lib/spambayes creating build/lib/spambayes/resources copying spambayes/resources/__init__.py -> build/lib/spambayes/resources copying spambayes/resources/classify_gif.py -> build/lib/spambayes/resources copying spambayes/resources/config_gif.py -> build/lib/spambayes/resources copying spambayes/resources/helmet_gif.py -> build/lib/spambayes/resources copying 
spambayes/resources/message_gif.py -> build/lib/spambayes/resources copying spambayes/resources/query_gif.py -> build/lib/spambayes/resources copying spambayes/resources/scanning__init__.py -> build/lib/spambayes/resources copying spambayes/resources/status_gif.py -> build/lib/spambayes/resources copying spambayes/resources/train_gif.py -> build/lib/spambayes/resources copying spambayes/resources/ui_html.py -> build/lib/spambayes/resources copying spambayes/resources/ui_psp.py -> build/lib/spambayes/resources running build_scripts creating build/scripts-2.2 error: file 'unheader.py' does not exist error: Bad exit status from /var/tmp/rpm-tmp.7785 (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.7785 (%build) error: command 'rpm' failed with exit status 1 ----------------------------------------------------------- any help you can offer? thanks very much, t. -- todd underwood, sr. vp & cto oso grande technologies, inc. todd@osogrande.com "The people never give up their liberties but under some delusion." --Edmund Buke, Speech at County Meeting of Bucks, 1784. From tim.one at comcast.net Mon Feb 17 22:20:11 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Feb 17 22:20:44 2003 Subject: [Spambayes] aging information In-Reply-To: <20030217192710.74FF92DE8B@cashew.wolfskeep.com> Message-ID: [T. Alexander Popiel] > ... > PS. Does anyone know the proper way to placate the computer gods? > First it was a suspected cracking attempt which had to be > investigated, then the 120G holding drive on the backup > server spun off its bearings, and then yesterday one of the > motherboards fried itself. What does one have to do to get > some peace to finish coding in? These symptoms are frequently observed on machines running Java or Perl. A sure cure is to recode everything in Python. Plus you won't have a spam problem anymore, not to mention that no Python programmer has ever died. 
modestly y'rs - tim From tim.one at comcast.net Mon Feb 17 22:39:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Feb 17 22:40:10 2003 Subject: [Spambayes] aging information In-Reply-To: <3E509DBA.30411.320099E2@localhost> Message-ID: [D. R. Evans] > Does spambayes have any concept that "the older information is, the > less value it has"? At the start, and for a long time after, the database stored a timestamp with each token, recording the most recent time the token was actually used during scoring. This was intended to be the basis for "aging" algorithms, but nobody made time to investigate those, and I believe the timestamp fields were even removed from the database. So far in real life I haven't seen any need for it, and there are reasons for caution. A theoretical reason is that training is done by adding whole messages, and spam probability guesses are based on that. It's quite unclear what happens to the mathematical underpinnings if tokens are removed individually, instead of untraining on entire messages (i.e., the reverse of the way training was done). I doubt it would hurt, but intuition is a poor guide here. A practical concern is that people fear false positives to an extraordinary degree, and if your email is anything like mine, there are a few dozen old acquaintances I hear from about once per year. These are generally short "how ya doin'?" msgs, similar in that way to low-key porn spam of the form Hey there! How's it going, it's Jacce...we spoke a little while back through the personals. I hope you remember me! Well I promised I'd let you know when I got my my webcam thingy up and I finally did! Header clues that a message "like that" came from someone I trained on as ham two years ago remain valuable today, despite that such clues have sat idle for two years. In real life, I'm not finding significant database growth over time simply because I do little training anymore. 
If my database size were a problem, I expect a gross approach like purging all words with spamprobs in (.4, .6) would give quick relief without damaging error rates more than I care about. But that's untested, and intuition is still a poor guide . From ducky at webfoot.com Mon Feb 17 19:58:42 2003 From: ducky at webfoot.com (Kaitlin Duck Sherwood) Date: Mon Feb 17 23:15:48 2003 Subject: [Spambayes] Are they learning? In-Reply-To: <3E4F6603.3090901@hooft.net> References: <3E4F6603.3090901@hooft.net> Message-ID: At 11:20 AM +0100 2/16/03, Rob Hooft wrote: > Just received this via a python.org mailinglist; spam is evolving > strongly to avoid automatic detection by bayesian techniques. If the spammers ever get too clever for a purely word-based approach, then it would be easy to toss in the ratio of non-letter characters (perl \W) : letter characters (perl \w) and/or characters inside HTML tags : characters outside HTML and/or number of spaces : total length of message as features. I believe that those ratios will do a good job of spotting messages that have wildly different "eye space" and "ASCII space" presentations. -- Kaitlin Duck Sherwood Author of the _Overcome Email Overload_ series, http://www.EmailOverload.com Help free our mailboxes. Include http://wecanstopspam.org in your signature. From cc at belfordhk.com Tue Feb 18 15:49:28 2003 From: cc at belfordhk.com (cc) Date: Tue Feb 18 02:48:44 2003 Subject: [Spambayes] unknown error Message-ID: <3E51E588.1020100@belfordhk.com> Hi, First time posting here. I've been using 1.02a of Spambayes for the past week or two and find it quite a good software. I'm still training it.
But today, when I checked for email, I got the following error: error: uncaptured python exception, closing channel <__main__.ServerLineReader connected at 0x1011aa0> (exceptions.EOFError: [D:\PYTHON22\lib\asyncore.py|poll|99] [D:\PYTHON22\lib\asyncore.py|handle_read_event|396] [D:\PYTHON22\lib\asynchat.py|handle_read|130] [D:\spambayes\pop3proxy.py|found_terminator|199] [D:\spambayes\pop3proxy.py|onServerLine|267] [D:\spambayes\pop3proxy.py|onResponse|341] [D:\spambayes\pop3proxy.py|onTransaction|437] [D:\spambayes\pop3proxy.py|onTop|522] [D:\spambayes\pop3proxy.py|onRetr|484] [D:\spambayes\spambayes\classifier.py|chi2_spamprob|217] [D:\spambayes\spambayes\classifier.py|_getclues|437] [D:\spambayes\spambayes\storage.py|_wordinfoget|192] [D:\PYTHON22\lib\shelve.py|get|66] [D:\PYTHON22\lib\shelve.py|__getitem__|71]) The mail client (Mozilla 1.3a) just connects to the email server, but not actually get the email or even process it. I do have a log of the sessions: +OK POP3 Welcome to GNU POP3 Server Version 0.9.8 <5053.1045554404@asphalt> USER cc +OK PASS ******** +OK opened mailbox for cc STAT +OK 1 667 LIST +OK 1 667 . TOP 1 1 Then nothing happens. I haven't changed anything. Just got in today and boom, I get nothing. Any help appreciated. Thanks. -- email: cc@belfordhk.com | "A man who knows not where he goes, | knows not when he arrives." | - Anon From tim at fourstonesExpressions.com Tue Feb 18 07:36:05 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 18 08:36:13 2003 Subject: [Spambayes] Outlook Express configuration Message-ID: Does anyone know if there's an api, or some way to programmatically configure OE? I'm writing an installer for the pop3proxy, which needs to simply set the pop3 server address and port. I can't even find a place in the registry where this stuff is stored! Suggestions? 
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From N7DR at arrisi.com Tue Feb 18 08:12:20 2003 From: N7DR at arrisi.com (D. R. Evans) Date: Tue Feb 18 10:12:26 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: References: <3E509DBA.30411.320099E2@localhost> Message-ID: <3E51EAE4.3011.371604E4@localhost> On 17 Feb 2003 at 22:39, Tim Peters wrote: > > In real life, I'm not finding significant database growth over time > simply because I do little training anymore. If my database size were a So this raises a question I've had for a few days, concerning the internals of spambayes. I run in pop3proxy mode. The web page in that mode says that spambayes stores all my incoming mail. Presumably this means "we store it until you train on it" rather than "we store it for all time". I hope. In any case, I'm trying to figure out whether it's possible to save myself the increasingly-annoying chore of going to the web interface and training spambayes at least once per day. Each time I do that, I have to wade through a sea of subject lines, trying to figure out which ones might have been misclassified. This is going to get old real fast (actually, it probably takes considerably more of my time than deleting the spam would have done). I'm obviously missing something very simple about how this is supposed to be used, I guess. Doc -------------------------------------------------------------- Phone: +1 303 494 0394 Mobile: +1 720 839 8462 Fax: +1 781 240 0527 -------------------------------------------------------------- From francois.granger at free.fr Tue Feb 18 17:37:51 2003 From: francois.granger at free.fr (François Granger) Date: Tue Feb 18 11:37:57 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <3E51EAE4.3011.371604E4@localhost> Message-ID: on 18/02/03 16:12, D. R. Evans at N7DR@arrisi.com wrote: > I run in pop3proxy mode.
The web page in that mode says that spambayes > stores all my incoming mail. Presumably this means "we store it until > you train on it" rather than "we store it for all time". I hope. As far as I remember, it keeps the last 7 days.... > In any case, I'm trying to figure out whether it's possible to save > myself the increasingly-annoying chore of going to the web interface > and training spambayes at least once per day. Each time I do that, I > have to wade through a sea of subject lines, trying to figure out which > ones might have been misclassified. I usually click on the discard link in the head of the ham part. Then I look only at the spams and the unsure to check their classification and train on them. This is really quick. -- Le courrier est un moyen de communication. Les gens devraient se poser des questions sur les implications politiques des choix (ou non choix) de leurs outils et technologies. Pour des courriers propres : -- From burke at amieast.com Tue Feb 18 13:24:41 2003 From: burke at amieast.com (J. Burke Murray) Date: Tue Feb 18 13:25:14 2003 Subject: [Spambayes] Outlook 2000 add-in Message-ID: Hi, this is probably covered somewhere else but I looked through the archives and I can't find it. I can't get the addin to work in outlook 2000. Here is what I did: 1. installed python 2.2.2 2. installed win32all-150 3. executed the file addin.py But when I run outlook, I don't see any of the anti-spam stuff. I tried manually loading the add-in but I can't find the file to load (through tools -> options -> other -> Advanced Options -> COM Addins). Have I missed a step? I don't know diddly about outlook. I am running Windows NT 4.0 SP6. 
Thanks From tim at fourstonesExpressions.com Tue Feb 18 12:48:07 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 18 14:16:40 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: Message-ID: <7POHE7LFID5ZXVNLOI21HEQPWQMH95.3e527fe7@myst> 2/18/2003 10:37:51 AM, François Granger wrote: >on 18/02/03 16:12, D. R. Evans at N7DR@arrisi.com wrote: > >> I run in pop3proxy mode. The web page in that mode says that spambayes >> stores all my incoming mail. Presumably this means "we store it until >> you train on it" rather than "we store it for all time". I hope. > >As far as I remember, it keeps the last 7 days.... This is true. If you pay no attention, stuff goes away after 7 days. > >> In any case, I'm trying to figure out whether it's possible to save >> myself the increasingly-annoying chore of going to the web interface An idea that we toyed with, and even made a prototype implementation, was to include an smtpproxy in the mix. With that, you could train by forwarding a mail to spam@ or ham@. This was very convenient, and eliminated much of the 'increasingly-annoying chore' you refer to (which incidentally is part-and- parcel of bayesian (machine learning) algorithms). The problem with using an smtpproxy is that most mailers mess around with the headers. Some of them even lop almost all of them off. There are many important clues in the headers, and these clues are simply missed by this mechanism. So we chose to cache incoming mail and give a user interface, so training could be done on the intact mail. But you bring up an interesting point, in that it's very possible that having to train will be viewed as an annoying chore by many people. The smtpproxy might provide a much more convenient training mechanism. We've also toyed with the idea of providing pretrained databases, so people don't have to start training from scratch. 
Of course, the problem with this idea is that one man's spam is another man's subscription. I feel, though, that we *could* come up with a few trained databases that would fit some reasonable definitions, like "no hardcore porn" for example. For some people, Victoria's Secret would be included in that definition, for others it wouldn't. But almost everyone agrees on the definition of hardcore porn at some level, and we may very well be able to provide such a database. So, Doc, can you give us some feedback on these two ideas? - TimS >> and training spambayes at least once per day. Each time I do that, I >> have to wade through a sea of subject lines, trying to figure out which >> ones might have been misclassified. > >I usually click on the discard link in the head of the ham part. Then I look >only at the spams and the unsure to check their classification and train on >them. This is really quick. This is what I do as well, except that when I get a fn in ham, I immediately go to the pop3proxy ui and train that one as spam. I will stop even doing this when I'm satisfied with my fp/fn rate, and will then only train on mistakes, and occasionally on correctly classified stuff to be sure things don't get out of whack. - TimS > >-- >Le courrier est un moyen de communication. Les gens devraient >se poser des questions sur les implications politiques des choix (ou non >choix) de leurs outils et technologies. 
Pour des courriers propres : > -- > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Wed Feb 19 08:37:36 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Feb 18 16:38:39 2003 Subject: [Spambayes] Outlook 2000 add-in In-Reply-To: Message-ID: <00a301c2d795$f5a47550$530f8490@eden> Can you please try the following: * Start pythonwin, and select "Tools->Remote Collector Debugging Tool" * Re-execute addin.py to ensure registration. * Start outlook You should see some messages, and hopefully a Python traceback, in the Pythonwin window. Please mail them to me/us. Thanks, Mark. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org]On Behalf Of J. Burke Murray > Sent: Wednesday, 19 February 2003 5:25 AM > To: spambayes@python.org > Subject: [Spambayes] Outlook 2000 add-in > > > Hi, this is probably covered somewhere else but I looked through the > archives and I can't find it. I can't get the addin to work > in outlook > 2000. Here is what I did: > > 1. installed python 2.2.2 > 2. installed win32all-150 > 3. executed the file addin.py > > But when I run outlook, I don't see any of the anti-spam > stuff. I tried > manually loading the add-in but I can't find the file to load (through > tools -> options -> other -> Advanced Options -> COM Addins). > > Have I missed a step? I don't know diddly about outlook. I > am running > Windows NT 4.0 SP6. 
> > Thanks > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From neale at woozle.org Tue Feb 18 14:02:18 2003 From: neale at woozle.org (Neale Pickett) Date: Tue Feb 18 17:02:47 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <7POHE7LFID5ZXVNLOI21HEQPWQMH95.3e527fe7@myst> (Tim Stone - Four Stones Expressions's message of "Tue, 18 Feb 2003 12:48:07 -0600") References: <7POHE7LFID5ZXVNLOI21HEQPWQMH95.3e527fe7@myst> Message-ID: Tim Stone - Four Stones Expressions writes: > But you bring up an interesting point, in that it's very possible that having > to train will be viewed as an annoying chore by many people. IMHO, anything more than a "delete as spam" button is going to be too much for most people. Mail administrators would likely be willing to put up with more procedure, so maybe a forwarding mechanism like you describe would work if there is a central adminstrator. I suspect that providing a "spam" folder that people could drag false-positives into wouldn't be too much to ask of them. If you can get them to do that, a site administrator can set up mboxtrain to run against their mailboxes nightly and retrain the database. I would like to set something like this up on woozle (my home box) but I don't have time yet. But I think, in general, any training procedure that requires more than a single click, or a click and drag, is going to be seen as just as annoying as the spam itself. 
Neale From tim at fourstonesExpressions.com Tue Feb 18 16:13:01 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 18 17:13:09 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: Message-ID: 2/18/2003 4:02:18 PM, Neale Pickett wrote: >Tim Stone - Four Stones Expressions writes: > >> But you bring up an interesting point, in that it's very possible that having >> to train will be viewed as an annoying chore by many people. > >IMHO, anything more than a "delete as spam" button is going to be too >much for most people. Mail administrators would likely be willing to >put up with more procedure, so maybe a forwarding mechanism like you >describe would work if there is a central adminstrator. > >I suspect that providing a "spam" folder that people could drag >false-positives into wouldn't be too much to ask of them. If you can >get them to do that, a site administrator can set up mboxtrain to run >against their mailboxes nightly and retrain the database. I would like >to set something like this up on woozle (my home box) but I don't have >time yet. > >But I think, in general, any training procedure that requires more than >a single click, or a click and drag, is going to be seen as just as >annoying as the spam itself. 
This will be a challenge for wind'ohs stuff in general (except OL) But hey, we're bright people, so we should be able to figure *something* out - TimS

>
>Neale
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From david at theresistance.net Tue Feb 18 17:44:16 2003
From: david at theresistance.net (David Shaw)
Date: Tue Feb 18 17:46:53 2003
Subject: [Spambayes] found a bug
In-Reply-To:
Message-ID: <82A5D7C8-4392-11D7-B4D3-000393582EF6@theresistance.net>

Hi all,

A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback:

Traceback (most recent call last):
  File "spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "pop3proxy.py", line 929, in onReview
    judgement = judgement.split(';')[0].strip()
  File "pop3proxy.py", line 815, in _makeMessageInfo
    print type(text)
AttributeError: 'list' object has no attribute 'replace'

He sent me the offending message (it's actually a digest of messages), and I replicated the problem:

msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r")
message = mbox.get_message(msg)
part = typed_subpart_iterator(message, 'text', 'plain').next()
text = part.get_payload()
>>> text
[]

So, instead of text, the payload is a list containing a single email.Message.Message instance. Here are the objects' respective payloads:

>>> message._payload
[, , , , , , , , , , , , , ]
>>> text[0]._payload
'pilot-pda Digest 18 Feb 2003 11:00:01 -0000 Issue 1605\r\n\r\nTopics (messages 37277 through 37289):\r\n\r\n(best) Download sites survey\r\n\t37277 by: "Bill Shadish" \r\n\r\nMP3 player\r\n\t37278 by: Hassan Ajami \r\n\t37279 by: PocketGoddess \r\n\t37281 by: "Eric Fehrman" \r\n\t37289 by: "Roland J. Roberts" \r\n\r\nNew Tungsten W\r\n\t37280 by: PocketGoddess \r\n\r\nKyocera 7135\r\n\t37282 by: "MMS" \r\n\t37283 by: PocketGoddess \r\n\r\nPalmGear Debuts StreamLyncT AutoInstall Utility, Simplifying Download/Install Process For Palm OS\xae Software\r\n\t37284 by: "Kenny West \\(PalmGear.com\\)" \r\n\r\nClie SJ33 Review at Memoware.\r\n\t37285 by: "Kenneth S. Rhee" \r\n\r\nPalm Tungsten Digitizer Patch\r\n\t37286 by: "Michael R Kizer" \r\n\t37287 by: Chris Erickson \r\n\r\nPalmGear Debuts StreamLyncT AutoInstall Utility, Simplifying Download/Install Process For Palm OSR Software\r\n\t37288 by: "Don Ferguson" \r\n\r\nAdministrivia:\r\n\r\nTo subscribe to the digest, e-mail:\r\n\tpilot-pda-digest- subscribe@freeside.ultraviolet.org\r\n\r\nTo unsubscribe from the digest, e-mail:\r\n\tpilot-pda-digest- unsubscribe@freeside.ultraviolet.org\r\n\r\nTo post to the list, e-mail:\r\n\tpilot- pda@freeside.ultraviolet.org\r\n\r\n\r\n-------------------------------- --------------------------------------\r\n'

The offending email is gzipped and attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1045572229.gz
Type: application/x-gzip
Size: 6821 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030218/fbf85deb/1045572229-0001.bin

From T.A.Meyer at massey.ac.nz Wed Feb 19 12:34:35 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Feb 18 18:35:12 2003
Subject: [Spambayes] training
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD5F@its-xchg4.massey.ac.nz>

> But I think, in general, any training procedure that requires
> more than a single click, or a click and drag, is going to be
> seen as just as annoying as the spam itself.
This presumably means that spambayes has to either: (a) work via plugins (like Outlook) (b) hook onto existing capabilities of mail programs (like the smtpproxy) (a) would probably be a nicer solution, except that lots and lots of mail clients exist, and creating plugins for them all would be a lot of work - especially for 'dumb' clients like Outlook Express. Mac applications would probably be reasonably straightforward since Applescript could do a lot of the work. (b) has the problems already mentioned about mail programs mangling mail. (Plus it would be a single click and then typing an address, and then another click). What about the solution posted here a while back suggesting that the mail to spam@... contained a message key (generated by spambayes to be safe), and that key is used to find the message in the cache. If I understand things correctly, the proxy keeps mail for 7 days, right? So, if, during this time, a command was received to find a specific message in that mail, this could be done, couldn't it? I think the idea of providing various kick-start collections is probably a good one too, as long as there are disclaimers all over the place pointing out that they might not exactly match *your* spam. So, who wants to trawl through all the spam collections and sort them into types? :) =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Feb 19 12:43:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Feb 18 18:44:24 2003 Subject: [Spambayes] Habeas Headers Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4E5@its-xchg4.massey.ac.nz> Hi all, Does anyone else here use the Habeas headers? Since I do, I patched my spambayes to add tokens for the habeas headers. If anyone else wants to, the code is attached. 
I'm not suggesting that this becomes a permanent modification to the spambayes code - given that Habeas is relatively new, it's unlikely to be in a lot of ham (unless your ham is from particular places) yet, and so would not make all that much difference to the FN/FP rates, and I agree that the code shouldn't bloat with unnecessary additions. It does help me a little (correctly moving mail from possible-spam to ham).

In the event that Habeas catches on, however, it might be useful :)

Cheers, Tony Meyer

-------------- next part --------------
[Add the following code to tokenizer.py, pretty much anywhere. I have it just before subject]

# Habeas Headers - see http://www.habeas.com
if options.search_for_habeas_headers:
    habeas_headers = [
        ("X-Habeas-SWE-1", "winter into spring"),
        ("X-Habeas-SWE-2", "brightly anticipated"),
        ("X-Habeas-SWE-3", "like Habeas SWE (tm)"),
        ("X-Habeas-SWE-4", "Copyright 2002 Habeas (tm)"),
        ("X-Habeas-SWE-5", "Sender Warranted Email (SWE) (tm). The sender of this"),
        ("X-Habeas-SWE-6", "email in exchange for a license for this Habeas"),
        ("X-Habeas-SWE-7", "warrant mark warrants that this is a Habeas Compliant"),
        ("X-Habeas-SWE-8", "Message (HCM) and not spam. Please report use of this"),
        ("X-Habeas-SWE-9", "mark in spam to .")
        ]
    valid_habeas = 0
    invalid_habeas = False
    for opt, val in habeas_headers:
        habeas = msg.get(opt)
        if habeas is not None:
            if options.reduce_habeas_headers and habeas == val:
                valid_habeas = valid_habeas + 1
            elif options.reduce_habeas_headers and habeas != val:
                invalid_habeas = True
            elif (not options.reduce_habeas_headers) and habeas == val:
                yield opt.lower() + ":valid"
            else:
                yield opt.lower() + ":invalid"
    if options.reduce_habeas_headers:
        # if there was any invalid line, we record as invalid
        # if all nine lines were correct, we record as valid
        # otherwise we ignore
        if invalid_habeas == True:
            yield "x-habeas-swe:invalid"
        elif valid_habeas == 9:
            yield "x-habeas-swe:valid"

[Add the following code to Options.py, anywhere in the [Tokenizer] section]

# If true, search for the habeas headers (see http://www.habeas.com)
# If they are present and correct, this is a strong ham sign, if they are
# present and incorrect, this is a strong spam sign
search_for_habeas_headers: False

# If search_for_habeas_headers is set, nine tokens are generated for
# messages with habeas headers. This should be fine, since messages with
# the headers should either be ham, or result in FN so that we can send
# them to habeas so they can be sued. However, to reduce the strength
# of habeas headers, we offer the ability to reduce the nine tokens to one.
# (this option has no effect if search_for_habeas_headers is False)
reduce_habeas_headers: False

[Add the following code to Options.py, in the Tokenizer section of all_options]

'search_for_habeas_headers': boolean_cracker,
'reduce_habeas_headers': boolean_cracker,

From jh at web.de Wed Feb 19 01:30:27 2003
From: jh at web.de (Juergen Hermann)
Date: Tue Feb 18 19:30:55 2003
Subject: [Spambayes] training WAS: aging information
In-Reply-To: <7POHE7LFID5ZXVNLOI21HEQPWQMH95.3e527fe7@myst>
Message-ID:

On Tue, 18 Feb 2003 12:48:07 -0600, Tim Stone - Four Stones Expressions wrote:

>An idea that we toyed with, and even made a prototype implementation, was to
>include an smtpproxy in the mix. With that, you could train by forwarding a
>mail to spam@ or ham@. This was very convenient, and eliminated much of the
>'increasingly-annoying chore' you refer to (which incidentally is part-and-
>parcel of bayesian (machine learning) algorithms). The problem with using an
>smtpproxy is that most mailers mess around with the headers.

Well, for those mailers that don't, it would be a much nicer way than the "Review" page, especially after initial training. My client has a "Bounce" option that, unlike "Forward", basically just adds one more Via header and leaves the rest of the msg intact.

Ciao, Jürgen

From tony-bayes at lownds.com Tue Feb 18 17:40:23 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Tue Feb 18 20:40:24 2003
Subject: [Spambayes] To line munging?
Message-ID:

Hi,

I restarted my proxy today and now the To: line in my emails is being changed.

To: ham,"spambayes@python.org"
    ^^^^

Where is the code that is adding this? Any hints appreciated.

-Tony

From tim at fourstonesExpressions.com Tue Feb 18 19:43:03 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Feb 18 20:43:11 2003
Subject: [Spambayes] To line munging?
In-Reply-To: Message-ID: <06M972Y1T4293NM54B998MI1UDC4WWQ.3e52e127@myst> Bring up the options config page, and change 'Notate to' to False. Then restart the proxy. Somehow there's some double-negative logic somewhere, cause it should be defaulting to False. This is an option for mailers that can't filter on arbitrary headers... - TimS 2/18/2003 7:40:23 PM, Tony Lownds wrote: >Hi, > >I restarted my proxy today and now the To: line in my emails is being changed. > >To: ham,"spambayes@python.org" > ^^^^ > >Where is the code that is adding this? Any hints appreciated. > >-Tony > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Wed Feb 19 14:48:34 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Feb 18 20:49:12 2003 Subject: [Spambayes] To line munging? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4E9@its-xchg4.massey.ac.nz> > Bring up the options config page, and change 'Notate to' to > False. Then restart the proxy. Somehow there's some > double-negative logic somewhere, cause it should be > defaulting to False. Well, if by double-negative, you actually mean single-positive, then yes there is ;) The current CVS of Options.py has pop3proxy_notate_to set to True. =Tony Meyer From N7DR at arrisi.com Tue Feb 18 18:56:38 2003 From: N7DR at arrisi.com (D. R. Evans) Date: Tue Feb 18 20:56:50 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <7POHE7LFID5ZXVNLOI21HEQPWQMH95.3e527fe7@myst> References: Message-ID: <3E5281E6.24287.3963E01E@localhost> On 18 Feb 2003 at 12:48, Tim Stone - Four Stones Expressions wrote: > >As far as I remember, it keeps the last 7 days.... > > This is true. If you pay no attention, stuff goes away after 7 days. > That's definitely worth knowing. Thanks. 
> >
> >> In any case, I'm trying to figure out whether it's possible to save
> >> myself the increasingly-annoying chore of going to the web interface
>
> An idea that we toyed with, and even made a prototype implementation,
> was to include an smtpproxy in the mix. With that, you could train by
> forwarding a mail to spam@ or ham@. This was very convenient, and
> eliminated much of the 'increasingly-annoying chore' you refer to (which
> incidentally is part-and-parcel of bayesian (machine learning)
> algorithms). The problem with using an smtpproxy is that most mailers
> mess around with the headers. Some of them even lop almost all of them
> off. There are many important clues in the headers, and these clues are
> simply missed by this mechanism. So we chose to cache incoming mail and
> give a user interface, so training could be done on the intact mail.
>
> But you bring up an interesting point, in that it's very possible that
> having to train will be viewed as an annoying chore by many people. The

What was really concerning me was that I had seen no indication that it was permissible simply to stop training -- and that if I did so, the system wouldn't just store incoming e-mails forever.

So the first stop-gap solution is simple: somewhere state clearly that once the filter is working to a user's satisfaction, the user can stop training.

Then the problem will be what to do when a spam gets through (or a ham doesn't). Obviously (if anything is truly obvious) the user will want to train on that one particular mail. The current interface would make this a nightmare -- There am I sitting with 7 days worth of e-mail (which in my case would be something like 1500 messages) and I want to find the one that has been misclassified.

So it seems to me that there has to be something like the smtpproxy thing. But then I'm biased: my MUA doesn't delete headers. (Actually, I was unaware that any mailers did that sort of thing; but I readily admit that I'm a naïve rustic.)
> smtpproxy might provide a much more convenient training mechanism. > We've also toyed with the idea of providing pretrained databases, so > people don't have to start training from scratch. Of course, the I don't really like that idea very much. I'm trying to come up with a logical explanation for that feeling, though, and not doing very well. This is the best I can do: I am impressed at how quickly spambayes has moved toward near 100% accuracy on my system. (So far today it has classified a single spam as unsure; everything else has been classified correctly.) If I had started from a pre-seeded database, it isn't at all clear that it could have converged to my idea of spam as quickly as starting from an empty database. Obviously, the experiment could be done to see if it really is worth it, but I suspect that all of us have better things to do than to grab a ton of spam and build some filters. Maybe I'm wrong. I frequently am :-) > stop even doing this when I'm satisfied with my fp/fn rate, and will > then only train on mistakes, and occasionally on correctly classified > stuff to be sure things don't get out of whack. - TimS > I saw a comment in the LJ article that one should train on roughly equal numbers of spam and ham. Is this actually true? (This question of course merely demonstrates that I'm too lazy to do the maths myself.) One thing I've learned by doing the training is that approximately 10% of my mail is spam. I'm surprised, because I would have guessed that the proportion was lower than that. I guess that I had got to the point where I mentally just filtered it out of consciousness as I clicked the "delete" button every morning on the night's accumulation of the stuff. I really am going to have to try to find time to do the aging thing, though. I want to experiment with classifying off-thread postings to reflectors as spam :-) I suspect that it won't work very well, but the experiment seems like it's worth a try. 
Doc -------------------------------------------------------------- Phone: +1 303 494 0394 Mobile: +1 720 839 8462 Fax: +1 781 240 0527 -------------------------------------------------------------- From tim at fourstonesExpressions.com Tue Feb 18 19:59:21 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 18 20:59:31 2003 Subject: [Spambayes] To line munging? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4E9@its-xchg4.massey.ac.nz> Message-ID: 2/18/2003 7:48:34 PM, "Meyer, Tony" wrote: >> Bring up the options config page, and change 'Notate to' to >> False. Then restart the proxy. Somehow there's some >> double-negative logic somewhere, cause it should be >> defaulting to False. > >Well, if by double-negative, you actually mean single-positive, then yes there is ;) Gosh, what idiot checked it in that way? - TimS > >The current CVS of Options.py has pop3proxy_notate_to set to True. > >=Tony Meyer > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Tue Feb 18 20:31:53 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 18 21:32:02 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <3E5281E6.24287.3963E01E@localhost> Message-ID: 2/18/2003 7:56:38 PM, "D. R. Evans" wrote: >On 18 Feb 2003 at 12:48, Tim Stone - Four Stones Expressions wrote: > >> >As far as I remember, it keeps the last 7 days.... >> >> This is true. If you pay no attention, stuff goes away after 7 days. >> > >That's definitely worth knowing. Thanks. > >> > >> >> In any case, I'm trying to figure out whether it's possible to save >> >> myself the increasingly-annoying chore of going to the web interface >> >> An idea that we toyed with, and even made a prototype implementation, >> was to include an smtpproxy in the mix. With that, you could train by >> forwarding a mail to spam@ or ham@. 
This was very convenient, and
>> eliminated much of the 'increasingly-annoying chore' you refer to (which
>> incidentally is part-and-parcel of bayesian (machine learning)
>> algorithms). The problem with using an smtpproxy is that most mailers
>> mess around with the headers. Some of them even lop almost all of them
>> off. There are many important clues in the headers, and these clues are
>> simply missed by this mechanism. So we chose to cache incoming mail and
>> give a user interface, so training could be done on the intact mail.
>>
>> But you bring up an interesting point, in that it's very possible that
>> having to train will be viewed as an annoying chore by many people. The
>
>What was really concerning me was that I had seen no indication that it
>was permissible simply to stop training -- and that if I did so, the
>system wouldn't just store incoming e-mails forever.
>
>So the first stop-gap solution is simple: somewhere state clearly that
>once the filter is working to a user's satisfaction, the user can stop
>training.
>
>Then the problem will be what to do when a spam gets through (or a ham
>doesn't). Obviously (if anything is truly obvious) the user will want
>to train on that one particular mail. The current interface would make
>this a nightmare -- There am I sitting with 7 days worth of e-mail
>(which in my case would be something like 1500 messages) and I want to
>find the one that has been misclassified.

Very good point. I hear ya, and I'll start trying to figure out a way to accomplish this... it'll take a while, though, cause our pop3proxy guy, Richie, is out of circulation for a while...

>
>So it seems to me that there has to be something like the smtpproxy
>thing. But then I'm biased: my MUA doesn't delete headers. (Actually, I
>was unaware that any mailers did that sort of thing; but I readily
>admit that I'm a naïve rustic.)
>
>> smtpproxy might provide a much more convenient training mechanism.
>> We've also toyed with the idea of providing pretrained databases, so >> people don't have to start training from scratch. Of course, the > >I don't really like that idea very much. I'm trying to come up with a >logical explanation for that feeling, though, and not doing very well. >This is the best I can do: > >I am impressed at how quickly spambayes has moved toward near 100% >accuracy on my system. (So far today it has classified a single spam as >unsure; everything else has been classified correctly.) If I had >started from a pre-seeded database, it isn't at all clear that it could >have converged to my idea of spam as quickly as starting from an empty >database. Obviously, the experiment could be done to see if it really >is worth it, but I suspect that all of us have better things to do than >to grab a ton of spam and build some filters. Maybe I'm wrong. I >frequently am :-) The thing I'm concerned about is that we really have only tapped people who are very saavy, and that the system will ultimately still be too difficult to comprehend for the 'average joe' user, who stresses out when installing the latest release of solitaire. (I have much experience with this syndrome.) This is *definitely* the case with the current state of the system. The vast majority of people pretty much expect to run setup.exe and it just miraculously works. - TimS > >> stop even doing this when I'm satisfied with my fp/fn rate, and will >> then only train on mistakes, and occasionally on correctly classified >> stuff to be sure things don't get out of whack. - TimS >> > >I saw a comment in the LJ article that one should train on roughly >equal numbers of spam and ham. Is this actually true? (This question of >course merely demonstrates that I'm too lazy to do the maths myself.) You should shoot for a relative balance, but our research seems to indicate that the system isn't particularly sensitive to anything but extreme imbalance. 
Tim Peters can fill us in a bit more on this one, if he's watching. Tim? Tim? Where are you? - TimS > >One thing I've learned by doing the training is that approximately 10% >of my mail is spam. I'm surprised, because I would have guessed that >the proportion was lower than that. I guess that I had got to the point >where I mentally just filtered it out of consciousness as I clicked the >"delete" button every morning on the night's accumulation of the stuff. > >I really am going to have to try to find time to do the aging thing, >though. I want to experiment with classifying off-thread postings to >reflectors as spam :-) I suspect that it won't work very well, but the >experiment seems like it's worth a try. Please do! > > Doc >-------------------------------------------------------------- >Phone: +1 303 494 0394 >Mobile: +1 720 839 8462 >Fax: +1 781 240 0527 >-------------------------------------------------------------- > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Wed Feb 19 15:21:53 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Feb 18 23:23:32 2003 Subject: [Spambayes] Outlook plugin file locations changing Message-ID: <000001c2d7ce$6ed78c90$530f8490@eden> For those who don't read the checkins list, I just added the following: --- Store our config files in the "correct" Windows directory, using the SHGetFolderPath function to locate it. If we can't locate this, or can't create our SpamBayes directory under this, we stick with the "application directory". Code also exists to migrate your existing databases to this directory. First time you run Outlook after this update, your .pck/.db files will be *moved* to the new directory. Thus, no re-training should be necessary. 
About ready to release a stand-alone SpamBayes Outlook Plugin binary :) --- If we ever get a platform.py, or decide that there should be a single database for all spambayes "products" (ie, pop3 and Outlook sharing the same db, for example), then I will be happy to migrate. Let me know if you have any problems. Mark. From T.A.Meyer at massey.ac.nz Wed Feb 19 18:01:42 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 19 00:02:18 2003 Subject: [Spambayes] training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD61@its-xchg4.massey.ac.nz> What about this as a method: 1. POP3proxy adds another header to mail: 'X-Spambayes-ID: XXX' - the id is the same id as in the corpus caches (i.e. at the moment, this is the receiver time and a 'uniquifier'). 2. For manual training, this ID can be entered into the web ui to allow reviewing of that message. (The existing manual system also stays in place). 3. Incorrect mail is forwarded to spambayes_spam@localhost or spambayes_ham@localhost. The SMTP proxy examines mail to either of those addresses (and stops it going further). It checks for: (a) an attached message (b) the words "X-Spambayes-ID:" in the body And extracts the correct id, and uses this to find the message in the corpus cache, and then does the appropriate training. I've done 1 & 2, which at least solves the problem of finding a message to train in a huge cache. The SMTP proxy was at least partially done, which just leaves the searching (not that difficult) and the training hooks. If a mail application failed to include the headers in either an attached message or in the body (say it strips them), then there could always be an option to include the id in the message body (as ugly and intrusive as that is). This would work with Outlook Express and Eudora at least (I don't have anything else to test). What do you think? If someone still has it, could they send me the SMTP proxy prototype code? 
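
[Editorial sketch, not part of the original post. The id lookup in step 3 might look roughly like the following. It is written against the Python 3 standard-library email package rather than the Python 2.2 code base discussed in this thread, and it is not taken from the SpamBayes sources. The function name find_spambayes_id and the sample id value are assumptions made for illustration; only the header name X-Spambayes-ID comes from the proposal. It checks (a) the headers of an attached message first, then (b) header text quoted in a body part.]

```python
import email
import re
from email.message import Message
from typing import Optional

ID_HEADER = "X-Spambayes-ID"  # header name taken from the proposal

# Matches the header when it shows up as plain text in a body,
# possibly behind quote markers such as "> ".
_ID_IN_BODY = re.compile(
    r"^[> \t]*X-Spambayes-ID:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def find_spambayes_id(forwarded: Message) -> Optional[str]:
    """Return the id carried by a forwarded training mail, or None.

    Hypothetical helper: the name and return convention are assumptions.
    """
    for part in forwarded.walk():
        # (a) walk() also yields a message attached as message/rfc822,
        # so its headers can be read directly. The outer message is
        # skipped because the id we want belongs to the forwarded mail.
        if part is not forwarded and part[ID_HEADER]:
            return str(part[ID_HEADER]).strip()
        # (b) Otherwise look for the header quoted inside a text body.
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                # The id is plain ASCII, so latin-1 is a decode that
                # cannot fail and does not disturb what we search for.
                match = _ID_IN_BODY.search(payload.decode("latin-1"))
                if match:
                    return match.group(1)
    return None


# Made-up sample: a forward to the training address with the original
# mail attached. The id format shown here is an assumption.
SAMPLE = """\
From: user@example.com
To: spambayes_spam@localhost
Subject: Fwd: cheap pills
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

Misclassified, see attachment.
--XX
Content-Type: message/rfc822

Subject: cheap pills
X-Spambayes-ID: 1045572229-1

buy now
--XX--
"""

if __name__ == "__main__":
    print(find_spambayes_id(email.message_from_string(SAMPLE)))
```

[Finding the cached message for that id and the training call itself are left out, since the thread does not show those interfaces. A mailer that strips headers from both the attachment and the body would make this return None, which is the case the "include the id in the message body" option above is meant to cover.]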
=Tony Meyer From mhammond at skippinet.com.au Wed Feb 19 19:53:31 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 19 03:54:30 2003 Subject: [Spambayes] training In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD61@its-xchg4.massey.ac.nz> Message-ID: <001c01c2d7f4$618d2fb0$530f8490@eden> [Tony] > What about this as a method: ... > What do you think? It all sounds good to me. I guess you will need to handle the odd case where, over time, a mail is forwarded to both addresses. Presumably you will need to untrain the previous instruction. Which, coincidently, leads us to what I have been advocating for some time. The core spambayes code should persist the word database as now, but also a basic "message database". If we can get these abstractions into Corpus.py (and probably removing the factories), then Outlook could reuse all this code. If this sounds OK, I've a further idea I will expand in email :) > which just leaves ... the training hooks. Which also interests me! Mark. From T.A.Meyer at massey.ac.nz Wed Feb 19 22:05:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 19 04:06:22 2003 Subject: [Spambayes] training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4F1@its-xchg4.massey.ac.nz> > Which, coincidently, leads us to what I have been advocating > for some time. :) > The core spambayes code should persist > the word database as now, but also a basic "message > database". Do you mean one like pop3proxy's cache? i.e. one that expires messages over a certain age? > If we can get these abstractions into Corpus.py > (and probably removing the factories), then Outlook could > reuse all this code.
That would be nice - and nicer still if anyone else decides to write a plugin for some other client. > If this sounds OK, I've a further idea I will expand in email :) Go on, expand ;) =Tony Meyer From Paul.Moore at atosorigin.com Wed Feb 19 09:43:57 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Feb 19 04:44:31 2003 Subject: [Spambayes] training WAS: aging information Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D91A@UKDCX001.uk.int.atosorigin.com> From: D. R. Evans [mailto:N7DR@arrisi.com] > I saw a comment in the LJ article that one should train on roughly > equal numbers of spam and ham. Is this actually true? (This question of > course merely demonstrates that I'm too lazy to do the maths myself.) That's something I'd be interested in, too - particularly as the ham:spam ratio people get is utterly out of their control. I'm also too lazy - or possibly incompetent - to do the maths, but IIRC, there were some experiments done at one stage. A pointer to the relevant posts (or better still, a summary on the website) would be very useful. > One thing I've learned by doing the training is that approximately 10% > of my mail is spam. I'm surprised, because I would have guessed that > the proportion was lower than that. I guess that I had got to the point > where I mentally just filtered it out of consciousness as I clicked the > "delete" button every morning on the night's accumulation of the stuff. Unfortunately for me, my ham:spam ratio is something like 99% *spam*. This is because I run a highly filtered setup, with all my mailing list traffic getting taken out of the mail stream before spambayes gets a look in. So bad results from serious imbalances is a big problem for me. I can get round it by pre-training on my existing inbox, but the imbalance is going to be big one way or the other at the start. I *really* need spambayes, not to filter out the spam, but for the other side of the coin - to find the real mail in the mass of junk. 
I regularly consider switching to a new account, but never do because tracking down the places where my existing mail address is published "legitimately" is just too much like hard work :-( Paul. From whisper at oz.net Wed Feb 19 01:57:20 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 19 04:57:15 2003 Subject: [Spambayes] OUCH! Message-ID: I guess there's a first time for everything. In using 3 different versions of Outlook, this is the first time I've ever seen it pop up an "illegal operation - program terminating" dialog! OL died very fast after the "ok" button was pressed too. It took a LONG time to rebuild its indices or whatever it is that Outlook does when it's restarted after an abrupt shutdown. Spammie came back though, so OL probably didn't think it was Spammie's fault (I guess). Alas, there were no discernible logs I could find to figure out what went *splat*. The only thing that has changed in a long time is the addition of spammie... David LeBlanc Seattle, WA USA From Paul.Moore at atosorigin.com Wed Feb 19 09:59:20 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Feb 19 04:59:52 2003 Subject: [Spambayes] training Message-ID: <16E1010E4581B049ABC51D4975CEDB886199EA@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > What about this as a method: [...] Looks good. > 3. Incorrect mail is forwarded to spambayes_spam@localhost > or spambayes_ham@localhost. I assume you'll make these addresses configurable? For those of us with enough control over local addresses (like me), forwarding to just "spam" or "ham" would be preferable... > The SMTP proxy examines mail to either of those addresses (and > stops it going further). It checks for: > (a) an attached message > (b) the words "X-Spambayes-ID:" in the body I assume that you mean either of these - some clients will attach the original in preference, others will append it as text.
Of course, if you're aiming at the mass market end of things, you need to look out for mailers that will mash the whole thing into HTML, so you get the X-Spambayes-ID in HTML in the body. (Are there any mailers like this? I could easily imagine Outlook Express being this nasty...) And I once had a reply to a perfectly normal mail (using the Outlook web client) get sent as base64-encoded UTF-8 because the client had added a couple of garbage non-ASCII characters at the end, unknown to me :-( I'm not saying you should handle all cases of pathological behaviour, but you could do with being aware of the possibilities, just so the "it didn't work!" cries don't come as a surprise... I'm willing to set up a test machine with a variety of Windows mail systems on, (I can get OE, Pegasus, Agent, Gravity) and try the system out, but I don't have a lot of time, so I'll only be able to do fairly minimal tests... > What do you think? Sounds nice. Paul. From whisper at oz.net Wed Feb 19 02:14:14 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 19 05:14:13 2003 Subject: [Spambayes] A few code questions (Outlook oriented) Message-ID: I would like to fix the read status change problem. I would like to know the following: 1. Where in the code is the handler for the "delete as spam" button? 2. Where is the code for sending a spam to the spam folder (when it's determined to be spam upon fetching from the mail server). Now for the (probably) dumb question. I don't see how this works: #manager.py, line 159 # determine which db manager to use, and create it. ManagerClass = [PickleStorageManager, DBStorageManager][use_db] self.db_manager = ManagerClass(bayes_base, mdb_base) ManagerClass is two lists, one has pointers to the two classes and the other is the flag - and then it's called!?! eh? (I have no immediate plans to change this (if I ever do), but I would like to understand what's going on.) 
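For anyone puzzling over the same two lines, here is a minimal runnable sketch of the idiom, written for Python 3. The two storage classes are empty stand-ins (assumptions for illustration, not the real Outlook2000 manager classes) and the Manager class is hypothetical; only the indexing line mirrors the quoted snippet.

```python
class PickleStorageManager:
    """Stand-in for the real class; records only which kind it is."""
    def __init__(self, bayes_base, mdb_base):
        self.kind = "pickle"


class DBStorageManager:
    """Stand-in for the real class; records only which kind it is."""
    def __init__(self, bayes_base, mdb_base):
        self.kind = "db"


# Module-level flag. A true value behaves as index 1, a false value as 0.
use_db = True


class Manager:
    def __init__(self, bayes_base, mdb_base):
        # The first [...] builds a two-element list of classes; the second
        # [...] indexes that list with the flag. Reading the module-level
        # use_db here needs no "global" statement, because it is only
        # read and never assigned inside this function.
        ManagerClass = [PickleStorageManager, DBStorageManager][use_db]
        # Calling the selected class constructs the manager object.
        self.db_manager = ManagerClass(bayes_base, mdb_base)


if __name__ == "__main__":
    print(Manager("bayes", "mdb").db_manager.kind)
```

A "global use_db" statement would only be required if __init__ assigned a new value to the flag.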
BTW, how does it get to reference a global (use_db) without a "global" statement in the __init__ scope? I can't see where use_db gets used either? Thanks to the kind soul(s) that help, David LeBlanc Seattle, WA USA From Paul.Moore at atosorigin.com Wed Feb 19 10:25:44 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Feb 19 05:26:17 2003 Subject: [Spambayes] A few code questions (Outlook oriented) Message-ID: <16E1010E4581B049ABC51D4975CEDB886199EB@UKDCX001.uk.int.atosorigin.com> From: David LeBlanc [mailto:whisper@oz.net] > Now for the (probably) dumb question. I don't see how this works: > > #manager.py, line 159 > # determine which db manager to use, and create it. > ManagerClass = [PickleStorageManager, DBStorageManager][use_db] > self.db_manager = ManagerClass(bayes_base, mdb_base) > > ManagerClass is two lists, one has pointers to the two classes and the other > is the flag - and then it's called!?! eh? (I have no immediate plans to > change this (if I ever do), but I would like to understand what's going on.) [PickleStorageManager, DBStorageManager] is a list. You then *index* that list via [use_db] (it's the multiple meanings of [...] that are confusing). use_db is a boolean, taking values 0 or 1. So, the code is equivalent to if use_db: ManagerClass = DBStorageManager else: ManagerClass = PickleStorageManager You then call ManagerClass (which is one of the two relevant classes) to construct the manager object. If you enjoyed this interlude, go and watch comp.lang.python, where there are currently over 1000 messages on various proposals for how to write C's conditional expression ( a ? b : c ) in Python. I'm sure this counts as some evidence in that debate, but it needs more evidence like a forest fire needs petrol... (PS Mark, maybe you could rewrite the statement as a 4-line if, like I did above, just for clarity?) > BTW, how does it get to reference a global (use_db) without a "global" > statement in the __init__ scope? 
I can't see where use_db gets used either? The "global" statement is only needed if you want to *update* the global. Read access to globals (when not shadowed by locals) is transparent. Paul. From mhammond at skippinet.com.au Wed Feb 19 21:37:17 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 19 05:38:26 2003 Subject: [Spambayes] A few code questions (Outlook oriented) In-Reply-To: Message-ID: <000001c2d802$e09a2d90$530f8490@eden> [David] > 1. Where in the code is the handler for the "delete as spam" button? Pretty much all UI hooks are in addin.py - specifically, ButtonDeleteAsSpamEvent.OnClick() is where the action is. > 2. Where is the code for sending a spam to the spam folder (when it's > determined to be spam upon fetching from the mail server). Top-level ProcessMessage() in addin.py winds up calling filter.py. > Now for the (probably) dumb question. I don't see how this works: See Paul's reply, but for the sake of " value": > If you enjoyed this interlude, go and watch comp.lang.python, where > there are currently over 1000 messages on various proposals was certainly in my mind when I wrote that :) Paul again: > (PS Mark, maybe you could rewrite the statement as a 4-line if, like > I did above, just for clarity?) But then life would be boring, and these cute little interludes would no longer happen Mark. From mhammond at skippinet.com.au Wed Feb 19 21:48:36 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 19 05:49:56 2003 Subject: [Spambayes] training In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D4F1@its-xchg4.massey.ac.nz> Message-ID: <000301c2d804$74d7be40$530f8490@eden> > > Which, coincidently, leads us to what I have been advocating > > for some time . > > :) > > > The core spambayes code should persist > > the word database as now, but also a basic "message > > database". > > Do you mean one like pop3proxy's cache? i.e. one that > expires messages over a certain age? 
I actually just meant a simple msg_id->trained_as_spam dictionary - just a memory that a message had previously been trained as ham/spam, so a need to untrain and multiple requests for the same message can be detected. This is user-proof in the face of I-double-click-everywhere type users. > > If this sounds OK, I've a further idea I will expand in email :) I meant to say "private email", but the list is quiet at the moment ... I was thinking that we could possibly abstract the database out one step more. Have a single "database manager" that maintains a few 'databases' - really just discrete tables, with no joins, in standard database parlance. What I'm trying to get at is that if we could have 2 dictionaries (existing word dictionary, plus one more "msg_id->how_was_trained") stored in a single file, and maybe even the possibility of additional "application defined" dictionaries (such as random config info) in that same file, life would be pretty peachy :) If we talk in terms of pickles, imagine: database['bayes'] = existing_bayes_pickle database['training'] = dict_I_proposed_above database['outlook_ui'] = dict_for_outlook_ui_options And 'database' is pickled. I see no reason this couldn't also work for bsddb. I am proposing that Corpus.py automatically manage the 'bayes' and 'training' keys of the database, but leave others for applications. Bayes itself persists the entire database. Some naming convention would be just fine too :) Never-satisfied-ly, Mark. -------------- next part -------------- A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 2652 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030219/94d4f844/winmail.bin From whisper at oz.net Wed Feb 19 02:52:53 2003 From: whisper at oz.net (David LeBlanc) Date: Wed Feb 19 05:53:16 2003 Subject: [Spambayes] A few code questions (Outlook oriented) In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199EB@UKDCX001.uk.int.atosorigin.com> Message-ID: > -----Original Message----- > From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] > Sent: Wednesday, February 19, 2003 2:26 > To: David LeBlanc; Spambayes@Python. Org > Subject: RE: [Spambayes] A few code questions (Outlook oriented) > > > From: David LeBlanc [mailto:whisper@oz.net] > > Now for the (probably) dumb question. I don't see how this works: > > > > #manager.py, line 159 > > # determine which db manager to use, and create it. > > ManagerClass = [PickleStorageManager, DBStorageManager][use_db] > > self.db_manager = ManagerClass(bayes_base, mdb_base) > > > > ManagerClass is two lists, one has pointers to the two classes > and the other > > is the flag - and then it's called!?! eh? (I have no immediate plans to > > change this (if I ever do), but I would like to understand > what's going on.) > > [PickleStorageManager, DBStorageManager] is a list. You then *index* that > list via [use_db] (it's the multiple meanings of [...] that are > confusing). Oh - duh. Never crossed my mind: I saw two lists. > use_db is a boolean, taking values 0 or 1. So, the code is equivalent to > > if use_db: > ManagerClass = DBStorageManager > else: > ManagerClass = PickleStorageManager > > You then call ManagerClass (which is one of the two relevant classes) to > construct the manager object. Yup yup. > If you enjoyed this interlude, go and watch comp.lang.python, > where there are > currently over 1000 messages on various proposals for how to write C's > conditional expression ( a ? b : c ) in Python. 
I'm sure this > counts as some > evidence in that debate, but it needs more evidence like a forest > fire needs > petrol... It's getting my -1 unless it's "a ? b : c" and since Guido hates "?"... ;) > (PS Mark, maybe you could rewrite the statement as a 4-line if, like I did > above, just for clarity?) What Mark said in his reply: dunt dumb down the code - comment it! > > BTW, how does it get to reference a global (use_db) without a "global" > > statement in the __init__ scope? I can't see where use_db gets > used either? > > The "global" statement is only needed if you want to *update* the > global. Read > access to globals (when not shadowed by locals) is transparent. Argh! really guys, I HAVE been programming Python for more than 2 weeks! I thought if it went one way, it went the other and I've always been after a global to update it, so that's why I never knew this. (Don't start with the global updating - it's generally an init thing only!) > Paul. Thanks for the help gents. Mark, I'll be back to you when I get further into the read status mod. Regards, Dave LeBlanc Seattle, WA USA From mhammond at skippinet.com.au Wed Feb 19 22:01:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 19 06:02:22 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D91A@UKDCX001.uk.int.atosorigin.com> Message-ID: <000d01c2d806$34e93fa0$530f8490@eden> [Paul] > From: D. R. Evans [mailto:N7DR@arrisi.com] > > I saw a comment in the LJ article that one should train on roughly > > equal numbers of spam and ham. Is this actually true? (This > question of > > course merely demonstrates that I'm too lazy to do the > maths myself.) > > That's something I'd be interested in, too - particularly as > the ham:spam ratio people get is utterly out of their control. Yes, but the number we use to train on isn't. 
> I'm also too lazy - or possibly incompetent - to do the maths, I'm certainly the latter, but: > but IIRC, there were some > experiments done at one stage. A pointer to the relevant > posts (or better > still, a summary on the website) would be very useful. AFAIK, this experiment is still ongoing. Particularly, the Outlook default config file still has: --- # This will probably go away if testing confirms it's a Good Thing. experimental_ham_spam_imbalance_adjustment: True --- I guess it can safely be stated that testing has not proved it a bad thing, but that isn't what the comment asks. > Unfortunately for me, my ham:spam ratio is something like 99% > *spam*. This > is because I run a highly filtered setup, with all my mailing > list traffic > getting taken out of the mail stream before spambayes gets a look in. I am approaching that. My problem is that I delete items from my Inbox, but never delete them from the Spam folder. This is mainly for training purposes, but I guess it could come in handy when I need to make money fast. However, the end result is that my spam:ham ratio is slowly growing. Human perception gets in the way though. It was not that many months ago that I considered 20 spam a day bearable (and from what I understand, a .au address means only 20 makes me lucky!). Now I find that for any "unsure" items that are found, I begin to wonder if SpamBayes is no longer doing its job. I believe the truth simply is that SpamBayes has lowered my threshold to the point where whenever I see *any* spam I recoil. Mark. -------------- next part -------------- A non-text attachment was scrubbed...
Name: winmail.dat Type: application/ms-tnef Size: 2888 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030219/6189e500/winmail.bin From tim at fourstonesExpressions.com Wed Feb 19 07:11:15 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 08:11:27 2003 Subject: [Spambayes] training In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD61@its-xchg4.massey.ac.nz> Message-ID: 2/18/2003 11:01:42 PM, "Meyer, Tony" wrote: >What about this as a method: > >1. POP3proxy adds another header to mail: 'X-Spambayes-ID: XXX' - the id is the same id as in the corpus caches (i.e. at the moment, this is the receiver time and a 'uniquifier'). > >2. For manual training, this ID can be entered into the web ui to allow reviewing of that message. (The existing manual system also stays in place). > >3. Incorrect mail is forwarded to spambayes_spam@localhost or spambayes_ham@localhost. The SMTP proxy examines mail to either of those addresses (and stops it going further). It checks for: >(a) an attached message >(b) the words "X-Spambayes-ID:" in the body >And extracts the correct id, and uses this to find the message in the corpus cache, and then does the appropriate training. The problem here is that some mailers pretty much lose most of the headers when you do a forward operation... Placing something like a url in the body of a message is another possibility that's been raised. It's somewhat dangerous, particularly in the case of multipart messages, and for html messages may not be visible at all. SpamAssassin modifies the subject for exactly these reasons. It's the one header that can pretty much be guaranteed to be there when you need it and be testable with most any mailer filtering mechanism. But you can't put a url in it, and putting an id that some user has to cut and paste, while better than nothing, doesn't really make life much easier for J. Q. Public. 
- TimS > >I've done 1 & 2, which at least solves the problem of finding a message to train in a huge cache. The SMTP proxy was at least partially done, which just leaves the searching (not that difficult) and the training hooks. > >If a mail application failed to include the headers in either an attached message or in the body (say it strips them), then there could always be an option to include the id in the message body (as ugly and intrusive as that is). This would work with Outlook Express and Eudora at least (I don't have anything else to test). > >What do you think? > >If someone still has it, could they send me the SMTP proxy prototype code? Hmmmm.... good question, I should have it somewhere (I wrote it). But it's not integrated with pop3proxy, and so database updates from either clobber the other. It really needs to be a single process, and Richie was going to do that until we told him we didn't see any particular value to the work. If we cannot guarantee that the header we need will be there with all mailers, then we either have to change the mechanism, or begin to account for different mailers, which would be really awful (of course). > >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 19 07:14:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 08:14:16 2003 Subject: [Spambayes] training In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199EA@UKDCX001.uk.int.atosorigin.com> Message-ID: <5ZZWBDC1T63VTTQLKPJED97OJD9UQWT.3e53831e@myst> 2/19/2003 3:59:20 AM, "Moore, Paul" wrote: >From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] >> What about this as a method: >[...] > >Looks good. > >> 3. Incorrect mail is forwarded to spambayes_spam@localhost >> or spambayes_ham@localhost. 
> >I assume you'll make these addresses configurable? For those >of us with enough control over local addresses (like me), forwarding >to just "spam" or "ham" would be preferable... > >> The SMTP proxy examines mail to either of those addresses (and >> stops it going further). It checks for: >> (a) an attached message >> (b) the words "X-Spambayes-ID:" in the body > >I assume that you mean either of these - some clients will attach the >original in preference, others will append it as text. > >Of course, if you're aiming at the mass market end of things, you need >to look out for mailers that will mash the whole thing into HTML, so >you get the X-Spambayes-ID in HTML in the body. (Are there any mailers >like this? I could easily imagine Outlook Express being this nasty...) > >And I once had a reply to a perfectly normal mail (using the Outlook >web client) get sent as base64-encoded UTF-8 because the client had >added a couple of garbage non-ASCII characters at the end, unknown to >me :-( > >I'm not saying you should handle all cases of pathological behaviour, >but you could do with being aware of the possibilities, just so the >"it didn't work!" cries don't come as a surprise... > >I'm willing to set up a test machine with a variety of Windows mail >systems on, (I can get OE, Pegasus, Agent, Gravity) and try the system >out, but I don't have a lot of time, so I'll only be able to do fairly >minimal tests... That'd be really great. We should at least know what mailers our stuff has been tested on. I use the opera mailer. I know some people still use Eudora, Mozilla, and Netscape (lots of versions). I have Netscape 4.something and 6 installed somewhere. - TimS > >> What do you think? > >Sounds nice. > >Paul. 
> >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From jbenoit at cybectec.com Wed Feb 19 08:21:48 2003 From: jbenoit at cybectec.com (Jacques Benoit) Date: Wed Feb 19 08:31:49 2003 Subject: [Spambayes] Problem report Message-ID: <6102B1CF3E99D411A11100E0296F6B8D5FBD32@CYBQC07> Hello, I first want to thank you for an excellent program. It really works. I am enclosing a program trace and an email that seems to cause a problem. Keep on the good work. Jacques Benoit ===== Deleting and spam training message 'Lose 22.5lbs in 3 weeks for FREE! ' - FAILED to create email.message from: 'Received: from maili41.mxdat.org (ms5.mxdat.com [209.236.58.41]) by cybqc07.cybectec.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)\r\n\tid 1B84HXQH; Wed, 19 Feb 2003 07:13:59 -0500\r\nTo: jbenoit@cybectec.com\r\nDate: Wed, 19 Feb 2003 07:19:15 -0500\r\nMessage-ID: <1045657155.5456@green5>\r\nX-Mailer: Pine.GSO.4.31\r\nFrom: "Get Serious" \r\nReturn-Path: \r\nReply-To: \r\nSubject: Lose 22.5lbs in 3 weeks for FREE!\r\n \r\n**********************************************************************\r \nPLEASE DO NOT REPLY TO THIS EMAIL - To unsubscribe, please see the\r\nsubscription management section at the bottom of this newsletter.\r\n************************************************************* *********\r\n\n\n\r\n"I Couldn\'t Face Another Holiday Being Called the \'FAT ONE\'... \r\nThank God I Found Apple Cider Vinegar Enhanced!" \r\n\r\nGet Your Free Bottle & SEE FOR YOURSELF! \r\n\r\nhttp://209.236.60.3/lc1/go.php?10\r\n\r\nCLICK HERE FOR DETAILS! \r\nNo Crash Diets! No Painful Excercise! 
\r\n\r\nhttp://209.236.60.3/lc1/go.php?10\r\n\r\n=========================== ========================== \r\n\r\n\r\nEnjoy your day,\r\n\r\n\r\nDaily Max Deal Chopper\r\n\r\n\r\n######################################################### #############\r\nIf you no longer wish to receive your edition of the Daily Max Deal Chop \r\nNewsletter, please follow the link below and follow the simple \r\nunsubscribe instructions.\r\n\r\nhttp://209.236.60.3/unsub.htm\r\n\r\nThe use and unauthorized reproduction of this message and delivery header \r\ninformation is strictly prohibited. This e-mail is meant for informational \r\npurposes only. JudoMonkey makes no guarantees in connection with the \r\nproduct(s) or service(s) presented.\r\n############################################################## ########\r\n\r\nworabvg^plorpgrp(pbz\r\n' pythoncom error: Python error invoking COM method. Traceback (most recent call last): File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000\addin.py", line 315, in OnClick if train.train_message(msgstore_message, True, self.manager, rescore = True): File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000\train.py", line 43, in train_message stream = msg.GetEmailPackageObject() File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000\msgstore.py", line 565, in GetEmailPackageObject msg = email.message_from_string(text) File "D:\PROGRA~1\Python22\Lib\email\__init__.py", line 52, in message_from_string return Parser(_class, strict=strict).parsestr(s) File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 
75, in parsestr return self.parse(StringIO(text), headersonly=headersonly) File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 62, in parse self._parseheaders(root, fp) File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 128, in _parseheaders raise Errors.HeaderParseError( email.Errors.HeaderParseError: Not a header, not a continuation: ``**********************************************************************'' ===== <> -------------- next part -------------- An embedded message was scrubbed... From: Get Serious Subject: Lose 22.5lbs in 3 weeks for FREE! Date: Wed, 19 Feb 2003 07:19:15 -0500 Size: 1390 Url: http://mail.python.org/pipermail/spambayes/attachments/20030219/9dd06b8a/attachment-0001.eml From tim at fourstonesExpressions.com Wed Feb 19 08:03:50 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 09:04:02 2003 Subject: [Spambayes] Problem report In-Reply-To: <6102B1CF3E99D411A11100E0296F6B8D5FBD32@CYBQC07> Message-ID: <31PLE1KFIH7SQMJ72XVRP83RN982X.3e538ec6@myst> 2/19/2003 7:21:48 AM, Jacques Benoit wrote: > > > From: Jacques Benoit > > To: "'spambayes@python.org'" > Date: Wed, 19 Feb 2003 08:21:48 -0500 > Subject:[Spambayes] Problem report > > Hello, > > I first want to thank you for an excellent program. It really works. > I am enclosing a program trace and an email that seems to cause a problem. > Keep on the good work. Are you using the Outlook plugin, or the pop3proxy? - TimS > > Jacques Benoit > > ===== > Deleting and spam training message 'Lose 22.5lbs in 3 weeks for FREE! 
' - > FAILED to create email.message from: 'Received: from maili41.mxdat.org > (ms5.mxdat.com [209.236.58.41]) by cybqc07.cybectec.com with SMTP (Microsoft > Exchange Internet Mail Service Version 5.5.2653.13)\r\n\tid 1B84HXQH; Wed, > 19 Feb 2003 07:13:59 -0500\r\nTo: jbenoit@cybectec.com\r\nDate: Wed, 19 Feb > 2003 07:19:15 -0500\r\nMessage-ID: <1045657155.5456@green5>\r\nX-Mailer: > Pine.GSO.4.31\r\nFrom: "Get Serious" \r\nReturn-Path: > \r\nReply-To: \r\nSubject: Lose > 22.5lbs in 3 weeks for FREE!\r\n > \r\n********************************************************************** \r > \nPLEASE DO NOT REPLY TO THIS EMAIL - To unsubscribe, please see > the\r\nsubscription management section at the bottom of this > newsletter.\r \n************************************************************* > *********\r\n\n\n\r\n"I Couldn\'t Face Another Holiday Being Called the > \'FAT ONE\'... \r\nThank God I Found Apple Cider Vinegar Enhanced!" > \r\n\r\nGet Your Free Bottle & SEE FOR YOURSELF! > \r\n\r\nhttp://209.236.60.3/lc1/go.php?10\r\n\r\nCLICK HERE FOR DETAILS! > \r\nNo Crash Diets! No Painful Excercise! > \r\n\r\nhttp://209.236.60.3/lc1/go.php?10\r\n\r \n=========================== > ========================== \r\n\r\n\r\nEnjoy your day,\r\n\r\n\r\nDaily Max > Deal > Chopper\r\n\r\n\r \n######################################################### > #############\r\nIf you no longer wish to receive your edition of the Daily > Max Deal Chop \r\nNewsletter, please follow the link below and follow the > simple \r\nunsubscribe > instructions.\r\n\r\nhttp://209.236.60.3/unsub.htm\r\n\r\nThe use and > unauthorized reproduction of this message and delivery header > \r\ninformation is strictly prohibited. This e-mail is meant for > informational \r\npurposes only. 
JudoMonkey makes no guarantees in > connection with the \r\nproduct(s) or service(s) > presented.\r \n############################################################## > ########\r\n\r\nworabvg^plorpgrp(pbz\r\n' > pythoncom error: Python error invoking COM method. > Traceback (most recent call last): > File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", > line 275, in _Invoke_ > return self._invoke_(dispid, lcid, wFlags, args) > File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", > line 280, in _invoke_ > return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) > File "D:\PROGRA~1\Python22\lib\site-packages\win32com\server\policy.py", > line 510, in _invokeex_ > return apply(func, args) > File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000\addin.py", > line 315, in OnClick > if train.train_message(msgstore_message, True, self.manager, rescore = > True): > File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000\train.py", > line 43, in train_message > stream = msg.GetEmailPackageObject() > File "D:\Program Files\Spambayes\spambayes-1.0a1\Outlook2000 \msgstore.py", > line 565, in GetEmailPackageObject > msg = email.message_from_string(text) > File "D:\PROGRA~1\Python22\Lib\email\__init__.py", line 52, in > message_from_string > return Parser(_class, strict=strict).parsestr(s) > File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 75, in parsestr > return self.parse(StringIO(text), headersonly=headersonly) > File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 62, in parse > self._parseheaders(root, fp) > File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 128, in > _parseheaders > raise Errors.HeaderParseError( > email.Errors.HeaderParseError: Not a header, not a continuation: > ``**********************************************************************'' > ===== > > <> > > > From: Get Serious > > To: Jacques Benoit > Subject:Lose 22.5lbs in 3 weeks for FREE! 
> Date: Wed, 19 Feb 2003 07:19:15 -0500 > > > "I Couldn't Face Another Holiday Being Called the 'FAT ONE'... > Thank God I Found Apple Cider Vinegar Enhanced!" > > Get Your Free Bottle & SEE FOR YOURSELF! > > http://209.236.60.3/lc1/go.php?10 > > CLICK HERE FOR DETAILS! > No Crash Diets! No Painful Excercise! > > http://209.236.60.3/lc1/go.php?10 > > ===================================================== > > > Enjoy your day, > > > Daily Max Deal Chopper > > > ###################################################################### > If you no longer wish to receive your edition of the Daily Max Deal Chop > Newsletter, please follow the link below and follow the simple > unsubscribe instructions. > > http://209.236.60.3/unsub.htm > > The use and unauthorized reproduction of this message and delivery header > information is strictly prohibited. This e-mail is meant for informational > purposes only. JudoMonkey makes no guarantees in connection with the > product(s) or service(s) presented. > ###################################################################### > > worabvg^plorpgrp(pbz c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 19 08:16:34 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 09:16:43 2003 Subject: [Spambayes] Outlook plugin file locations changing In-Reply-To: <000001c2d7ce$6ed78c90$530f8490@eden> Message-ID: This closes feature request 676401. I'm not an admin or I'd close it myself. Can you take care of that, Mark? - TimS 2/18/2003 10:21:53 PM, "Mark Hammond" wrote: >For those who don't read the checkins list, I just added the following: > >--- >Store our config files in the "correct" Windows directory, using the >SHGetFolderPath function to locate it. If we can't locate this, or can't >create our SpamBayes directory under this, we stick with the "application >directory". 
> >Code also exists to migrate your existing databases to this directory. >First time you run Outlook after this update, your .pck/.db files will be >*moved* to the new directory. Thus, no re-training should be necessary. > >About ready to release a stand-alone SpamBayes Outlook Plugin binary :) >--- > >If we ever get a platform.py, or decide that there should be a single >database for all spambayes "products" (ie, pop3 and Outlook sharing the same >db, for example), then I will be happy to migrate. > >Let me know if you have any problems. > >Mark. > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 19 08:23:09 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 09:23:18 2003 Subject: [Spambayes] training In-Reply-To: <000301c2d804$74d7be40$530f8490@eden> Message-ID: 2/19/2003 4:48:36 AM, "Mark Hammond" wrote: >> > Which, coincidently, leads us to what I have been advocating >> > for some time . >> >> :) >> >> > The core spambayes code should persist >> > the word database as now, but also a basic "message >> > database". >> >> Do you mean one like pop3proxy's cache? i.e. one that >> expires messages over a certain age? > >I actually just meant a simple msg_id->trained_as_spam dictionary - just a >memory that a message had previously been trained as ham/spam, so a need to >untrain and multiple requests for the same message can be detected. This is >user-proof in the face of I-double-click-everywhere type users This is a great idea. The filesystem based stuff (pop3proxy) will need to keep a permanent copy of mails that have been trained in order for this to work, but I don't have a problem with that. 
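A minimal sketch of the msg_id->trained_as_spam memory discussed above. This is not Spambayes code: the class name `TrainingMemory`, the `plan` method and the action strings are hypothetical and used only for illustration. The idea is that remembering how each message was last trained lets a repeated click be ignored and a changed verdict trigger an untrain before the new training.

```python
class TrainingMemory:
    """Remembers how each message was last trained (hypothetical helper)."""

    def __init__(self):
        # msg_id -> True if trained as spam, False if trained as ham
        self.trained_as_spam = {}

    def plan(self, msg_id, is_spam):
        """Return the actions needed to train msg_id, then record the verdict."""
        previous = self.trained_as_spam.get(msg_id)
        if previous is None:
            actions = ["train"]             # never seen before
        elif previous == is_spam:
            actions = []                    # duplicate request, nothing to do
        else:
            actions = ["untrain", "train"]  # verdict changed, undo the old one
        self.trained_as_spam[msg_id] = is_spam
        return actions


memory = TrainingMemory()
print(memory.plan("<1045657155.5456@green5>", True))   # first click
print(memory.plan("<1045657155.5456@green5>", True))   # double click
print(memory.plan("<1045657155.5456@green5>", False))  # user changes their mind
```

The caller would carry out the returned actions against the real classifier; keeping the bookkeeping separate from the training itself is a design choice of this sketch, not something the thread specifies.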
> >> > If this sounds OK, I've a further idea I will expand in email :) > >I meant to say "private email", but the list is quiet at the moment >... > >I was thinking that we could possibly abstract the database out one step >more. Have a single "database manager" that maintains a few 'databases' - >really just discrete tables, with no joins, in standard database parlance. >What I'm trying to get at is that if we could have 2 dictionaries (existing >word dictionary, plus one more "msg_id->how_was_trained") stored in a single >file, and maybe even the possibility of additional "application defined" >dictionaries (such as random config info) in that same file, life would be >pretty peachy :) > >If we talk in terms of pickles, imagine: >database['bayes'] = existing_bayes_pickle >database['training'] = dict_I_proposed_above >database['outlook_ui'] = dict_for_outlook_ui_options We might replace Options.py with a pickled dictionary pointed to by this dictionary. Or at least the user configurable stuff. The configurator for bayescustomize.ini is an enormous pain, and getting worse as I try to write 'installers' for various pop3 mailers. > >And 'database' is pickled. I see no reason this couldn't also work for >bsdbd. I am proposing that Corpus.py automatically manage the 'bayes' and >'training' keys of the database, but leave others for applications. Bayes >itself persists the entire database. Some naming convention would be just >fine too :) Very kewl ideas. Getting-over-my-God's-gift-to-opensourcedness-ly, TimS > >Never-satisfied-ly, > >Mark. 
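The "one pickled database holding several dictionaries" idea could be sketched roughly as follows. The class name `DatabaseManager`, the file name and the sample values are assumptions made for illustration; only the three keys ('bayes', 'training', 'outlook_ui') come from the proposal above.

```python
import os
import pickle
import tempfile


class DatabaseManager:
    """One pickle file holding several named dictionaries (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.tables = pickle.load(f)
        else:
            self.tables = {}

    def __getitem__(self, name):
        # Create the table on first use, e.g. db["training"].
        return self.tables.setdefault(name, {})

    def save(self):
        # Write to a temporary file first so a crash cannot truncate the database.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.tables, f)
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "spambayes_demo.pck")
db = DatabaseManager(path)
db["bayes"]["viagra"] = (0, 12)                 # made-up (ham, spam) counts
db["training"]["<msg-1@example.com>"] = True    # msg_id -> trained as spam
db["outlook_ui"]["show_toolbar"] = True         # application-defined options
db.save()

reloaded = DatabaseManager(path)
print(sorted(reloaded.tables))  # ['bayes', 'outlook_ui', 'training']
```

Because everything lives in one pickle, the whole file is rewritten on every save, and a pickle should only ever be loaded from a trusted location.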
> c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Wed Feb 19 06:43:28 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 19 09:42:07 2003 Subject: [Spambayes] [ spambayes-Bugs-689298 ] Messages not processed Message-ID: Bugs item #689298, was opened at 2003-02-19 09:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jacques Benoit (jbenoit) Assigned to: Nobody/Anonymous (nobody) Summary: Messages not processed Initial Comment: Using Spambayes 1.0a1 and Outlook Plug-in Using Python 2.2.2 Some email messages are not processed correctly. The Outlook buttons "Delete as Spam" and "Recover from Spam" have no effect. A Pythonwin trace follows. An email message is provided. ===== Deleting and spam training message 'Lose 22.5lbs in 3 weeks for FREE! ' - FAILED to create email.message from: 'Received: from maili41.mxdat.org (ms5.mxdat.com [209.236.58.41]) by cybqc07.cybectec.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)\r\n\tid 1B84HXQH; Wed, 19 Feb 2003 07:13:59 -0500\r\nTo: jbenoit@cybectec.com\r\nDate: Wed, 19 Feb 2003 07:19:15 -0500\r\nMessage-ID: <1045657155.5456@green5>\r\nX-Mailer: Pine.GSO.4.31\r\nFrom: "Get Serious" \r\nReturn-Path: \r\nReply-To: \r\nSubject: Lose 22.5lbs in 3 weeks for FREE!\r\n \r\n************************************************** ********************\r\nPLEASE DO NOT REPLY TO THIS EMAIL - To unsubscribe, please see the\r\nsubscription management section at the bottom of this newsletter.\r\n*************************************** *******************************\r\n\n\n\r\n"I Couldn\'t Face Another Holiday Being Called the \'FAT ONE\'... \r\nThank God I Found Apple Cider Vinegar Enhanced!" \r\n\r\nGet Your Free Bottle & SEE FOR YOURSELF! 
\r\n\r\nhttp://209.236.60.3/lc1/go.php?10 \r\n\r\nCLICK HERE FOR DETAILS! \r\nNo Crash Diets! No Painful Excercise! \r\n\r\nhttp://209.236.60.3/lc1/go.php?10 \r\n\r\n=========================== ========================== \r\n\r\n\r\nEnjoy your day,\r\n\r\n\r\nDaily Max Deal Chopper\r\n\r\n\r\n######################## ######################################### #####\r\nIf you no longer wish to receive your edition of the Daily Max Deal Chop \r\nNewsletter, please follow the link below and follow the simple \r\nunsubscribe instructions.\r\n\r\nhttp://209.236.60.3/unsub.ht m\r\n\r\nThe use and unauthorized reproduction of this message and delivery header \r\ninformation is strictly prohibited. This e-mail is meant for informational \r\npurposes only. JudoMonkey makes no guarantees in connection with the \r\nproduct(s) or service(s) presented.\r\n############################# ######################################### \r\n\r\nworabvg^plorpgrp(pbz\r\n' pythoncom error: Python error invoking COM method. 
Traceback (most recent call last): File "D:\PROGRA~1\Python22\lib\site- packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "D:\PROGRA~1\Python22\lib\site- packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "D:\PROGRA~1\Python22\lib\site- packages\win32com\server\policy.py", line 510, in _invokeex_ return apply(func, args) File "D:\Program Files\Spambayes\spambayes- 1.0a1\Outlook2000\addin.py", line 315, in OnClick if train.train_message(msgstore_message, True, self.manager, rescore = True): File "D:\Program Files\Spambayes\spambayes- 1.0a1\Outlook2000\train.py", line 43, in train_message stream = msg.GetEmailPackageObject() File "D:\Program Files\Spambayes\spambayes- 1.0a1\Outlook2000\msgstore.py", line 565, in GetEmailPackageObject msg = email.message_from_string(text) File "D:\PROGRA~1\Python22 \Lib\email\__init__.py", line 52, in message_from_string return Parser(_class, strict=strict).parsestr(s) File "D:\PROGRA~1\Python22 \Lib\email\Parser.py", line 75, in parsestr return self.parse(StringIO(text), headersonly=headersonly) File "D:\PROGRA~1\Python22 \Lib\email\Parser.py", line 62, in parse self._parseheaders(root, fp) File "D:\PROGRA~1\Python22 \Lib\email\Parser.py", line 128, in _parseheaders raise Errors.HeaderParseError( email.Errors.HeaderParseError: Not a header, not a continuation: ``**************************************************** ******************'' ===== ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 From tim.one at comcast.net Wed Feb 19 10:01:44 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 19 10:02:13 2003 Subject: [Spambayes] OUCH! In-Reply-To: Message-ID: [David LeBlanc] > I guess there's a first time for everything. 
In using 3 different > versions of Outlook, this is the first time I've ever seen it pop up > an "illegal operation - program terminating" dialog! OL died very fast > after the "ok" button was pressed too. I've seen it under Outlook 2000 ever since I started using it, but it's indeed rare. It has seemed to me to be correlated with other programs accessing the network at the same time OL is trying to connect to a server. > It took a LONG time to rebuild it's indices or whatever it is that > Outlook does when it's restarted after an abrupt shutdown. That part's a mystery too. If you run scanpst.exe ("Inbox Repair Tool") on your PST file(s) after such a crash, you'll find that OL *still* takes forever to start up again, but scanpst should have rebuilt the indices (if needed). > Spammie came back though, so OL probably didn't think it was Spammie's > fault (I guess). > > Alas, there where no discernable logs I could find to figure out what > went *splat*. Nope, there never are. > The only thing that has changed in a long time is the addition of > spammie... Impossible to guess, but for me it's happened with OL2K both before and after installing the plugin. I haven't noticed any increase or decrease in OL problems. From noreply at sourceforge.net Wed Feb 19 07:08:33 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 19 10:02:39 2003 Subject: [Spambayes] [ spambayes-Bugs-689298 ] Messages not processed Message-ID: Bugs item #689298, was opened at 2003-02-19 08:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jacques Benoit (jbenoit) >Assigned to: Mark Hammond (mhammond) Summary: Messages not processed ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-19 09:08 Message: Logged In: YES user_id=645698 Might this have been fixed in
alpha 2? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 From noreply at sourceforge.net Wed Feb 19 07:47:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 19 10:40:10 2003 Subject: [Spambayes] [ spambayes-Bugs-689298 ] Messages not processed Message-ID: Bugs item #689298, was opened at 2003-02-19 08:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jacques Benoit (jbenoit) Assigned to: Mark Hammond (mhammond) Summary: Messages not processed ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-19 09:47 Message: Logged In: YES user_id=645698 Great. Thanks for helping out here. ---------------------------------------------------------------------- Comment By: Jacques Benoit (jbenoit) Date: 2003-02-19 09:46 Message: Logged In: YES user_id=715810 Installed alpha 2. Same error in the Python...
File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 128, in _parseheaders raise Errors.HeaderParseError( email.Errors.HeaderParseError: Not a header, not a continuation: ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-19 09:08 Message: Logged In: YES user_id=645698 Might this have been fixed in alpha 2? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 From noreply at sourceforge.net Wed Feb 19 07:46:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 19 10:45:36 2003 Subject: [Spambayes] [ spambayes-Bugs-689298 ] Messages not processed Message-ID: Bugs item #689298, was opened at 2003-02-19 09:43 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jacques Benoit (jbenoit) Assigned to: Mark Hammond (mhammond) Summary: Messages not processed ---------------------------------------------------------------------- >Comment By: Jacques Benoit (jbenoit) Date: 2003-02-19 10:46 Message: Logged In: YES user_id=715810 Installed alpha 2. Same error in the Python...
File "D:\PROGRA~1\Python22\Lib\email\Parser.py", line 128, in _parseheaders raise Errors.HeaderParseError( email.Errors.HeaderParseError: Not a header, not a continuation: ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-19 10:08 Message: Logged In: YES user_id=645698 Might this have been fixed in alpha 2? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=689298&group_id=61702 From tim at fourstonesExpressions.com Wed Feb 19 09:46:39 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 10:46:48 2003 Subject: [Spambayes] Use of email package Message-ID: It appears from the bug report that Jacques has submitted, and from other errors I've seen in the pop3proxy lately, that the use of the email package is causing us some occasional trouble. The email package makes assumptions about the well-formedness of mail, and throws exceptions if those assumptions do not appear to apply to the mail being dealt with. I feel that this represents a fairly serious weakness in our solution, because spammers can exploit those assumptions to break the filter, thus discrediting it and making it harder for users to see their mail, and at the same time getting their spam through when users really do see their mail. We've got to either seriously harden our code so it knows what to do when the email package raises an exception, or consider not using the email package. I think I'll be reworking pop3proxy so that it no longer uses the email package for anything. The Corpus stuff currently has most (all?) the function that is needed by pop3proxy anyway. 
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org
From francois.granger at free.fr Wed Feb 19 17:59:07 2003 From: francois.granger at free.fr (François Granger) Date: Wed Feb 19 11:59:16 2003 Subject: [Spambayes] Use of email package In-Reply-To: Message-ID: on 19/02/03 16:46, Tim Stone - Four Stones Expressions at tim@fourstonesExpressions.com wrote: > It appears from the bug report that Jacques has submitted, and from other > errors I've seen in the pop3proxy lately, that the use of the email package is > causing us some occasional trouble. There was talk that version 2.4.3 of the email package is the minimum needed, and that this version did not ship with Python before version 2.3 (?). This is something the bug reporters have to check. I know that I manually installed 2.4.3 on my Python 2.2.2, but not on my 2.2.
Python 2.2.2 (#138, Oct 25 2002, 23:10:42) [CW CARBON GUSI2 THREADS GC] on mac
Type "copyright", "credits" or "license" for more information.
>>> import email
>>> email.__version__
'2.4.3'
>>>
Python 2.2 (#124, Dec 22 2001, 17:36:16) [CW PPC GUSI2 THREADS GC] on mac
Type "copyright", "credits" or "license" for more information.
>>> import email
>>> email.__version__
'1.0'
>>>
-- Mail is a means of communication. People should ask themselves about the political implications of the choices (or non-choices) of their tools and technologies. For clean mail: --
From neale at woozle.org Wed Feb 19 09:12:38 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Feb 19 12:13:06 2003 Subject: [Spambayes] training In-Reply-To: (Tim Stone - Four Stones Expressions's message of "Wed, 19 Feb 2003 07:11:15 -0600") References: Message-ID: Tim Stone - Four Stones Expressions writes: > The problem here is that some mailers pretty much lose most of the > headers when you do a forward operation That is exactly the problem. Also, consider the company that gets gigabytes of email every day.
How long do they keep a message in their pool for future training? And anyway, forwarding a message to a special address is still too much work. I know that sounds ludicrous, especially when your aunt is constantly forwarding you inspirational messages about leprechauns, but most people at work would rather just junk the spam than take the time to forward it to a special address. Even if you promise them that after a week they won't have to do it as often. Especially then, actually. We have to make this dead simple or it's not going to get used. Now, if you're sending suspected spam to an *administrator*, I think you can get away with the "forward to special address" idea. But then there's no reason to store mail anywhere, since we can (presumably) trust the administrator to send back the original message, unadulterated. In fact, we could bundle all spam messages up as an attachment, and then the admin can just forward back the attachment. Does outlook mangle message/rfc822 attachments? This doesn't deal with the problem of false negatives, but maybe a few dedicated end-users (as opposed to end-admins) would send in enough false negatives to make it all work. Neale From barry at python.org Wed Feb 19 12:18:07 2003 From: barry at python.org (Barry A. Warsaw) Date: Wed Feb 19 12:18:38 2003 Subject: [Spambayes] Use of email package References: Message-ID: <15955.48207.421755.891103@gargle.gargle.HOWL> >>>>> "TS" == Tim Stone writes: TS> We've got to either seriously harden our code so it knows what TS> to do when the email package raises an exception, or consider TS> not using the email package. I think I'll be reworking TS> pop3proxy so that it no longer uses the email package for TS> anything. The Corpus stuff currently has most (all?) the TS> function that is needed by pop3proxy anyway. Let me take this opportunity to elaborate on the architecture of the email package. 
There was a deliberate separation between the representation of email messages and the parsing of flat text to that object model (and in generating flat text from the object model, but that may not be relevant). Thus, it was designed with an eye toward the use of application specific parsers, and it may well be that the default parsers (both the strict and the lax parsers) may not be appropriate for an application that tends to see intentionally ill-formed messages. My suggestion would be to write a parser that can handle the really bad messages, then use the default lax parser for most things, and fall back to the "adaptive parser" for the really horrendous messages. Then donate that parser back to Python. -Barry From jm at jmason.org Wed Feb 19 17:08:54 2003 From: jm at jmason.org (Justin Mason) Date: Wed Feb 19 12:20:16 2003 Subject: [Spambayes] Use of email package In-Reply-To: Message from Tim Stone - Four Stones Expressions Message-ID: <20030219170859.0AE9C16F19@jmason.org> Tim Stone - Four Stones Expressions said: > We've got to either seriously harden our code so it knows what to do > when the email package raises an exception, or consider not using the > email package. I think I'll be reworking pop3proxy so that it no longer > uses the email package for anything. The Corpus stuff currently has > most (all?) the function that i s needed by pop3proxy anyway. Folks -- quick note from the SpamAssassin side of things -- we found we had to write our own, alright. You've gotta be supremely defensive about how ill-formed the mail could be, plus some of those make great spam-signs too. Most "normal" mail pkgs assume some "safe" assumptions either for efficiency or easy code, but with a spam filter, you're in unsafe territory anyway ;) --j. 
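
A minimal sketch of the arrangement suggested above: try the email package first, and fall back to a much cruder, never-raising split for the really horrendous messages. It assumes a modern Python 3 interpreter rather than the 2.2-era one discussed in this thread, and the names `crude_split` and `parse_defensively` are hypothetical, not part of Spambayes or of the email package. Note also that current versions of the email package mostly record defects instead of raising, so the fallback branch may rarely trigger today.

```python
import email
import re

# Blank line between headers and body; tolerates LF, CRLF, or a mix of both.
HEADER_BODY_SEP = re.compile(r'\r?\n\r?\n')


def crude_split(message_text):
    """Split raw text into ([(name, value), ...], body) without ever raising.

    Hypothetical helper: continuation lines are folded into the previous
    header, and lines that do not look like headers are silently skipped.
    """
    parts = HEADER_BODY_SEP.split(message_text, 1)
    body = parts[1] if len(parts) > 1 else ""
    headers = []
    for line in parts[0].splitlines():
        if line.startswith((" ", "\t")) and headers:
            # Continuation of the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, value = line.split(":", 1)
            headers.append((name.strip(), value.strip()))
        # Anything else is junk in the header block; ignore it.
    return headers, body


def parse_defensively(message_text, parser=email.message_from_string):
    """Return (headers, how), where how is 'email' or 'fallback'.

    The parser argument is an assumption of this sketch; it lets a caller
    plug in a stricter or application-specific parser.
    """
    try:
        msg = parser(message_text)
        return list(msg.items()), "email"
    except Exception:
        # The parser gave up on the message; use the crude split instead.
        headers, _body = crude_split(message_text)
        return headers, "fallback"
```

The second return value tells the caller which path was taken, so a filter could count how often the fallback fires, or even treat it as a spam clue in its own right.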
From tim at fourstonesExpressions.com Wed Feb 19 11:23:46 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 12:23:58 2003 Subject: [Spambayes] training In-Reply-To: Message-ID: 2/19/2003 11:12:38 AM, Neale Pickett wrote: >Tim Stone - Four Stones Expressions writes: > >> The problem here is that some mailers pretty much lose most of the >> headers when you do a forward operation > >That is exactly the problem. Also, consider the company that gets >gigabytes of email every day. How long do they keep a message in their >pool for future training? > >And anyway, forwarding a message to a special address is still too much >work. Argh. I know you're right, but I don't *want* you to be right... > I know that sounds ludicrous, especially when your aunt is >constantly forwarding you inspirational messages about leprechauns, but >most people at work would rather just junk the spam than take the time >to forward it to a special address. Outlook plugin enables this behavior. Can we assume that the only people who use pop3 are those who are a bit higher on the computer user foodchain than the norm for Outlook users? (I know, I know, Tim Peters uses Outlook, too...) If so, maybe they'll accept a bit more behavioral expectation... > Even if you promise them that after >a week they won't have to do it as often. Especially then, actually. >We have to make this dead simple or it's not going to get used. Absolutely. As things are right now, it's not useable by anyone but people like us, which as dismaying as that may be, is not the norm. > >Now, if you're sending suspected spam to an *administrator*, I think you We can call the special address "spamadmin@myhost.com" You *can* fool some of the people all of the time (A. Lincoln) >can get away with the "forward to special address" idea. But then >there's no reason to store mail anywhere, since we can (presumably) >trust the administrator to send back the original message, >unadulterated. 
In fact, we could bundle all spam messages up as an >attachment, and then the admin can just forward back the attachment. >Does outlook mangle message/rfc822 attachments? > >This doesn't deal with the problem of false negatives, but maybe a few >dedicated end-users (as opposed to end-admins) would send in enough >false negatives to make it all work. > >Neale > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Wed Feb 19 09:53:48 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Wed Feb 19 12:53:53 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: Message from "Moore, Paul" <16E1010E4581B049ABC51D4975CEDB880113D91A@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB880113D91A@UKDCX001.uk.int.atosorigin.com> Message-ID: <20030219175348.E63632DEAD@cashew.wolfskeep.com> In message: <16E1010E4581B049ABC51D4975CEDB880113D91A@UKDCX001.uk.int.atosorig in.com> "Moore, Paul" writes: >From: D. R. Evans [mailto:N7DR@arrisi.com] >> I saw a comment in the LJ article that one should train on roughly >> equal numbers of spam and ham. Is this actually true? (This question >> of course merely demonstrates that I'm too lazy to do the maths myself.) > >That's something I'd be interested in, too - particularly as the >ham:spam ratio people get is utterly out of their control. I'm also >too lazy - or possibly incompetent - to do the maths, but IIRC, there >were some experiments done at one stage. A pointer to the relevant posts >(or better still, a summary on the website) would be very useful. I was the one who did the bulk of the ratio experiments, and I posted my results at http://www.wolfskeep.com/~popiel/spambayes. 
One thing to note about the experiments: in them, I varied not only the ratios of the training set, but also the ratios of the testing set. This is not particularly realistic for gauging the effect of mangling the ratio of training for some particular person's live feed (where the testing ratio would remain constant).

It would be worthwhile to rerun similar experiments with current versions of the code, too.

- Alex

From francois.granger at free.fr Wed Feb 19 21:07:48 2003
From: francois.granger at free.fr (Francois Granger)
Date: Wed Feb 19 15:07:54 2003
Subject: [Spambayes] training
In-Reply-To:
References:
Message-ID:

At 11:23 -0600 19/02/2003, in message Re: [Spambayes] training, Tim Stone - Four Stones Expressions wrote:
>2/19/2003 11:12:38 AM, Neale Pickett wrote:
>
>>Tim Stone - Four Stones Expressions writes:
>
>>And anyway, forwarding a message to a special address is still too much
>>work.
>
>Argh. I know you're right, but I don't *want* you to be right...

I'm afraid you have to ;-)

>(I know, I know,
>Tim Peters uses Outlook, too...)

Nobody's perfect....

>>We have to make this dead simple or it's not going to get used.
>
>Absolutely. As things are right now, it's not useable by anyone but
>people like us, which as dismaying as that may be, is not the norm.

Glad that somebody is lucid on this list, at least ;-)

Anyway, if SpamBayes is supposed to reach the "masses", it has to become dead simple. And this is not simple. Why not concentrate, one step at a time, on simplifying it so that it _progressively_ reaches less and less computer-literate audiences? Maybe a matrix of mail readers (operating system) against setup instructions would clarify which step of this progression the product is currently at, and what the logical next steps are. In this case, OE integration is to be put off to some later step... to be taken sometime in the future.
And maybe Eudora for Windows gets a higher priority, because somebody willing to choose it is more likely to be curious about new/exotic technology....

Anyway, a unification of databases/storages, as suggested in another thread, would be a good step forward for computer admins/power users, so that they could safely recommend the product.

Another step would be to get in close contact with http://www.osafoundation.org/, possibly through Kevin Altis, because they will be happy to have this technology in their first shipping release, and integration will be a dream for them.

--
Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law.

From zander at zan.com Wed Feb 19 10:46:07 2003
From: zander at zan.com (Zander)
Date: Wed Feb 19 15:31:36 2003
Subject: [Spambayes] Other pop3proxy options
Message-ID: <039101c2d847$3b1eff40$a100a8c0@zlichstein>

I would like to extend the options for how disposition is identified by the pop3proxy implementation. In particular, I would like the option of

A. X-Spambayes-Classification: as now
B. To: XXXXX as is in CVS now
C. Subject line munging to append

Is there any reason that was not included? (besides the obvious potential for a spammer to slip in a workaround) I use Outlook Express, and obviously can't use the arbitrary header technique - and am most interested in adding a [***SPAM***] header so that I can correctly bucketize those messages - but leave [***UNSURE***] in my primary box, and not molest ham messages at all.

Is there any reason not to do this? Would you accept it if I did? Is there any reason why you aren't using the email module Parser API to crack the headers?

I have found a certain number of messages are not parsed correctly by the re that you are using.
They show up as From: (none) Subj: (none) in the UI - but I haven't determined why just yet (though I can see that some part of the message is getting stuck with the header by your re.split(r'\n\r?\n', messageText, 1) expression. - Z From tim at fourstonesExpressions.com Wed Feb 19 14:36:22 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 16:29:45 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <039101c2d847$3b1eff40$a100a8c0@zlichstein> Message-ID: <5Y71UQUR61WXSGF08IYTSMVQ87RMKI.3e53eac6@myst> 2/19/2003 12:46:07 PM, "Zander" wrote: >I would like to extend the options for how disposition is identified by the pop3proxy implementation. In particular, I would like the option of > >A. X-Spambayes-Classification: as now >B. To: XXXXX as is in CVS now >C. Subject line munging to append > >Is there any reason that was not included? (beside the obvious potential for a spammer to slip in a workaround) I use Outlook Express, and obviously can't use the arbitrary header technique - and am most interested in adding a [***SPAM***] header so that I can correctly bucketize those messages - but leave [***UNSURE***] in my primary box, and not molest ham messages at all. > >Is there any reason not to do this? Would you accept it if I did? Is there any reason why you aren't using the email module Parser API to crack the headers? Subject munging will be simple to add, and I can do it. Stay tuned. I have found a certain number of messages are not parsed correctly by the re that you are using. They show up as From: (none) Subj: (none) in the UI We've recently seen some problems with malformed mails. We're examining this issue (see the email package use thread) - but I haven't determined why just yet (though I can see that some part of the message is getting stuck with the header by your re.split(r'\n\r?\n', messageText, 1) expression. 
> >- Z >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Thu Feb 20 11:18:38 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 19 17:19:21 2003 Subject: [Spambayes] A few code questions (Outlook oriented) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4F2@its-xchg4.massey.ac.nz> > I would like to fix the read status change problem. I would > like to know the following: While you're fixing the problem, do you want to go ahead and add my feature request? Check the config for the appropriate flag when activating the "delete as spam" button, and iff it is set, don't leave the unread status the same, but mark as read. (Something like: if self.manager.config.delete_as_spam_marks_as_read == True: for the check). I have been meaning to do this for quite some time now, and have even half-coded it, but I run into trouble setting the read flag and then got caught up with discussions (offlist) about options in general with Mark. Thanks, Tony Meyer From tim at fourstonesExpressions.com Wed Feb 19 16:26:41 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 17:26:50 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <04f401c2d863$22194ac0$a100a8c0@zlichstein> Message-ID: Great set of requirements! I'll make it so asap. Watch the checkin list... - TimS 2/19/2003 4:06:02 PM, "Zander" wrote: >I think this will be necessary to change - because I can't seem to construct >a filter in OE6 that will use your current To: ; ... munging >strategy to classify my spam properly. [Not only that, but a "Reply To All" >gets awkward...] > >Your notes say something about "contains 'spam' followed by a comma" - but >that doesn't work in OE. 
I tried contains "spam;" [followed by a >semicolon - which is the address separator], but that too doesn't work. If >I use "contains 'spam'" - obviously that's wrong because it pulls, for >example, the spambayes mailinglist posts ;-) > >So... I think that the > > a. To: munging doesn't work as written now in OE6 AFAICT. > b. There should be the option of To:, Subject; [or CC:, or whatever] >munging (why not make it a variable) > c. The replace token should be configurable: ie: , [***SPAM***], >or [!@#Mycust0m Flag*&*&%]. > i. The reason for including the above is that *I* would >prefer to leave 'ham' untouched, while marking SPAM obviously, and probably >marking unsure subtly. > >- Z >----- Original Message ----- >From: "Tim Stone - Four Stones Expressions" >To: ; ; "Zander" >Sent: Wednesday, February 19, 2003 12:36 PM >Subject: Re: [Spambayes] Other pop3proxy options > > >> 2/19/2003 12:46:07 PM, "Zander" wrote: >> >> >I would like to extend the options for how disposition is identified by >the >> pop3proxy implementation. In particular, I would like the option of >> > >> >A. X-Spambayes-Classification: as now >> >B. To: XXXXX as is in CVS now >> >C. Subject line munging to append >> > >> >Is there any reason that was not included? (beside the obvious potential >for >> a spammer to slip in a workaround) I use Outlook Express, and obviously >can't >> use the arbitrary header technique - and am most interested in adding a >> [***SPAM***] header so that I can correctly bucketize those messages - but >> leave [***UNSURE***] in my primary box, and not molest ham messages at >all. >> > >> >Is there any reason not to do this? Would you accept it if I did? Is >there >> any reason why you aren't using the email module Parser API to crack the >> headers? >> >> Subject munging will be simple to add, and I can do it. Stay tuned. >> >> I have found a certain number of messages are not parsed correctly by >the re >> that you are using. 
They show up as From: (none) Subj: (none) in the UI >> >> We've recently seen some problems with malformed mails. We're examining >this >> issue (see the email package use thread) >> >> - but I haven't determined why just yet (though I can see that some part >of >> the message is getting stuck with the header by your re.split(r'\n\r?\n', >> messageText, 1) expression. >> > >> >- Z >> >_______________________________________________ >> >Spambayes mailing list >> >Spambayes@python.org >> >http://mail.python.org/mailman/listinfo/spambayes >> > >> > >> >> >> c'est moi - TimS >> http://www.fourstonesExpressions.com >> http://wecanstopspam.org >> >> >> > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Thu Feb 20 11:50:19 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 19 17:52:32 2003 Subject: [Spambayes] training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD63@its-xchg4.massey.ac.nz> (The problem with being on the nicer side of the world is that most of the mail arrives when you are asleep. Apologies for the length of the reply). [TimS] > The problem here is that some mailers pretty much lose most > of the headers when you do a forward operation... Which would be when they would have to place the id in the body - mailers might do strange things to the body when forwarding, but surely none of them actually remove content. It would be nice if the smtpproxy could automagically change to including the id in the body if you started forwarding it messages without ids. (Well, not nice for those of us who like control, but for the 'average-user'. [TimS] > Placing > something like a url in the body of > a message is another possibility that's been raised. It's > somewhat dangerous, particularly in the case of multipart > messages, and for html messages may not > be visible at all. 
I think the spoofing possibilities of a URL (as back in the November posts) remove it as a possibility, as nice as it would be. A non-clickable message id, though, shouldn't be spoofable (a spammer shouldn't be able to generate valid ids).

[Neale]
> Also, consider the company that gets
> gigabytes of email every day. How long do they keep a
> message in their pool for future training?
This is a problem with the existing pop3proxy as well, of course. Isn't spambayes still aimed at individuals, not organisations?

[Neale]
> And anyway, forwarding a message to a special address is
> still too much work.
If this is the agreed conclusion, then I don't really see any options other than:
(a) Don't get the user to train (they would have to start with some sort of pretrained database). This really does kill all the power of spambayes, even if they could update their pretrained databases (that someone else trains for them).
OR
(b) Integration into lots of clients, à la the Outlook plugin.

[TimS]
> Absolutely. As things are right now, it's not useable by anyone but
> people like us, which as dismaying as that may be, is not the norm.
Well, I would say that the Outlook plugin *is* usable by anyone, except that you have to install Python first, and removing the plugin is not simple. Well, I guess some sort of automated training would be good too (Mark is working on this, I believe).

Anyway, since I've got the time, I'll go ahead and make the patches to get the smtpproxy to work, and then we can evaluate it. If it gets thrown away, oh well, never mind :) It could at least make things easier for those that are currently using it, while we all build integrations into everyone else's favourite mail client.
=Tony Meyer

From T.A.Meyer at massey.ac.nz Thu Feb 20 11:58:13 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Feb 19 18:02:01 2003
Subject: [Spambayes] training WAS: aging information
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD64@its-xchg4.massey.ac.nz>

[Alex]
> I was the one who did the bulk of the ratio experiments, and I
> posted my results at http://www.wolfskeep.com/~popiel/spambayes.
>
> It would be worthwhile to rerun similar experiments with current
> versions of the code, too.

Thanks for (re)posting this link, certainly interesting reading. Were these done before or after the experimental_ham_spam_in_balance code?

What I would like to know (and I suspect others would too) is what this means if, say, I have in my stored mail a ham:spam ratio of 300:3000. Should I randomly choose 300 ham and have a 300:300 ratio? Or is giving up the information in the other 2700 messages a bad thing?

If someone was willing to do some more tests with the most recent code, I think lots of people would be interested.

=Tony Meyer

From tim at fourstonesExpressions.com Wed Feb 19 17:02:05 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Feb 19 18:02:16 2003
Subject: [Spambayes] training
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD63@its-xchg4.massey.ac.nz>
Message-ID:

2/19/2003 4:50:19 PM, "Meyer, Tony" wrote:

>(The problem with being on the nicer side of the world is that most of the mail arrives when you are asleep. Apologies for the length of the reply).
>
>[TimS]
>> The problem here is that some mailers pretty much lose most
>> of the headers when you do a forward operation...
>Which would be when they would have to place the id in the body - mailers might do strange things to the body when forwarding, but surely none of them actually remove content.
>
>It would be nice if the smtpproxy could automagically change to including the id in the body if you started forwarding it messages without ids.
(Well, not nice for those of us who like control, but for the 'average-user'. We really should make every effort to make the non-OL2K side of spambayes mailer agnostic. There's just too many of 'em... a testing matrix might be interesting, but I don't believe we should *ever* put mailer specific code in the core stuff. Specific mailer plugins would be cool, but most mailers have no such plugin architecture. > >[TimS] >> Placing >> something like a url in the body of >> a message is another possibility that's been raised. It's >> somewhat dangerous, particularly in the case of multipart >> messages, and for html messages may not >> be visible at all. >I think the spoofing possibilties of a URL (as back in the November posts) removes it as a possibility, as nice as it would be. A non-clickable message id, though, shouldn't be spoofable (a spammer shouldn't be able to generate valid ids). Agreed. > >[Neale] >> Also, consider the company that gets >> gigabytes of email every day. How long do they keep a >> message in their pool for future training? >This is a problem with the existing pop3proxy as well, of course. Isn't spambayes still aimed at individuals, not organisations? > >[Neale] >> And anyway, forwarding a message to a special address is >> still too much work. >If this is the agreed conclusion, then I don't really see any options other than: >(a) Don't get the user to train (they would have to start with some sort of pretrained database). This does really kill all the power of spambayes, even if they could update their pretrained databases (that someone else trains for them). >OR >(b) Integration into lots of clients, al la the Outlook plugin. I'm a bit stumped here, too... still thinkin hard, maybe some kind of fuzzy matching? Let's get creative, think outside the box, yadda yadda... - TimS > >[TimS] >> Absolutely. As things are right now, it's not useable by anyone but >> people like us, which as dismaying as that may be, is not the norm. 
>Well, I would say that the Outlook plugin *is* usable by anyone, except that you have to install Python first, and removing the plugin is not simple. Well, I guess some sort of automated training would be good too (Mark is working on this, I believe). > >Anyway, since I've got the time, I'll go ahead and make the patches to get the smtpproxy to work, and then we can evaluate it. If it gets thrown away, oh well never mind :) It could at least make things easier for those that are currently using it, while we all build integrations into everyone else's favourite mail client. Tony, don't spend a whole lot of time making the smtpproxy work in a production manner. It'll be a good research tool, but it can't share database with the pop3proxy, and so training will be moot. It will need to be integrated with the pop3proxy, a non-trivial task as pop3proxy uses asyncore module, Dibbler, and a bunch of other stuff that Richie might be the only person on the planet that understands right now... I'm workin on getting my head around it, but I'm not there yet. - TimS > >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Thu Feb 20 12:09:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 19 18:11:42 2003 Subject: [Spambayes] training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D4F6@its-xchg4.massey.ac.nz> > We really should make every effort to make the non-OL2K side > of spambayes mailer agnostic. Agreed. > I'm a bit stumped here, too... still thinkin hard, maybe some > kind of fuzzy matching? Let's get creative, think > outside the box, yadda yadda... - What about a separate (from the mailer) application that floats in a window (menubar, ...) somewhere. 
When you come across an incorrectly trained message, you click the "incorrectly trained" button in the window, and it figures out what message you are currently reading* and retrains it.

* Ok, so this step is pretty hard. It could always take a screenshot, run it through OCR, and work with that ;)

> Tony, don't spend a whole lot of time making the smtpproxy work in a
> production manner.
I don't have *that* much time :)

> It'll be a good research tool, but it can't share database
> with the pop3proxy, and so training will be moot.
Mark's database abstraction would help here, yes?

=Tony Meyer

From edwardam at interlix.com Wed Feb 19 17:12:05 2003
From: edwardam at interlix.com (Edward Muller)
Date: Wed Feb 19 18:12:40 2003
Subject: [Spambayes] Training spambayes on a large set of mailbox files
Message-ID: <1045696325.2809.1.camel@localhost.localdomain>

Is there an easy way to train spambayes on a large set of mailbox files?

It would be a real pain in the butt to have to use multiple options on the command line for each mailbox file....

Just looking for a shortcut.

--
Edward Muller
Interlix - President
Web Hosting - PC Service & Support
Custom Programming - Network Service & Support
Phone: 417-862-0573 Cell: 417-844-2435 Fax: 417-862-0572
http://www.interlix.com

From tim.one at comcast.net Wed Feb 19 18:12:33 2003
From: tim.one at comcast.net (Tim Peters)
Date: Wed Feb 19 18:13:02 2003
Subject: [Spambayes] training
In-Reply-To:
Message-ID:

[Tim Stone]
> ...
> We really should make every effort to make the non-OL2K side of
> spambayes mailer agnostic. There's just too many of 'em... a testing
> matrix might be interesting, but I don't believe we should *ever* put
> mailer specific code in the core stuff. Specific mailer plugins would
> be cool, but most mailers have no such plugin architecture.
Watching Sean True wrestle with the Outlook plugin at the start, doing my bit to help with that then, and watching Mark Hammond wrestle with it ever after, I'm halfway toward concluding that no mail client has a usable plugin architecture. The good news is that they will, just as soon as they're all rewritten in Python . mailer-specific-code-will-consume-your-life-ly y'rs - tim From popiel at wolfskeep.com Wed Feb 19 15:28:30 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Wed Feb 19 18:28:33 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: Message from "Meyer, Tony" <1ED4ECF91CDED24C8D012BCF2B034F1318CD64@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318CD64@its-xchg4.massey.ac.nz> Message-ID: <20030219232830.1D0C92DEAD@cashew.wolfskeep.com> In message: <1ED4ECF91CDED24C8D012BCF2B034F1318CD64@its-xchg4.massey.ac.nz> "Meyer, Tony" writes: >[Alex] >> I was the one who did the bulk of the ratio experiments, and I >> posted my results at http://www.wolfskeep.com/~popiel/spambayes. >> >> It would be worthwhile to rerun similar experiments with current >> versions of the code, too. > >Thanks for (re)posting this link, certainly interesting reading. Were >these done before or after the experimental_ham_spam_in_balance code? Before; I like to think that my results were in part responsible for getting that option added. >What I would like to know (and I suspect others) is whether this means >that say I have in my stored mail a ham:spam ratio of 300:3000. Should >I randomly chose 300 ham and have a 300:300 ratio? Or is giving up the >information in the other 2700 messages a bad thing? Well, as long as the 300 ham chosen are actually representative of the types of ham you get, I don't see any harm in only using 300. I don't have the math or the experimental results to back that up, though. >If someone was willing to do some more tests with the most recent code, >I think lots of people would be interested. 
I'm trying to, but life keeps interfering. - Alex From tim at fourstonesExpressions.com Wed Feb 19 18:08:16 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 19:08:43 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <055701c2d86c$b8e33200$a100a8c0@zlichstein> Message-ID: <96FCFBVRA02UJDBVQ21XUONJHMI84YX.3e541c70@myst> We're getting very close to requiring a full blown mail client, not just a proxy. Perhaps the solution to many problems is to provide a spambayes mailer, with training, retraining, etc. etc. all built in, and provide the proxy with necessarily limited function for those who don't wish to use the spambayes mailer. This thought is daunting, but it really is the easiest from a useability standpoint. - TimS 2/19/2003 5:14:42 PM, "Zander" wrote: >I also think that a rework to allow for "per-account" management/training >would be much cooler. I could then provide spam marking for my various >accounts separately (and/or for my friends throuhg my server. > >Obviously, another would be to allow for some level of security/login on the >HTTP server (maybe with a new account/login/password created for each POP3 >account that's used against the pop3proxy dynamically). > >I'd be happy to help out in any of this - I'm a devout Pythonista, but don't >actually have that much experience ;-) > >- Z >----- Original Message ----- >From: "Tim Stone - Four Stones Expressions" >To: ; "Zander" ; "Spambayes" >Sent: Wednesday, February 19, 2003 2:26 PM >Subject: Re: [Spambayes] Other pop3proxy options > > >> Great set of requirements! I'll make it so asap. Watch the checkin >list... - >> TimS >> >> 2/19/2003 4:06:02 PM, "Zander" wrote: >> >> >I think this will be necessary to change - because I can't seem to >construct >> >a filter in OE6 that will use your current To: ; ... munging >> >strategy to classify my spam properly. [Not only that, but a "Reply To >All" >> >gets awkward...] 
>> > >> >Your notes say something about "contains 'spam' followed by a comma" - >but >> >that doesn't work in OE. I tried contains "spam;" [followed by a >> >semicolon - which is the address separator], but that too doesn't work. >If >> >I use "contains 'spam'" - obviously that's wrong because it pulls, for >> >example, the spambayes mailinglist posts ;-) >> > >> >So... I think that the >> > >> > a. To: munging doesn't work as written now in OE6 AFAICT. >> > b. There should be the option of To:, Subject; [or CC:, or whatever] >> >munging (why not make it a variable) >> > c. The replace token should be configurable: ie: , >[***SPAM***], >> >or [!@#Mycust0m Flag*&*&%]. >> > i. The reason for including the above is that *I* would >> >prefer to leave 'ham' untouched, while marking SPAM obviously, and >probably >> >marking unsure subtly. >> > >> >- Z >> >----- Original Message ----- >> >From: "Tim Stone - Four Stones Expressions" > >> >To: ; ; "Zander" >> >Sent: Wednesday, February 19, 2003 12:36 PM >> >Subject: Re: [Spambayes] Other pop3proxy options >> > >> > >> >> 2/19/2003 12:46:07 PM, "Zander" wrote: >> >> >> >> >I would like to extend the options for how disposition is identified >by >> >the >> >> pop3proxy implementation. In particular, I would like the option of >> >> > >> >> >A. X-Spambayes-Classification: as now >> >> >B. To: XXXXX as is in CVS now >> >> >C. Subject line munging to append >> >> > >> >> >Is there any reason that was not included? (beside the obvious >potential >> >for >> >> a spammer to slip in a workaround) I use Outlook Express, and >obviously >> >can't >> >> use the arbitrary header technique - and am most interested in adding a >> >> [***SPAM***] header so that I can correctly bucketize those messages - >but >> >> leave [***UNSURE***] in my primary box, and not molest ham messages at >> >all. >> >> > >> >> >Is there any reason not to do this? Would you accept it if I did? 
Is >> >there >> >> any reason why you aren't using the email module Parser API to crack >the >> >> headers? >> >> >> >> Subject munging will be simple to add, and I can do it. Stay tuned. >> >> >> >> I have found a certain number of messages are not parsed correctly by >> >the re >> >> that you are using. They show up as From: (none) Subj: (none) in the >UI >> >> >> >> We've recently seen some problems with malformed mails. We're >examining >> >this >> >> issue (see the email package use thread) >> >> >> >> - but I haven't determined why just yet (though I can see that some >part >> >of >> >> the message is getting stuck with the header by your >re.split(r'\n\r?\n', >> >> messageText, 1) expression. >> >> > >> >> >- Z >> >> >_______________________________________________ >> >> >Spambayes mailing list >> >> >Spambayes@python.org >> >> >http://mail.python.org/mailman/listinfo/spambayes >> >> > >> >> > >> >> >> >> >> >> c'est moi - TimS >> >> http://www.fourstonesExpressions.com >> >> http://wecanstopspam.org >> >> >> >> >> >> >> > >> > >> > >> >> >> c'est moi - TimS >> http://www.fourstonesExpressions.com >> http://wecanstopspam.org >> >> >> > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From neale at woozle.org Wed Feb 19 16:56:42 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Feb 19 19:57:11 2003 Subject: [Spambayes] Training spambayes on a large set of mailbox files In-Reply-To: <1045696325.2809.1.camel@localhost.localdomain> (Edward Muller's message of "19 Feb 2003 17:12:05 -0600") References: <1045696325.2809.1.camel@localhost.localdomain> Message-ID: Edward Muller writes: > Is there an easy way to train spambayes on a large set of mailbox files? > > It would be a read pain in the but to have to use multiple options on > the command line for each mailbox file.... Sure. 
Do it from Python:

    >>> import mboxtrain
    >>> from spambayes import hammie  # assumed import path; the snippet as posted never imported hammie
    >>> h = hammie.open('/path/to/your/database', True, 'c')
    >>> for m in ('hambox1', 'hambox2', 'hambox3'):
    ...     mboxtrain.train(h, m, False, False)
    ...

(I haven't tested this, so back up your mailboxes before you do it.) If you're in Unix, just do this:

    $ for i in hambox1 hambox2 hambox3; do
    >   mboxtrain.py -g $i
    > done

Although maybe this should be yet another option to mboxtrain. Neale From neale at woozle.org Wed Feb 19 16:58:45 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Feb 19 19:59:15 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <96FCFBVRA02UJDBVQ21XUONJHMI84YX.3e541c70@myst> (Tim Stone - Four Stones Expressions's message of "Wed, 19 Feb 2003 18:08:16 -0600") References: <96FCFBVRA02UJDBVQ21XUONJHMI84YX.3e541c70@myst> Message-ID: Tim Stone - Four Stones Expressions writes: > Perhaps the solution to many problems is to provide a spambayes > mailer, with training, retraining, etc. etc. all built in, and provide > the proxy with necessarily limited function for those who don't wish > to use the spambayes mailer. I do *not* want to have to start writing an MUA. And I doubt anyone else on this list does, either. Writing an MUA is *hard*. Just look at Microsoft, they're a huge company with nearly unlimited resources and even *they* can't get it right ;) Neale From tim.one at comcast.net Wed Feb 19 20:01:29 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 19 20:02:00 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <3E5281E6.24287.3963E01E@localhost> Message-ID: [D. R. Evans] > ... > I saw a comment in the LJ article that one should train on roughly > equal numbers of spam and ham. Is this actually true? (This question of > course merely demonstrates that I'm too lazy to do the maths myself.) I think so, but insufficient testing has been done to prove it with high confidence.
A thought experiment may help: suppose you don't know any French or Russian, but get a job requiring you to identify which is which from transcripts of conversations. Say you've been trained on 100 French transcripts and 1 Russian transcript. 90 of 100 French transcripts contained the phrase "bon mot". The single Russian transcript you saw did not. First day on the job, the first transcript you see does contain "bon mot". Is it French or Russian? A fact that's hard to account for is that you know much less about Russian than about French at this point. By default, spambayes gives full credit (a very high francoprob) to "bon mot" based on what you do know about French, and doesn't penalize it (lower the francoprob) to account for that you know so much less about Russian. As a result, the transcript will almost certainly be judged French. But spam contains ham words routinely, and vice versa, and, indeed, a number of French phrases have become part of the international vocabulary -- there's just no better way to say mot juste . With experimental_ham_spam_imbalance_adjustment: True spambayes takes the French evidence and discounts it, to give words francoprobs *as if* you had seen no more French transcripts than Russian ones. In the example, "bon mot" will get a mild francoprob instead of a very strong one, because the system can't claim to be sure of anything based on one training example of each. There are downsides to both in practice. Mark mentioned that he tends to keep training spam, and that's a predictable outcome of setting this option to True once spam outnumbers ham: additional training on spam doesn't do a heck of a lot to boost spamprobs then, because almost as much as non-adjusted training boosts them, the imbalance adjustment knocks them down again. (So, Mark, if you're listening, try training on a pile of ham instead next time: that will, perhaps paradoxically, raise the spamprobs on spam words.) 
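To put rough numbers on the French/Russian example, here is a small Python sketch. It assumes a Robinson-style smoothed word probability with an "unknown word" prior of 0.5 at strength 0.45, and it assumes the adjustment works by scaling the larger corpus's word count down by the ratio of the two corpus sizes. The name `word_prob` and both defaults are illustrative assumptions, not quoted from the spambayes source.

```python
def word_prob(count_a, count_b, n_a, n_b, adjust=False, s=0.45, x=0.5):
    """Smoothed probability that a word points at class A (here: French).

    count_a, count_b: training messages of each class containing the word.
    n_a, n_b: total training messages of each class.
    adjust: if True, discount evidence from the larger class (assumed rule).
    s, x: strength and value of the unknown-word prior (assumed defaults).
    """
    ratio_a = count_a / n_a
    ratio_b = count_b / n_b
    if ratio_a + ratio_b == 0:
        return x  # word never seen in training: fall back to the prior
    raw = ratio_a / (ratio_a + ratio_b)
    if adjust:
        # Scale each class's count as if it had no more training
        # messages than the other class.
        scale_a = min(n_b / n_a, 1.0)
        scale_b = min(n_a / n_b, 1.0)
    else:
        scale_a = scale_b = 1.0
    n = count_a * scale_a + count_b * scale_b  # effective evidence count
    return (s * x + n * raw) / (s + n)


# "bon mot": seen in 90 of 100 French transcripts, 0 of 1 Russian.
plain = word_prob(90, 0, 100, 1)
adjusted = word_prob(90, 0, 100, 1, adjust=True)
print(round(plain, 3), round(adjusted, 3))
```

Under these assumptions the unadjusted francoprob comes out near 0.998, while the adjusted one drops to about 0.83: still leaning French, but no longer close to certain.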
OTOH, if this adjustment isn't made, the corpus (ham or spam) with the higher training count gets words with probabilities closer to its endpoint (0.0 or 1.0) than the other corpus *can* get, and that can give the accidental appearance of the strong flavor of word in the weak flavor of msg more power than the weak words can overcome. In an uncharitable mood, you can think of it as getting screwed either way -- but if you've told any system a lot more about one kind of msg than the other, relatively speaking it *has* to "guess" a lot more about the kind of msg you've withheld. Remember that it can't infer patterns or meanings either -- it's just staring at isolated words. From tim.one at comcast.net Wed Feb 19 20:19:43 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 19 20:20:13 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <20030219232830.1D0C92DEAD@cashew.wolfskeep.com> Message-ID: [Meyer, Tony] > Thanks for (re)posting this link, certainly interesting reading. Were > these done before or after the experimental_ham_spam_in_balance code? [T. Alexander Popiel] > Before; I like to think that my results were in part responsible for > getting that option added. They certainly were. At that time, tests on my 35,000 msg corpora were already too good to show any improvement (by any means), so all I could say for sure is that adding the option didn't hurt my main test's results. Some brief experiments on lopsided subsets suggested it would help. Sjoerd reported stronger positive results on his real-life test data. Someone later (Anthony?) reported negative results, but staring at the data I didn't immediately agree they were significant results, and ran out of time to argue the issue. So it remained an option. > ... > Well, as long as the 300 ham chosen are actually representative of > the types of ham you get, I don't see any harm in only using 300. > I don't have the math or the experimental results to back that up, > though. 
My home email classifer is still trained on fewer than 1,000 msgs total, about 40/60 ham/spam. Since I get about 600 emails per day, this is less than two days' traffic. I get a few (2 to 10) Unsures each day, but they're generally so unusual I don't bother to train on them. At least half the time, I'm not sure whether they're ham or spam either and just delete them with a shrug. Cool: last week I got signed up on some commercial spam mailing list, along with hundreds of others, and of course this triggered a near-endless cascade of newbies posting outraged msgs to the list demanding to be taken off, then other newbies demanding to know why the first batch was accusing them of sending spam, etc etc etc. I had to train on 3 of those before they reliably moved from Unsure to Spam (the header clues were great; the msg bodies were hopeless), and was spared perhaps 600 more of these things. Note that this stuff wasn't really spam by most meanings of the word: it was sent by real people, and was not automated. I still love that spambayes believes whatever I tell it to believe! From anthony at interlink.com.au Thu Feb 20 12:23:04 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Feb 19 20:26:47 2003 Subject: [Spambayes] Use of email package In-Reply-To: <15955.48207.421755.891103@gargle.gargle.HOWL> Message-ID: <200302200123.h1K1N4Y12363@bonanza.off.ekorp.com> >>> Barry A. Warsaw wrote > Thus, it was designed with an eye toward the use of application > specific parsers, and it may well be that the default parsers (both > the strict and the lax parsers) may not be appropriate for an > application that tends to see intentionally ill-formed messages. My > suggestion would be to write a parser that can handle the really bad > messages, then use the default lax parser for most things, and fall > back to the "adaptive parser" for the really horrendous messages. 
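As a rough illustration of the "try one parser, fall back to another" idea, here is a sketch using the `email.policy` API from the modern Python 3 standard library. That API postdates this thread, so it is not the mechanism the 2003 email package offered; the helper name `parse_with_fallback` and the sample messages are made up for the example.

```python
import email
import email.errors
import email.policy


def parse_with_fallback(text):
    """Return (message, used_fallback).

    First attempt: the strict policy, which raises on the first defect.
    Fallback: the default policy, which records defects on the message
    object instead of raising.
    """
    try:
        msg = email.message_from_string(text, policy=email.policy.strict)
        return msg, False
    except email.errors.MessageDefect:
        msg = email.message_from_string(text, policy=email.policy.default)
        return msg, True


good = "From: a@example.com\nSubject: hi\n\nbody\n"
# Declares multipart but gives no boundary parameter: a simple defect.
bad = ("From: a@example.com\nSubject: hi\n"
       "Content-Type: multipart/mixed\n\nno boundary here\n")

msg, fell_back = parse_with_fallback(bad)
print(fell_back, msg["Subject"], [type(d).__name__ for d in msg.defects])
```

The lenient pass still exposes the headers, and `msg.defects` says what was wrong, which is the sort of information needed to work out why a particular message fails.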
Note also, though, that the email package's parser has two modes: strict adherence to to specs, and a more slack mode (which I'm largely responsible for). There are very few cases I know of that break the current non-strict mode -- the only one that comes to mind is MS Entourage generating nested multiparts with each part using the same boundary tag. Rather than saying "let's write a new parser" I'd say instead try to figure out why a particular message is failing, and see what can be done to fix the existing non-strict mode Parser. -- Anthony Baxter It's never too late to have a happy childhood. From edwardam at interlix.com Wed Feb 19 19:43:00 2003 From: edwardam at interlix.com (Edward Muller) Date: Wed Feb 19 20:43:33 2003 Subject: [Spambayes] Training spambayes on a large set of mailbox files In-Reply-To: References: <1045696325.2809.1.camel@localhost.localdomain> Message-ID: <1045705380.3098.37.camel@localhost.localdomain> Comments inline ... BTW: on another topic ... Does spambayes support any SQL servers? If not I'm tempted to look at adding support.... On Wed, 2003-02-19 at 18:56, Neale Pickett wrote: > Edward Muller writes: > > > Is there an easy way to train spambayes on a large set of mailbox files? > > > > It would be a read pain in the but to have to use multiple options on > > the command line for each mailbox file.... > > Sure. Do it from python: > > >>> import mboxtrain > >>> h = hammie.open('/path/to/your/database', True, 'c') > >>> for m in ('hambox1', 'hambox2', 'hambox3'): > ... mboxtrain.train(h, m, False, False) > ... > > (I haven't tested this, so back up your mailboxes before you do it.) hehe. I'll probably do something like this ... > > If you're in unix, just do this: > > $ for i in hambox1 hambox2 hambox3; do > > mboxtrain.py -g $i > > done That would work if they were all in the same directory. I have a fairly deep directory structure of mailboxes (IMAP server)... 
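A loop over a nested mailbox tree could look roughly like the sketch below. It only walks the directories and hands each file path to a callback; `find_mailboxes`, `train_tree` and the `is_mailbox` filter are hypothetical names for this example, and nothing here calls spambayes itself.

```python
import os


def find_mailboxes(root, is_mailbox=None):
    """Yield the path of every file under root that passes is_mailbox."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if is_mailbox is None or is_mailbox(path):
                yield path


def train_tree(root, train_one, is_mailbox=None):
    """Call train_one(path) for each mailbox file; return how many were seen."""
    count = 0
    for path in find_mailboxes(root, is_mailbox):
        train_one(path)
        count += 1
    return count
```

With Neale's (untested) call from the quoted snippet, usage might be `train_tree('/path/to/ham', lambda p: mboxtrain.train(h, p, False, False))`. Keeping the training step as a callback means the directory walk can be tried out with `print` before it touches a real database.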
I'll probably just construct a loop and a function or two in python to do it... > > Although maybe this should be yet another option to mboxtrain. > > Neale -- Edward Muller Interlix - President Web Hosting - PC Service & Support Custom Programming - Network Service & Support Phone: 417-862-0573 Cell: 417-844-2435 Fax: 417-862-0572 http://www.interlix.com From tim at fourstonesExpressions.com Wed Feb 19 22:05:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 23:05:49 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: Message-ID: 2/19/2003 6:58:45 PM, Neale Pickett wrote: >Tim Stone - Four Stones Expressions writes: > >> Perhaps the solution to many problems is to provide a spambayes >> mailer, with training, retraining, etc. etc. all built in, and provide >> the proxy with necessarily limited function for those who don't wish >> to use the spambayes mailer. > >I do *not* want to have to start writing an MUA. And I doubt anyone >else on this list does, either. Writing an MUA is *hard*. Just look at >Microsoft, they're a huge company with nearly unlimited resources and >even *they* can't get it right ;) I'm totally in agreement with that. I do *not* want to write an MUA either. So we're gonna have to settle for somewhere inbetween drop-dead easy to use and what is possible... - TimS > >Neale > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 19 22:08:49 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 23:09:00 2003 Subject: [Spambayes] Use of email package In-Reply-To: <200302200123.h1K1N4Y12363@bonanza.off.ekorp.com> Message-ID: 2/19/2003 7:23:04 PM, Anthony Baxter wrote: > >>>> Barry A. 
Warsaw wrote >> Thus, it was designed with an eye toward the use of application >> specific parsers, and it may well be that the default parsers (both >> the strict and the lax parsers) may not be appropriate for an >> application that tends to see intentionally ill-formed messages. My >> suggestion would be to write a parser that can handle the really bad >> messages, then use the default lax parser for most things, and fall >> back to the "adaptive parser" for the really horrendous messages. > > >Note also, though, that the email package's parser has two modes: >strict adherence to to specs, and a more slack mode (which I'm >largely responsible for). I guess we now know to whom to assign the email parser bugs we identify > There are very few cases I know of that >break the current non-strict mode -- the only one that comes to >mind is MS Entourage generating nested multiparts with each part >using the same boundary tag. How do you specify strict vs non-strict mode? > >Rather than saying "let's write a new parser" I'd say instead try >to figure out why a particular message is failing, and see what >can be done to fix the existing non-strict mode Parser. Fair enough, I suppose. > > >-- >Anthony Baxter >It's never too late to have a happy childhood. > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 19 22:28:04 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 19 23:28:13 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: Message-ID: 2/19/2003 6:58:45 PM, Neale Pickett wrote: >Tim Stone - Four Stones Expressions writes: > >> Perhaps the solution to many problems is to provide a spambayes >> mailer, with training, retraining, etc. etc. 
all built in, and provide >> the proxy with necessarily limited function for those who don't wish >> to use the spambayes mailer. > >I do *not* want to have to start writing an MUA. And I doubt anyone >else on this list does, either. Writing an MUA is *hard*. Just look at >Microsoft, they're a huge company with nearly unlimited resources and >even *they* can't get it right ;) Ok, one more comment here... it is precisely *because* they have nearly unlimited resources that they can't get it right. We would have a *much* better chance of getting it right, simply because we couldn't afford to not get it right. Not that I want to do it, mind you, but u$0phhhht shouldn't be the reason NOT to... > >Neale > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From Paul.Moore at atosorigin.com Thu Feb 20 09:29:22 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Thu Feb 20 04:29:58 2003 Subject: [Spambayes] training Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D925@UKDCX001.uk.int.atosorigin.com> From: Tim Stone - Four Stones Expressions > I'm a bit stumped here, too... still thinkin hard, > maybe some kind of fuzzy matching? Let's > get creative, think outside the box, yadda yadda... - I'm convinced this is a bad idea, but I'll throw it in anyway, just in case it sparks better ones: Munge the reply-to address on spam to dump it into the training bucket. Then, the user just replies to the message to train on it. Reasons it won't work: 1. You don't want to encourage people to hit "Reply" on spam. 2. You can't munge ham like this, and you've just destroyed useful info in a FP. 3. You still need a way to get the original message (you could do this by using an address of spam+@localhost and trapping all addresses of this form). As I say, maybe there are some useful points somewhere in this bad idea... Paul. PS Tim, your mailer quotes long source lines *really* badly, making your replies very hard to read sometimes. 
Is this something you can change? From Paul.Moore at atosorigin.com Thu Feb 20 09:42:23 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Thu Feb 20 04:42:55 2003 Subject: [Spambayes] training WAS: aging information Message-ID: <16E1010E4581B049ABC51D4975CEDB886199EE@UKDCX001.uk.int.atosorigin.com> From: Tim Peters [mailto:tim.one@comcast.net] > I get a few (2 to 10) Unsures each day, but they're > generally so unusual I don't bother to train on them. This made me think for a moment "how can I *not* train on unsures in Outlook?" In retrospect, it's easy - just read them (or not) in place and then delete or file (in a folder not marked as "certain ham"). But the Outlook plugin has ingrained me with the idea that you see spam anywhere other than the spam folder, and you hit the "delete as spam" button. The subliminal message will probably hit non-expert users, too, so maybe it would be useful in rewording the text on the button to emphasise the "train" aspect rather than the "get rid of" aspect. Otherwise, I can imagine the average user training on all unsures and FNs ad infinitum. This will both increase the impression that maintenance is a chore, and increase the spam:ham imbalance in the training database over time. Frankly, it never occurred to me to train for a bit and then just *stop*. Paul. From klassa at nc.rr.com Thu Feb 20 12:34:48 2003 From: klassa at nc.rr.com (klassa@nc.rr.com) Date: Thu Feb 20 12:33:52 2003 Subject: [Spambayes] getting SpamBayes to work with Outlook XP Message-ID: <14677.1045762488@qwop.com> In case anybody else is running into this, I got SpamBayes to work with Outlook XP, but it took a couple of tries. I first tried ActiveState's version of python, thinking I'd be all set because it comes with the win32all stuff (at least, I think it does). It's also got the email extensions (again, at least I think it does) because it's 2.2 (and the email extensions come with 2.2+, according to Barry's web page about the email extensions). 
I ran the addin.py, and it told me that it had registered SpamBayes.OutlookAddin (or something to that effect), but I could never get it to come up when I subsequently brought up Outlook. I uninstalled the ActiveState port, installed the sf version of python, installed the win32all distribution, and then installed the email extensions (just in case). Ran addin.py, brought up Outlook, joy. Just another data point. FWIW, YMMV, and all that... John From tim.one at comcast.net Thu Feb 20 14:45:17 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Feb 20 14:45:52 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: Message-ID: [Tim Stone] > Does anyone know if there's an api, or some way to > programmatically configure OE? AFAIK, OE has no programmatic interfaces or hooks, none, nada, zilch. > I'm writing an installer for the pop3proxy, which needs to simply set the > pop3 server address and port. I can't even find a place in the registry > where this stuff is stored! Suggestions? Many, but none likely to be helpful . From tim at fourstonesExpressions.com Thu Feb 20 15:06:20 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 20 16:06:34 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: Message-ID: 2/20/2003 1:45:17 PM, Tim Peters wrote: >[Tim Stone] >> Does anyone know if there's an api, or some way to >> programmatically configure OE? > >AFAIK, OE has no programmatic interfaces or hooks, none, nada, zilch. I concur. Unbelievable. At least the other mailers store stuff somewhere where you can parse it out and figure out how to configure it... that's why we call it wind'ohs. > >> I'm writing an installer for the pop3proxy, which needs to simply set the >> pop3 server address and port. I can't even find a place in the registry >> where this stuff is stored! Suggestions? > >Many, but none likely to be helpful . How about a sampling?
<2/3 wink> - TimS > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From lists at morpheus.demon.co.uk Thu Feb 20 21:20:38 2003 From: lists at morpheus.demon.co.uk (Paul Moore) Date: Thu Feb 20 16:22:45 2003 Subject: [Spambayes] Any prospect of spambayes working with qmail? References: <16E1010E4581B049ABC51D4975CEDB886199E5@UKDCX001.uk.int.atosorigin.com> <20030214142147.07CF42DE8C@cashew.wolfskeep.com> Message-ID: "T. Alexander Popiel" writes: > Yes. It'd also be a great source for rules for my testing harness. > If you make the doc, I may be able to provide graphs of accuracy to > go with it... OK, here's what I came up with. I rethought a bit, based on the fact that I started to consider "if the system is accurate enough, why train at all?" So I've probably stressed the fact that you don't need to train after a certain point more than I would have a day or two ago... Also, I don't have much experience with automating training, so I may have missed some possibilities there. But here's what I have, for what it's worth. ---------------------------------------------------------------------- Training Methods for Spambayes ============================== General training issues ----------------------- In order to get good results from the spambayes system, it is necessary to train it. As the system is trained, it gains an "understanding" of what you, personally, consider to be ham and spam, and bases its decisions on this understanding. It is *not* necessary to continue training indefinitely. Once the system is giving reliable results, it is perfectly acceptable to stop training, except to correct the system's mistakes, or to train the system on new categories of spam. 
(Or ham - if you subscribe to a new type of newsletter, the system may initially guess incorrectly, if the newsletter has similar characteristics to mail you previously trained on as spam. Training on the first few newsletters should correct the system pretty quickly). While there are a number of training techniques discussed below, it should be noted that no training method has been shown to significantly degrade the performance of the system - results are generally excellent with even the most minimal training. Initial training ---------------- Before the system can start classifying mails, it needs some training. When the system is installed, there are basically two possibilities for the initial training: 1. Do nothing. The system will initially classify everything as "unsure". 2. Train on sample collections of ham and spam. In this case, careful selection of the initial training set is important. The system can easily pick up on unintended clues. For example, if you train on a batch of recent spam, and on the contents of your inbox, the system could decide that the best spam clue is the message date - new mails are spam! Ongoing training ---------------- Once the system is running, there are a number of possible approaches to training. These approaches vary in the level of manual intervention required, and potentially in the accuracy of the results (although, as mentioned above, no training method seems to produce particularly bad results). 1. Train on everything. Check and train on every message received, regardless of whether the system classified it correctly or not. While this is a very manual chore, it is eased by the fact that the system does classify mail. However, it does still require manual scanning of the spam folder. 2. Train automatically on what the system classifies as spam or ham, and manually on unsures. This approach tends to reinforce any mistakes the system makes. 
Retraining of false negatives (spam incorrectly classified as ham) and false positives (ham incorrectly classified as spam) helps, but converts this method into a variation on the "train everything" approach. 3. Train on mistakes and unsures only. Anything correctly classified can be left alone. 4. Train on mistakes only. If the level of unsures is low enough, it may not be worth training on them - particularly if it is difficult to decide how to classify them even by hand. 5. Don't train. This assumes that the system's decisions have reached an acceptable level of accuracy. In general, as the system stabilises, any training approach (other than automated approaches such as (2) above) is likely to tend towards the "don't train" option. ---------------------------------------------------------------------- Paul. -- This signature intentionally left blank From tshumway at jdiworks.net Thu Feb 20 13:52:42 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Thu Feb 20 16:49:32 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <96FCFBVRA02UJDBVQ21XUONJHMI84YX.3e541c70@myst> References: <96FCFBVRA02UJDBVQ21XUONJHMI84YX.3e541c70@myst> Message-ID: <200302201352.42579.tshumway@jdiworks.net> On Wednesday 19 February 2003 16:08, Tim Stone - Four Stones Expressions wrote: > We're getting very close to requiring a full blown mail client, not just a > proxy. Perhaps the solution to many problems is to provide a spambayes > mailer, with training, retraining, etc. etc. all built in, and provide the > proxy with necessarily limited function for those who don't wish to use the > spambayes mailer. This thought is daunting, but it really is the easiest > from a useability standpoint. - TimS I have not read this thread very closely yet, but I thought I would chime in. I am working on a complete mail server suite written in python. The pop3 server is working with a maildir backing store. 
The pop3 client "works" for now, using poplib.py, but I will soon have an async version that supports pipelining etc. The smtp currently listener works for basic delivery, but some fancy configuration stuff still needs to be worked out. It is all designed to be runnable as a single process with async listeners and http-based configuration -- click and run. Now I ask: could this help solve the problems you are discussing? It could be used to make a smarter proxy that uses an intermediate store rather than an online connection to an upstream server. (Now I will go back and catch up on the discussion) -- Terrel From mhammond at skippinet.com.au Fri Feb 21 09:10:57 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 20 17:11:31 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: Message-ID: <000601c2d92c$f239d520$530f8490@eden> > >AFAIK, OE has no programmatic interfaces or hooks, none, nada, zilch. > > I concur. Unbelievable. Not really. MS give away Outlook Express. Their commercial offering (they are a company, after all) is Outlook. It is very basic business sense to differentiate these products. > At least the other mailers store > stuff somewhere > where you can parse it out and figure out how to configure > it... Obviously Outlook Express stores it *somewhere* (IIRC, .elm files in "Application Data") and obviously a number of other products have worked out how to parse the info. Writing new info will be harder, hence we see almost no 3rd party products with this ability. Windows is Windows, and you wont find me apologizing for the areas in which it sucks. On the other hand, you wont find me "apologizing" for the things it does well. I am pretty confounded as to how this SpamBayes project could cast MS in anything other than a positive light - at least we could do it! I tend to roll my eyes at people who cast stones as MS for the good *and* the bad. > that's why we call it wind'ohs. 
Because it provides software sufficient to hook spambayes as God intended? (Yeah, they also provide one that isn't, but they do appear to be one of the very few that does). Because my OS is absolutely rock-solid? Because I have a debugger functional for large programs (ask the Mozilla guys)? Because I can plug my friend's digital cam in and have it "just work" without installing a single driver? Maybe us "microshaft" apologists should start referring to "penguin shit" <1.0 wink>. > How about a sampling? <2/3 wink> - TimS > > > > >_______________________________________________ > >Spambayes mailing list > >Spambayes@python.org > >http://mail.python.org/mailman/listinfo/spambayes > > > > > > > c'est moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org > > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes While I am feeling like not-a-morning-person , Tim, would it be possible for you to trim your posts? Occasionally I see a one-line reply from you, but am never clear if you have any additional text - so I scroll lots of quoted text, just to find your SIG. Thanks, Mark. From mhammond at skippinet.com.au Fri Feb 21 09:17:48 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 20 17:18:48 2003 Subject: [Spambayes] getting SpamBayes to work with Outlook XP In-Reply-To: <14677.1045762488@qwop.com> Message-ID: <000701c2d92d$e773d900$530f8490@eden> > I first tried ActiveState's version of python, thinking I'd be all set > because it comes with the win32all stuff (at least, I think it does). > It's also got the email extensions (again, at least I think it does) > because it's 2.2 (and the email extensions come with 2.2+, > according to > Barry's web page about the email extensions). I will try and make the readme clearer in this regard - ActivePython is known not to work, as is older versions of win32all. 
FWIW, you would have had the exact same issue on any version of Windows. I now actually have a single .EXE installer for a stand-alone SpamBayes that works :) All I need to do is get my starship mess sorted out, and I will have somewhere to put it. "starship mess" actually means moving to rsync to maintain the pages, and that sounds scary Mark. From tim at fourstonesExpressions.com Thu Feb 20 16:22:40 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 20 17:22:52 2003 Subject: Off Topic [Spambayes] Outlook Express configuration In-Reply-To: <000601c2d92c$f239d520$530f8490@eden> Message-ID: 2/20/2003 4:10:57 PM, "Mark Hammond" wrote: >> >AFAIK, OE has no programmatic interfaces or hooks, none, nada, zilch. >> >> I concur. Unbelievable. > >Not really. MS give away Outlook Express. Their commercial offering (they >are a company, after all) is Outlook. It is very basic business sense to >differentiate these products. > >> At least the other mailers store >> stuff somewhere >> where you can parse it out and figure out how to configure >> it... > >Obviously Outlook Express stores it *somewhere* (IIRC, .elm files in >"Application Data") and obviously a number of other products have worked out >how to parse the info. Writing new info will be harder, hence we see almost >no 3rd party products with this ability. > >Windows is Windows, and you wont find me apologizing for the areas in which >it sucks. On the other hand, you wont find me "apologizing" for the things >it does well. I am pretty confounded as to how this SpamBayes project could >cast MS in anything other than a positive light - at least we could do it! >I tend to roll my eyes at people who cast stones as MS for the good *and* >the bad. > >> that's why we call it wind'ohs. > >Because it provides software sufficient to hook spambayes as God intended? >(Yeah, they also provide one that isn't, but they do appear to be one of the >very few that does).
Because my OS is absolutely rock-solid? Because I >have a debugger functional for large programs (ask the Mozilla guys)? >Because I can plug my friend's digital cam in and have it "just work" >without installing a single driver? Maybe us "microshaft" apologists should >start referring to "penguin shit" <1.0 wink>. Ok, so I accept the drubbing I have coming. If I can't take it, I shouldn't dish it out, right? Like Darth Vader, u$ isn't *completely* evil... I certainly don't see them with spambayes tunnel vision, though. I use their products because for the time being I don't have a choice. When I do, their products will become much better because they won't be able to *assume* that I'll continue using their products. I wait for that day, and work to hasten it. > >> How about a sampling? <2/3 wink> - TimS >> >> > >> >_______________________________________________ >> >Spambayes mailing list >> >Spambayes@python.org >> >http://mail.python.org/mailman/listinfo/spambayes >> > >> > >> >> >> c'est moi - TimS >> http://www.fourstonesExpressions.com >> http://wecanstopspam.org >> >> >> >> _______________________________________________ >> Spambayes mailing list >> Spambayes@python.org >> http://mail.python.org/mailman/listinfo/spambayes > >While I am feeling like not-a-morning-person , Tim, would it be >possible for you to trim your posts? Occasionally I see a one-line reply >from you, but am never clear if you have any additional text - so I scroll >lots of quoted text, just to find your SIG. > >Thanks, > >Mark. 
> > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Fri Feb 21 09:25:58 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 20 17:26:32 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199EE@UKDCX001.uk.int.atosorigin.com> Message-ID: <000801c2d92f$0b4385a0$530f8490@eden> [Paul] > From: Tim Peters [mailto:tim.one@comcast.net] > > I get a few (2 to 10) Unsures each day, but they're > > generally so unusual I don't bother to train on them. > > This made me think for a moment "how can I *not* train > on unsures in Outlook?" > > In retrospect, it's easy - just read them (or not) in > place and then delete or file (in a folder not marked > as "certain ham"). But the Outlook plugin has ingrained > me with the idea that you see spam anywhere other than > the spam folder, and you hit the "delete as spam" > button. Interesting. I am in Tim's mindset - train on messages I think SpamBayes should have caught, and just delete those wierd ones. > The subliminal message will probably hit non-expert users, > too, so maybe it would be useful in rewording the text on > the button to emphasise the "train" aspect rather than the > "get rid of" aspect. Otherwise, I can imagine the average > user training on all unsures and FNs ad infinitum. This > will both increase the impression that maintenance is a > chore, and increase the spam:ham imbalance in the training > database over time. Do you have a specific suggestion here? "Delete and Train as Spam" looks way too big, at least for a top-level toolbar item. 
Unfortunately, http://support.microsoft.com/?kbid=208527 documents that the "Status Bar" cannot be changed via the object model :( We could create a new top-level "Spam Status" Window that shows the progress of SpamBayes (with this window obviously being able to be closed by the user). By default, these toolbar items could perform the operation and make the window visible. I wouldn't mind a little status window, not unlike the progress in the Pythonwin trace window - something like:

Message 'Make Money Fast' was the 65th piece of spam detected this session.
Message 'Hi friend' was the 156th piece of good mail detected this session.
Message 'Unsure' was trained as spam, and deleted to the spam folder.

OTOH, in some ways I see this as geek-gloss that will just confuse Tim's poor sister :) Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2828 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030221/3bcf6f56/winmail-0001.bin From tim.one at comcast.net Thu Feb 20 17:27:13 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Feb 20 17:27:49 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: <000601c2d92c$f239d520$530f8490@eden> Message-ID: [Mark Hammond, defending the indefensible] > ... > Because it provides software sufficient to hook spambayes as God intended? Your Outlook plugin is quite an accomplishment all the same -- "easy" isn't the first adjective that comes to mind. OTOH, my impression is that Outlook doesn't expose a full object model, and that it's actually the weakest program in the Office suite that way. Even so, you've proved it's remarkably pliable, in a persistent expert's hands. > ... > Maybe us "microshaft" apologists should start referring to "penguin shit" > <1.0 wink>. Oh, dear, let's just all agree that Linux is beneath contempt, and leave it at that.
promoter-of-universal-harmony-ly y'rs - tim From mwh at python.net Thu Feb 20 23:28:49 2003 From: mwh at python.net (Michael Hudson) Date: Thu Feb 20 18:28:54 2003 Subject: [Spambayes] Re: Outlook Express configuration References: <000601c2d92c$f239d520$530f8490@eden> Message-ID: <2mbs16cz0u.fsf@starship.python.net> Ah good! OS wars "Mark Hammond" writes: >> that's why we call it wind'ohs. > > Because it provides software sufficient to hook spambayes as God intended? Because it's a sufficiently homogenous environment that we can support one mailer and think we're done? Most unix mailers provide enough functionality to hook into spambayes (heck, I'm typing this from Gnus) we just can't do them all at once. > (Yeah, they also provide one that isn't, but they do appear to be one of the > very few that does). Because my OS is absolutely rock-solid? Because I > have a debugger functional for large programs (ask the Mozilla guys)? > Because I can plug my friend's digital cam in and have it "just work" > without installing a single driver? Maybe us "microshaft" apologists should > start referring to "penguin shit" <1.0 wink>. Ah shucks, buy a Mac :-) Cheers, M. (who's spent a while this evening trying to play a dvd under linux) -- ARTHUR: Yes. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of the Leopard". -- The Hitch-Hikers Guide to the Galaxy, Episode 1 From jh at web.de Fri Feb 21 00:46:57 2003 From: jh at web.de (Juergen Hermann) Date: Thu Feb 20 18:47:29 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <200302201352.42579.tshumway@jdiworks.net> Message-ID: On Thu, 20 Feb 2003 13:52:42 -0800, Terrel Shumway wrote: >I am working on a complete mail server suite written in python. The pop3 >server is working with a maildir backing store. The pop3 client "works" for >now, using poplib.py, but I will soon have an async version that supports >pipelining etc. 
The smtp listener currently works for basic delivery, but >some fancy configuration stuff still needs to be worked out. It is all >designed to be runnable as a single process with async listeners and >http-based configuration -- click and run. Sounds a lot like you are reinventing parts of Twisted. Ciao, Jürgen From tim at fourstonesExpressions.com Thu Feb 20 18:18:16 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 20 19:18:30 2003 Subject: [Spambayes] Other pop3proxy options In-Reply-To: <200302201352.42579.tshumway@jdiworks.net> Message-ID: 2/20/2003 3:52:42 PM, Terrel Shumway wrote: >On Wednesday 19 February 2003 16:08, Tim Stone - Four Stones Expressions >wrote: > >> We're getting very close to requiring a full blown mail client, not just a >> proxy. Perhaps the solution to many problems is to provide a spambayes >> mailer, with training, retraining, etc. etc. all built in, and provide the >> proxy with necessarily limited function for those who don't wish to use the >> spambayes mailer. This thought is daunting, but it really is the easiest >> from a usability standpoint. - TimS > >I have not read this thread very closely yet, but I thought I would chime in. > >I am working on a complete mail server suite written in python. The pop3 >server is working with a maildir backing store. The pop3 client "works" for >now, using poplib.py, but I will soon have an async version that supports >pipelining etc. The smtp listener currently works for basic delivery, but >some fancy configuration stuff still needs to be worked out. It is all >designed to be runnable as a single process with async listeners and >http-based configuration -- click and run. There are some possibilities here, and at least you might look at incorporating spambayes into it for value-add sake. You might take a look at our pop3proxy... - TimS > >Now I ask: could this help solve the problems you are discussing?
It could be >used to make a smarter proxy that uses an intermediate store rather than an >online connection to an upstream server. > >(Now I will go back and catch up on the discussion) > >-- Terrel > > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Thu Feb 20 17:11:46 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Feb 20 20:19:45 2003 Subject: [Spambayes] [ spambayes-Bugs-690418 ] Non mail items filtered by Outlook Message-ID: Bugs item #690418, was opened at 2003-02-21 12:11 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=690418&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Non mail items filtered by Outlook Initial Comment: DeliveryReports are filtered by SpamBayes. We should check that we are only filtering mail items. I added a check for this in the "Recover from Spam" buttons, so we can copy that. Indeed, we *must* copy that, as the filter may move such a message, but then our "Delete as/Recover from" buttons won't let us get it back ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=690418&group_id=61702 From mhammond at skippinet.com.au Fri Feb 21 14:30:44 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu Feb 20 22:31:45 2003 Subject: [Spambayes] stand-alone Outlook addin pre-pre-alpha Message-ID: <007801c2d959$9eabef60$530f8490@eden> I have put a stand-alone (via McMillan's Installer) setup program (via Inno setup) for the Outlook Plugin at http://starship.python.net/crew/mhammond/spambayes I won't rehash what I have already put there, but what I didn't say there: * It is far from perfect, but I am running out of time to play on this any more and wanted to get it out. 
The new "Outlook2000/installer" directory, the McMillan Installer version on my starship page, and the free Inno Setup are all that is needed to recreate this binary.
* This is built on Python 2.3a2 (ish) and comes with bsddb. Your existing pickles will *not* be located and used. Thus, if you use pickles now, you will need to retrain. If you currently use bsddb files for the database, these should continue to work with the binary. (Note I am happy to receive patches to convert a pickle to a bsddb via dbmExport, but as I no longer use pickles, I no longer care )
* You still see trace output messages as you do now - generally via Pythonwin. This will be particularly important if it doesn't work :)
* If you use any other Python Outlook plugin, this will not work. However, I know of only 2 such addins - the sample one with win32all (which can be unregistered) and the one I use here personally to trap all the virus crap I get! So this should not be a problem.
* I have to be honest - I haven't used this binary for long. It starts and seems to work OK, and all functions work. As I use a Python extension to strip the amazing number of virus emails I get, I can't use the binary for too long before the other shit pours in again.
Please let us know how you go. Please don't give it to your sister just yet though. Thanks, Mark. From tim.one at comcast.net Thu Feb 20 23:30:33 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Feb 20 23:30:59 2003 Subject: [Spambayes] Are they learning? In-Reply-To: Message-ID: [Kaitlin Duck Sherwood] > If the spammers ever get too clever for a purely word-based approach, > then it would be easy to toss in the ratio of > non-letter characters (perl \W) : letter characters (perl \w) > and/or > characters inside HTML tags : characters outside HTML > and/or > number of spaces : total length of message > as features.
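The three ratios suggested in the quoted text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from SpamBayes: the function name `ratio_features`, the feature names, and the crude `<...>` tag pattern are all hypothetical, and `\w` counts digits and underscores as well as letters.

```python
import re

# Hypothetical, deliberately crude notion of "an HTML tag": anything between < and >.
TAG_RE = re.compile(r"<[^>]*>")


def _ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ratio_features(text):
    """Compute the three suggested ratios for one message body (illustrative only)."""
    word_chars = len(re.findall(r"\w", text))      # roughly "letter" characters
    nonword_chars = len(re.findall(r"\W", text))   # everything else
    in_tags = sum(len(tag) for tag in TAG_RE.findall(text))
    out_tags = len(text) - in_tags
    spaces = text.count(" ")
    return {
        "nonword_to_word": _ratio(nonword_chars, word_chars),
        "tag_to_text": _ratio(in_tags, out_tags),
        "space_to_length": _ratio(spaces, len(text)),
    }


if __name__ == "__main__":
    print(ratio_features("<b>hi</b> you"))
```

A real tokenizer would probably bucket each ratio into a coarse range and emit it as a synthetic token rather than use the raw float, so that it can be counted like any other word.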
> > I believe that those ratios will do a good job of spotting messages > that have wildly different "eye space" and "ASCII space" presentations. In unreported early experiments, I generated a token for the ratio of number of bytes to number of whitespace-separated "words" in a msg. A high ratio was a very strong spam indicator. I left the code out, though, because it made no difference to overall error rates in testing: whatever it was latching on to was already covered by other stuff at the time. Like many other gimmicks, it also over-penalized HTML msgs *just* for using HTML at all. I expect that your suggested statistics would also be strong indicators, but also possibly redundant. The msg Rob forwarded that kicked this off didn't impress me, just like other msgs in the past playing goofy typographical tricks didn't impress me: anything that makes an advertisement harder to read is going to reduce response rate, so I don't expect such tricks to endure. I've seen "stuff like that" all along, but it's always been a very small percentage of the spam I get. I expect that Rob noticed it only because he got it from a python.org mailing list, and spam from such lists is rare (so all the header clues saying "it came from python.org"-- and there are many --have very low spamprobs). From T.A.Meyer at massey.ac.nz Fri Feb 21 19:14:50 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Feb 21 01:15:31 2003 Subject: [Spambayes] training Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D509@its-xchg4.massey.ac.nz> Ok, I've done some testing, and here are the results from these clients. They are all windows apps, as I don't have a linux or mac box handy to test with. If anyone wants some other test done while I have all of these installed, let me know :) [I did not do any Outlook testing, since the plugin does everything anyone might possibly want] All clients tested, unsurprisingly, will include the message body in the forwarded message. 
The following client/methods will forward all headers:

Eudora 5.2                                 Forward
Netscape Messenger (4.7)                   Forward (inline)
Netscape Messenger (4.7)                   Forward (attachment) Plain
Netscape Messenger (4.7)                   Forward (attachment) HTML
Netscape Messenger (4.7)                   Forward (attachment) Plain & HTML
M2 (Opera Mailer)                          Redirect
The Bat!                                   Forward (RFC Headers visible)
The Bat!                                   Alternative Forward
The Bat!                                   Custom Template
Pegasus Mail                               Forward (all headers option set)
Calypso 3                                  Redirect
Becky!                                     Redirect as attachment

The following client/methods will *not* forward all headers:

Eudora 5.2                                 Redirect
Netscape Messenger (4.7)                   Forward (quoted) Plain
Netscape Messenger (4.7)                   Forward (quoted) HTML
Netscape Messenger (4.7)                   Forward (quoted) Plain & HTML
Outlook Express 6                          Forward HTML (Base64)
Outlook Express 6                          Forward HTML (None)
Outlook Express 6                          Forward HTML (QP)
Outlook Express 6                          Forward Plain (Base64)
Outlook Express 6                          Forward Plain (None)
Outlook Express 6                          Forward Plain (QP)
Outlook Express 6                          Forward Plain (uuencoded)
http://www.endymion.com/products/mailman   Forward
M2 (Opera Mailer)                          Forward
The Bat!                                   Forward (RFC Headers not visible)
The Bat!                                   Redirect
AllegroMail                                Forward
AllegroMail                                Redirect
PocoMail                                   Forward
PocoMail                                   Bounce
Pegasus Mail                               Forward (all headers option not set)
Calypso 3                                  Forward
Becky!                                     Forward
Becky!                                     Redirect

So the only mailers (tested) that do not have a forwarding option that will preserve headers are Outlook Express, AllegroMail, PocoMail and (Endymion's) Mailman. Not too bad. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Feb 21 19:21:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Feb 21 01:22:33 2003 Subject: [Spambayes] SMTPProxy [Was Training] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD68@its-xchg4.massey.ac.nz> I've updated the smtpproxy that TimS whipped up a while ago. It will happily run alongside pop3proxy (including training).
The idea is that when you get a message you want to train, you forward it to one of the addresses in Options.py and it will extract the message id and train it. You can set:
* the name of the header for the id
* if the id is in the headers
* if the id is in the body
* the address to monitor to catch messages to train as spam
* the address to monitor to catch messages to train as ham
* the address to monitor to shutdown (you don't have to use this one)
* whether incoming mail strips any existing ids in the headers/body
* the server/port to pass messages through to
* the port to monitor
It all seems to run fine for me. Anyone else like to test it? Let me know and I'll mail you the code. (Note that I'm not necessarily saying that a smtpproxy is the way to get end users to train, but I do think that it would be easier than the current (non-Outlook) system). =Tony Meyer From Paul.Moore at atosorigin.com Fri Feb 21 09:32:40 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Feb 21 04:33:14 2003 Subject: [Spambayes] training WAS: aging information Message-ID: <16E1010E4581B049ABC51D4975CEDB886199F0@UKDCX001.uk.int.atosorigin.com> From: Mark Hammond [mailto:mhammond@skippinet.com.au] >> The subliminal message will probably hit non-expert users, >> too, so maybe it would be useful in rewording the text on >> the button to emphasise the "train" aspect rather than the >> "get rid of" aspect. Otherwise, I can imagine the average >> user training on all unsures and FNs ad infinitum. This >> will both increase the impression that maintenance is a >> chore, and increase the spam:ham imbalance in the training >> database over time. > Do you have a specific suggestion here? "Delete and Train > as Spam" looks way too big, at least for a top-level toolbar > item. Sadly, no. The best I can think of is "Train as Spam".
(Or maybe the non-techie version, "Mark as Spam") After all, the button *doesn't* delete the message - it moves it to the spam folder, yes, but it doesn't delete it. > We could create a new top-level "Spam Status" Window that shows > the progress of SpamBayes (with this window obviously being > able to be closed by the user). That would be nice, but it's certainly not essential... > By default, these toolbar items could perform the operation > and make the window visible. I don't like this. The buttons do their job nice and unobtrusively right now, lets not clutter things with extra windows popping up. Paul From tim at fourstonesExpressions.com Fri Feb 21 07:12:24 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 21 08:12:37 2003 Subject: [Spambayes] training In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D509@its-xchg4.massey.ac.nz> Message-ID: >The following client/methods will forward all headers: >M2 (Opera Mailer) Redirect Which version of Opera? My version (6.05) does not preserve headers on a redirect. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From piersh at friskit.com Fri Feb 21 06:30:29 2003 From: piersh at friskit.com (Piers Haken) Date: Fri Feb 21 09:30:02 2003 Subject: [Spambayes] Outlook Express configuration Message-ID: <9891913C5BFE87429D71E37F08210CB92C74FE@zeus.sfhq.friskit.com> Try this: HKEY_CURRENT_USER\Software\Microsoft\Internet Account Manager\Accounts Piers. > -----Original Message----- > From: Tim Stone - Four Stones Expressions > [mailto:tim@fourstonesExpressions.com] > Sent: Tuesday, February 18, 2003 1:36 PM > To: Spambayes > Subject: [Spambayes] Outlook Express configuration > > > Does anyone know if there's an api, or some way to > programmatically configure > OE? I'm writing an installer for the pop3proxy, which needs > to simply set the > pop3 server address and port. I can't even find a place in > the registry where > this stuff is stored! 
Suggestions? > > c'est moi - TimS > http://www.fourstonesExpressions.com > http://wecanstopspam.org > > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes > From tim at fourstonesExpressions.com Fri Feb 21 10:01:11 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 21 11:04:35 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: <9891913C5BFE87429D71E37F08210CB92C74FE@zeus.sfhq.friskit.com> Message-ID: 2/21/2003 8:30:29 AM, "Piers Haken" wrote: >Try this: > >HKEY_CURRENT_USER\Software\Microsoft\Internet Account Manager\Accounts There isn't an Accounts key in my registry here. None of the subkeys seem to have anything to do with OE... -TimS > >Piers. > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim.one at comcast.net Fri Feb 21 11:34:07 2003 From: tim.one at comcast.net (Tim Peters) Date: Fri Feb 21 11:34:40 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: Message-ID: [Piers Haken] > Try this: > > HKEY_CURRENT_USER\Software\Microsoft\Internet Account Manager\Accounts [Tim Stone] > There isn't an Accounts key in my registry here. None of the subkeys > seem to have anything to do with OE... -TimS http://insideoe.tomsterdam.com/files/regkeys.htm explains that it may or may not be useful, depending on what the user has done since installing OE. Reverse-engineering OE isn't fun. From tim at fourstonesExpressions.com Fri Feb 21 10:35:45 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Feb 21 11:35:52 2003 Subject: [Spambayes] Outlook Express configuration In-Reply-To: Message-ID: 2/21/2003 10:34:07 AM, Tim Peters wrote: >[Piers Haken] >> Try this: >> >> HKEY_CURRENT_USER\Software\Microsoft\Internet Account Manager\Accounts > >[Tim Stone] >> There isn't an Accounts key in my registry here. 
None of the subkeys >> seem to have anything to do with OE... -TimS > > > http://insideoe.tomsterdam.com/files/regkeys.htm > >explains that it may or may not be useful, depending on what the user has >done since installing OE. Reverse-engineering OE isn't fun. I think I'm gonna punt on this one. If you're an OE user, I'll tell you how to configure your server to the pop3proxy, but I can't do it for ya... - TimS > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From ducky at webfoot.com Fri Feb 21 12:11:41 2003 From: ducky at webfoot.com (Kaitlin Duck Sherwood) Date: Fri Feb 21 15:07:41 2003 Subject: OSAF collaboration (was Re: [Spambayes] training) In-Reply-To: References: Message-ID: At 9:07 PM +0100 2/19/03, Francois Granger wrote: > Another step would be to get close contacts with > http://www.osafoundation.org/ throught Kevin Altis eventually > because they will be happy to have this technology in their first > shipping release, and integration will be a dream for them. Francois: spambayes has had my attention for some time, and OSAF just hired me to work on both + the interaction design of email and + community-liaison stuff. (I formally start on Monday.) Thus I clearly will be a good person to talk to, and I personally think it would be marvellous to have a spambayes plug-in for Chandler. At 6:12 PM -0500 2/19/03, Tim Peters wrote: > I'm halfway toward concluding that no mail client has a usable plugin > architecture. The good news is that they will, just as soon as they're all rewritten in Python . Whether we'd make spambayes part of the Chandler distro or just provide good hooks might end up being a policy issue out of my hands, but I *do* want to make sure we provide a good plug-in API. We have a chance to do it right (and we're already using Python ). However, I'm going to be pretty busy getting up to speed for *at least* the first two weeks; OSAF isn't going to be ready to release any source code for a while. 
So hit the snooze button for right now and I'll ping y'all again when we are in a position to be able to process input usefully. No reply needed. P.S., if you were at the MIT conference, I'm the woman in the purple beret who asked lots of questions. -- Kaitlin Duck Sherwood Author of the _Overcome Email Overload_ series, http://www.EmailOverload.com Help free our mailboxes. Include http://wecanstopspam.org in your signature. From whisper at oz.net Fri Feb 21 13:32:26 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Feb 21 16:32:49 2003 Subject: [Spambayes] Message-ID: David LeBlanc Seattle, WA USA From whisper at oz.net Fri Feb 21 13:36:33 2003 From: whisper at oz.net (David LeBlanc) Date: Fri Feb 21 16:37:14 2003 Subject: [Spambayes] Spammie anomoly Message-ID: Yesterday, I had need to reboot my box. When it came back up, OL did its usual protracted song and dance with the hd before presenting its GUI. When I got up this morning, I had 18 maybe-spams and 32 spams - all but one of them were actually HAM. Most of them were from python-list. This hasn't happened since not too long after I installed spammie and not at all since I cleaned up the spam training corpus and retrained. Any ideas? David LeBlanc Seattle, WA USA From noreply at sourceforge.net Fri Feb 21 13:44:54 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 16:40:57 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690914 ] Un-classify an email message Message-ID: Feature Requests item #690914, was opened at 2003-02-21 21:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Nobody/Anonymous (nobody) Summary: Un-classify an email message Initial Comment: I'm wondering if the math for adding a message classified as ham or spam is reversible.
If it is, it would be a nice feature to reverse the equations and "subtract" the effect of a certain email. For example, if I mis-classify a message, it would be nice to un-classify it. I'm guessing that even if the math is close (due to hysteresis effect??) it might be a useful feature. But then, I haven't looked at the code, sorry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 From mhammond at skippinet.com.au Sat Feb 22 09:19:15 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 21 17:20:47 2003 Subject: [Spambayes] FW: Problems installing Outlook pluggin Message-ID: <00b701c2d9f7$453a66c0$530f8490@eden> This is a weird one. The bottom of this traceback has:

  File "D:\Windows\Prog\PYTHON~1\Lib\email\Utils.py", line 10, in ?
    import random
  File "D:\Windows\Prog\PYTHON~1\Lib\random.py", line 93, in ?
    _verify('NV_MAGICCONST', NV_MAGICCONST, 1.71552776992141)
  File "D:\Windows\Prog\PYTHON~1\Lib\random.py", line 88, in _verify
    raise ValueError(
exceptions.ValueError: computed value for NV_MAGICCONST deviates too much (computed 2,82843, expected 1)

Anyone have a clue what this means? Thanks, Mark. -------------- next part -------------- An embedded message was scrubbed... From: "Carlos Ardanza" Subject: Problems installing Outlook pluggin Date: Sat, 22 Feb 2003 02:28:37 +1100 Size: 6511 Url: http://mail.python.org/pipermail/spambayes/attachments/20030222/8a1d403e/attachment.eml From mhammond at skippinet.com.au Sat Feb 22 09:19:50 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Feb 21 17:21:21 2003 Subject: [Spambayes] training WAS: aging information In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199F0@UKDCX001.uk.int.atosorigin.com> Message-ID: <00bc01c2d9f7$5a4906c0$530f8490@eden> > > Do you have a specific suggestion here?
"Delete and Train > > as Spam" looks way too big, at least for a top-level toolbar > > item. > > Sadly, no. The best I can think of is "Train as Spam". (Or maybe > the non-techie version, "Mark as Spam") After all, the button > *doesn't* delete the message - it moves it to the spam folder, > yes, but it doesn't delete it. Maybe just "Spam!" and "Not Spam!" > > We could create a new top-level "Spam Status" Window that shows > > the progress of SpamBayes (with this window obviously being > > able to be closed by the user). > > That would be nice, but it's certainly not essential... > > > By default, these toolbar items could perform the operation > > and make the window visible. > > I don't like this. The buttons do their job nice and unobtrusively > right now, lets not clutter things with extra windows popping up. agreed. Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2120 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030222/1e0263c0/winmail.bin From noreply at sourceforge.net Fri Feb 21 14:00:53 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 17:22:44 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690928 ] turn off saving messages in popproxy Message-ID: Feature Requests item #690928, was opened at 2003-02-21 22:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Nobody/Anonymous (nobody) Summary: turn off saving messages in popproxy Initial Comment: It would be nice to be able to turn off saving message for training, and just let the settings chug. I'm guessing that the messages will just pile up if I don't go in and at least discard the messages every day. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 From noreply at sourceforge.net Fri Feb 21 14:28:38 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 17:22:49 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690914 ] Un-classify an email message Message-ID: Feature Requests item #690914, was opened at 2003-02-21 16:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Nobody/Anonymous (nobody) Summary: Un-classify an email message Initial Comment: I'm wondering if the math for adding a message classified as ham or spam is reversible. If it is, it would be a nice feature to reverse the equations and "subtract" the effect of a certain email. For example, if I mis-classify a message, it would be nice to un-classify it. I'm guessing that even if the math is close (due to hysteresis effect??) it might be a useful feature. But then, I haven't looked at the code, sorry. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2003-02-21 17:28 Message: Logged In: YES user_id=31435 The math is exactly reversible, and the underlying classifier has both learn() and unlearn() methods. Whether you can get at them easily depends on the client you're using; for example, it's very easy from the project's Outlook client. 
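As a rough illustration of why training can be exactly reversible: if the training database is nothing but integer counts, learning adds to them and unlearning subtracts the same amounts. The sketch below is a toy stand-in and not the SpamBayes classifier; only the method names `learn()` and `unlearn()` come from the comment above, while the class name `ToyCountStore`, its attributes, and `snapshot()` are assumptions made for the example.

```python
from collections import Counter


class ToyCountStore:
    """Toy token-count store (hypothetical; not the real SpamBayes classifier)."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def learn(self, tokens, is_spam):
        """Add one message's tokens to the counts."""
        self._update(tokens, is_spam, +1)

    def unlearn(self, tokens, is_spam):
        """Subtract a previously learned message; the exact inverse of learn()."""
        self._update(tokens, is_spam, -1)

    def _update(self, tokens, is_spam, delta):
        counts = self.spam_counts if is_spam else self.ham_counts
        if is_spam:
            self.nspam += delta
        else:
            self.nham += delta
        # Each distinct token is counted once per message.
        for token in set(tokens):
            counts[token] += delta
            if counts[token] == 0:
                del counts[token]

    def snapshot(self):
        """Return the full state as plain data, for comparison."""
        return (self.nspam, self.nham,
                dict(self.spam_counts), dict(self.ham_counts))


if __name__ == "__main__":
    store = ToyCountStore()
    store.learn(["python", "list", "patch"], is_spam=False)
    before = store.snapshot()

    # Train a message as spam by mistake, then take it back.
    store.learn(["hi", "friend", "patch"], is_spam=True)
    store.unlearn(["hi", "friend", "patch"], is_spam=True)
    print(store.snapshot() == before)
```

Because only integers are added and subtracted, there is no rounding drift: correcting a misclassified message amounts to unlearning it under the wrong label and learning it under the right one.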
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702

From tim at fourstonesExpressions.com Fri Feb 21 16:27:11 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Feb 21 17:27:51 2003
Subject: [Spambayes] FW: Problems installing Outlook pluggin
In-Reply-To: <00b701c2d9f7$453a66c0$530f8490@eden>
Message-ID: <3WEBZVNILGMHB7M1MILKC8USOI52WQ.3e56a7bf@myst>

Mark,

I'm wondering if random.py hasn't been replaced in your installation or something. The _verify function checks to see if a calculated result is within tolerance of an expected result, to verify that the math routines are doing things sufficiently well...

Here's the code in my random.py starting line 86:

    def _verify(name, computed, expected):
        if abs(computed - expected) > 1e-7:
            raise ValueError(
                "computed value for %s deviates too much "
                "(computed %g, expected %g)" % (name, computed, expected))

    NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)
    _verify('NV_MAGICCONST', NV_MAGICCONST, 1.71552776992141)

Is that what it looks like in yours?

PS. Notice the trim... ;)

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From tim.one at comcast.net Fri Feb 21 17:30:04 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Feb 21 17:31:07 2003
Subject: [Spambayes] FW: Problems installing Outlook pluggin
In-Reply-To: <00b701c2d9f7$453a66c0$530f8490@eden>
Message-ID:

[Mark Hammond]
> This is a weird one. The bottom of this traceback has:
>
> File "D:\Windows\Prog\PYTHON~1\Lib\email\Utils.py", line 10, in ?
> import random
> File "D:\Windows\Prog\PYTHON~1\Lib\random.py", line 93, in ?
> _verify('NV_MAGICCONST', NV_MAGICCONST, 1.71552776992141) > File "D:\Windows\Prog\PYTHON~1\Lib\random.py", line 88, in _verify > raise ValueError( > exceptions.ValueError: computed value for NV_MAGICCONST deviates too much > (computed 2,82843, expected 1) > > > Anyone have a clue what this means? Yup, but we usually see it only in systems with GNU readline, or some bizarre (non-Pythonwin ) GUI: the C locale has gotten screwed up. That's why the last line printed 2.82843 with a comma instead of a decimal point. For the same reason, the 1.71552776992141 in the verify() call isn't being parsed as intended as a floating constant either. I've no idea *how* your locale got hosed, though. From noreply at sourceforge.net Fri Feb 21 14:39:38 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 17:36:41 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690914 ] Un-classify an email message Message-ID: Feature Requests item #690914, was opened at 2003-02-21 15:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Nobody/Anonymous (nobody) Summary: Un-classify an email message Initial Comment: I'm wondering if the math for adding a message classified as ham or spam is reversible. If it is, it would be a nice feature to reverse the equations and "subtract" the effect of a certain email. For example, if I mis-classify a message, it would be nice to un-classify it. I'm guessing that even if the math is close (due to hysteresis effect??) it might be a useful feature. But then, I haven't looked at the code, sorry. ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-21 16:39 Message: Logged In: YES user_id=645698 The pop3proxy does not currently support this behavior, though it could in theory. 
It throws away trained messages at the moment. This may change in the not-so-distant future. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-02-21 16:28 Message: Logged In: YES user_id=31435 The math is exactly reversible, and the underlying classifier has both learn() and unlearn() methods. Whether you can get at them easily depends on the client you're using; for example, it's very easy from the project's Outlook client. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 From francois.granger at free.fr Fri Feb 21 23:46:05 2003 From: francois.granger at free.fr (Francois Granger) Date: Fri Feb 21 17:46:42 2003 Subject: OSAF collaboration (was Re: [Spambayes] training) In-Reply-To: References: Message-ID: At 12:11 -0800 21/02/2003, in message OSAF collaboration (was Re: [Spambayes] training), Kaitlin Duck Sherwood wrote: >At 9:07 PM +0100 2/19/03, Francois Granger wrote: >> Another step would be to get close contacts with >>http://www.osafoundation.org/ > >Francois: spambayes has had my attention for some time, and OSAF >just hired me to work on both >+ the interaction design of email >and >+ community-liaison stuff. >(I formally start on Monday.) Good news ! Spambayes and Chandler and PythonCard and some other project have my attention, that the reason why I thought of linking between them. This seems to be the time where Python based technology is spreading fast. >P.S., if you were at the MIT conference Unfortunately not me, MIT is really far from France ;-) -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. 
From T.A.Meyer at massey.ac.nz Sat Feb 22 12:14:24 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Feb 21 18:15:34 2003
Subject: [Spambayes] training
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D50B@its-xchg4.massey.ac.nz>

> >The following client/methods will forward all headers:
> >M2 (Opera Mailer) Redirect
>
> Which version of Opera? My version (6.05) does not preserve
> headers on a redirect.

I should have been more specific :) 7.01. I should clarify what I said, too. When I said forward all headers, this *might* mean that the headers are forwarded in the message body, not necessarily forwarded in the headers of the redirected/forwarded mail.

=Tony Meyer

From tim at fourstonesExpressions.com Fri Feb 21 17:20:49 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Feb 21 18:21:27 2003
Subject: [Spambayes] training
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D50B@its-xchg4.massey.ac.nz>
Message-ID:

2/21/2003 5:14:24 PM, "Meyer, Tony" wrote:

>> >The following client/methods will forward all headers:
>> >M2 (Opera Mailer) Redirect
>>
>> Which version of Opera? My version (6.05) does not preserve
>> headers on a redirect.
>
>I should have been more specific :) 7.01. I should clarify what I said, too. When I said forward all headers, this *might* mean that the headers are forwarded in the message body, not necessarily forwarded in the headers of the redirected/forwarded mail.

Yeah, ok... As long as the header we need is included *somewhere* we're ok.

I haven't upgraded yet. Anybody know what the 'spam filter' is in the mailer? It doesn't appear to be bayesian...
> >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Fri Feb 21 17:58:45 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 22:02:17 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690997 ] automated unlearn/relearn in mboxtrain.py Message-ID: Feature Requests item #690997, was opened at 2003-02-21 17:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690997&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Caleb Shay (chinstrap) Assigned to: Nobody/Anonymous (nobody) Summary: automated unlearn/relearn in mboxtrain.py Initial Comment: I run mboxtrain.py via a cron job every night. However, if I had a FP or FN sitting in either my ham or spam box when the training happens it will of course reinforce this bad behaviour. Moving the message to the correct mailbox won't fix the problem, since mboxtrain adds a header to each message it has trained on saying that it already trained on this message and not to do it again. What would be nice is if that header actually said that it had already been trained on as spam/ham. If mboxtrainer comes across this header in a message for the opposite (ie, this message was trained as ham, but I'm finding it in the spam box now) it will untrain that message and then train it again properly. Of course, it's possible that it already does this and it's just not documented. I don't know python, so I can't really figure it out myself. Additionally, running 'python setup.py install' doesn't automatically install mboxtrain.py, is this correct? 
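The behaviour requested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the header name `X-Toy-Trained`, the `RecordingClassifier` class and the `train_message` helper are all hypothetical and are not taken from mboxtrain.py.

```python
from email.message import Message

TRAINED_HEADER = "X-Toy-Trained"  # hypothetical header name, for illustration


class RecordingClassifier:
    """Stand-in classifier that only records the calls made to it."""

    def __init__(self):
        self.calls = []

    def learn(self, msg, is_spam):
        self.calls.append(("learn", is_spam))

    def unlearn(self, msg, is_spam):
        self.calls.append(("unlearn", is_spam))


def train_message(msg, folder_is_spam, classifier):
    """Train msg for the folder it sits in; return True if anything changed."""
    wanted = "spam" if folder_is_spam else "ham"
    previous = msg.get(TRAINED_HEADER)
    if previous == wanted:
        return False  # already trained correctly, nothing to do
    if previous is not None:
        # The message was trained as the opposite class: undo that first.
        classifier.unlearn(msg, previous == "spam")
    classifier.learn(msg, folder_is_spam)
    del msg[TRAINED_HEADER]  # no-op if the header is absent
    msg[TRAINED_HEADER] = wanted
    return True


clf = RecordingClassifier()
message = Message()
first = train_message(message, False, clf)   # found in the ham box
again = train_message(message, False, clf)   # nightly run, nothing moved
moved = train_message(message, True, clf)    # user moved it to the spam box
```

Recording which class a message was trained as, rather than just the fact that it was trained, is what lets the nightly run notice a moved message and correct it.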
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690997&group_id=61702 From noreply at sourceforge.net Fri Feb 21 19:16:20 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 22:20:37 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690997 ] automated unlearn/relearn in mboxtrain.py Message-ID: Feature Requests item #690997, was opened at 2003-02-21 19:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690997&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Caleb Shay (chinstrap) Assigned to: Nobody/Anonymous (nobody) Summary: automated unlearn/relearn in mboxtrain.py Initial Comment: I run mboxtrain.py via a cron job every night. However, if I had a FP or FN sitting in either my ham or spam box when the training happens it will of course reinforce this bad behaviour. Moving the message to the correct mailbox won't fix the problem, since mboxtrain adds a header to each message it has trained on saying that it already trained on this message and not to do it again. What would be nice is if that header actually said that it had already been trained on as spam/ham. If mboxtrainer comes across this header in a message for the opposite (ie, this message was trained as ham, but I'm finding it in the spam box now) it will untrain that message and then train it again properly. Of course, it's possible that it already does this and it's just not documented. I don't know python, so I can't really figure it out myself. Additionally, running 'python setup.py install' doesn't automatically install mboxtrain.py, is this correct? ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-21 21:16 Message: Logged In: YES user_id=645698 mboxtrain behaves as you've requested. 
It is correct that setup.py does not currently install mboxtrain.py. I'm not sure why that is; it seems like an oversight to me.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690997&group_id=61702

From noreply at sourceforge.net Fri Feb 21 19:28:20 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Feb 21 22:20:43 2003
Subject: [Spambayes] [ spambayes-Feature Requests-690928 ] turn off saving messages in popproxy
Message-ID:

Feature Requests item #690928, was opened at 2003-02-21 16:00
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Carl Nygard (cnygard)
Assigned to: Nobody/Anonymous (nobody)
Summary: turn off saving messages in popproxy

Initial Comment:
It would be nice to be able to turn off saving message for training, and just let the settings chug. I'm guessing that the messages will just pile up if I don't go in and at least discard the messages every day.

----------------------------------------------------------------------

>Comment By: Tim Stone (timstone4)
Date: 2003-02-21 21:28

Message:
Logged In: YES
user_id=645698

Messages are auto-deleted after 7 days, by default. This is not well documented, however.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 From noreply at sourceforge.net Fri Feb 21 19:28:44 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 21 22:28:17 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690928 ] turn off saving messages in popproxy Message-ID: Feature Requests item #690928, was opened at 2003-02-21 16:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) >Assigned to: Tim Stone (timstone4) Summary: turn off saving messages in popproxy Initial Comment: It would be nice to be able to turn off saving message for training, and just let the settings chug. I'm guessing that the messages will just pile up if I don't go in and at least discard the messages every day. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-21 21:28 Message: Logged In: YES user_id=645698 Messages are auto-deleted after 7 days, by default. This is not well documented, however. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 From mhammond at skippinet.com.au Mon Feb 24 12:06:30 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun Feb 23 20:07:32 2003 Subject: [Spambayes] stand-alone Outlook addin pre-pre-alpha In-Reply-To: <007801c2d959$9eabef60$530f8490@eden> Message-ID: <000b01c2dba0$f75ca750$530f8490@eden> > I have put a stand-alone (via McMillan's Installer) setup > program (via Inno > setup) for the Outlook Plugin at > http://starship.python.net/crew/mhammond/spambayes But it didn't work :( I have put a new version up that does. Mark. 
From dave at boost-consulting.com Mon Feb 24 07:51:58 2003 From: dave at boost-consulting.com (David Abrahams) Date: Mon Feb 24 08:30:32 2003 Subject: [Spambayes] Setting up server-side IMAP filtering Message-ID: Hi, I'm interested in setting up some server-side filtering on my IMAP server using SpamBayes. Unfortunately, I'm a bit naive about this and I'm hoping someone here can help me get started. The server is running CommuniGate Pro; I have Python 2.2.2 and procmail installed. Thanks in advance, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From un at ix.de Mon Feb 24 09:46:39 2003 From: un at ix.de (Bert Ungerer) Date: Mon Feb 24 09:49:56 2003 Subject: [Spambayes] Filtering unusual words Message-ID: <3E59DBEF.2030204@ix.de> Dear Spambayes developers: I read the interesting articles in the Linux Journal. If I understood it correctly filtering and training of unusual words is critical. Most of spam that I receive contains several unique artificial words. How do you plan to deal with that kind of spam? Kind regards Bert -- Bert Ungerer fon +49.511.5352.368 Redaktion iX fax +49.511.5352.361 Helstorfer Str. 7 D-30625 Hannover Schlankere E-Mails kommen besser an: http://www.heise.de/ix/artikel/2001/05/003/ From tim at fourstonesExpressions.com Mon Feb 24 10:00:15 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 24 11:00:20 2003 Subject: [Spambayes] Filtering unusual words In-Reply-To: <3E59DBEF.2030204@ix.de> Message-ID: <3VSN12W2Z06HGKHV1YF0LHUP9A5NL.3e5a418f@myst> 2/24/2003 2:46:39 AM, Bert Ungerer wrote: >Dear Spambayes developers: > >I read the interesting articles in the Linux Journal. If I understood it >correctly filtering and training of unusual words is critical. > >Most of spam that I receive contains several unique artificial words. >How do you plan to deal with that kind of spam? The answer here is: It depends... Do the spams you receive contain ONLY completely unique artificial words? 
Or are there a few artificial words that are scattered in amongst regular 'spammy' text? If they contain ONLY words that are unique to a single instance of spam, and are artificial, then I doubt that the spam is anything other than meaningless gibberish. In that case, bayesian filtering is of limited value, since it will only see gibberish words one and only one time. However, if there are a few gibberish words scattered in amongst regular spam "buy this..." text, then the remainder of the text will be useful in determining the spamminess of the message. If the classifier sees enough words that you've trained it to look for (by your prior assertions as to what is and is not spam) then it will classify the mail as spam, regardless of how much other gibberish is in there. To date, we've not found much that can fool this algorithm with any degree of certainty and consistency. That's not to say that it's not possible, and we believe that spammers will begin desperately to try to break this technology, but it hasn't happened yet. And when it does, we'll adjust. Thanks for your query. - TimS > >Kind regards > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Mon Feb 24 14:17:25 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 24 15:18:04 2003 Subject: [Spambayes] Has anyone benchmarked performance? Message-ID: <15962.32213.187634.284763@montanaro.dyndns.org> I'm curious to know if anyone had done any comparative performance tests. Here at Northwestern, the folks who administer the mail servers have done some experimenting with SpamAssassin (versions and configuration unknown to me) and have, at times, been disappointed in its performance. I know for my own little installation in the Mojam environment I found Spambayes to be substantially faster at classifying messages than SA, though this was just a seat-of-the-pants observation. It's been awhile since I used SA. 
About the time I switched, someone there was playing with generating fast scanners using lex, so SA performance may have improved dramatically since I last looked at it.

I'll take a peek at classification performance of Spambayes and post some results in a bit.

Thx,

Skip

From tim at fourstonesExpressions.com Mon Feb 24 14:57:43 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Feb 24 15:58:22 2003
Subject: [Spambayes] SMTPProxy [Was Training]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD6A@its-xchg4.massey.ac.nz>
Message-ID:

2/23/2003 10:13:20 PM, "Meyer, Tony" wrote:

>> I *think* that the pop3proxy keeps the database open, and so
>> any training the smtp proxy might do will be clobbered (at least potentially)
>> by the pop3proxy when it trains... am I wrong about that?
>As I have it, as long as they don't try and train at the same time, then all is ok. If I didn't have them running in one process as two threads, yes there were these terrible problems of synchronisation. If someone has big issues with the separate threads method, then I'll have to rethink this one.
>

Ok, I think there will be issues with this. For example, I'll have to run four (at least) smtpproxy processes, all sharing the same database with pop3proxy. The pop3proxy uses asyncore to get around the problem of having to run multiple processes or threads with the requisite synchronisation problems. The chances are very good that this will kill somebody sooner or later.

The smtpproxy as it stands now is a good stopgap, but this is the point where we decided further investment wasn't worth it. Now it appears as if it is, and so we should probably incorporate it into the pop3proxy code, using asyncore in the same way. Either that, or rearchitect things from the ground up, so the spambayes core (classification, tokenization, training, message management) is a real server process that's there to serve any spambayes running on the system.
This is a bit of a big step, and I'm not sure it's necessary, but it IS an alternative, and I believe in exploring all alternatives... > >Cheers, >Tony > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From msergeant at startechgroup.co.uk Mon Feb 24 21:41:29 2003 From: msergeant at startechgroup.co.uk (Matt Sergeant) Date: Mon Feb 24 16:31:41 2003 Subject: [Spambayes] Has anyone benchmarked performance? In-Reply-To: <15962.32213.187634.284763@montanaro.dyndns.org> Message-ID: On Mon, 24 Feb 2003, Skip Montanaro wrote: > > I'm curious to know if anyone had done any comparative performance tests. > Here at Northwestern, the folks who administer the mail servers have done > some experimenting with SpamAssassin (versions and configuration unknown to > me) and have, at times, been disappointed in its performance. I know for my > own little installation in the Mojam environment I found Spambayes to be > substantially faster at classifying messages than SA, though this was just a > seat-of-the-pants observation. > > It's been awhile since I used SA. About the time I switched someone there > was playing with generating fast scanners using lex, so SA performance may > have improved dramatically since I last looked at it. No, spamassassin is still slow. It probably always will be. I looked into various ways to speed it up, but came up short. If it's speed you're after, bayes is definitely faster (unless you're using CRM114 ;-) From robibaro at robibaro.com Mon Feb 24 16:37:50 2003 From: robibaro at robibaro.com (Eric Robibaro) Date: Mon Feb 24 16:38:29 2003 Subject: [Spambayes] Has anyone benchmarked performance? In-Reply-To: References: Message-ID: <156954078.1046104670@[10.0.18.7]> Errr what about mailscanner, doesn't the batching effect speed it up some? --On February 24, 2003 21:41 +0000 Matt Sergeant wrote: > On Mon, 24 Feb 2003, Skip Montanaro wrote: > >> >> I'm curious to know if anyone had done any comparative performance tests. 
>> Here at Northwestern, the folks who administer the mail servers have done >> some experimenting with SpamAssassin (versions and configuration unknown >> to me) and have, at times, been disappointed in its performance. I know >> for my own little installation in the Mojam environment I found >> Spambayes to be substantially faster at classifying messages than SA, >> though this was just a seat-of-the-pants observation. >> >> It's been awhile since I used SA. About the time I switched someone >> there was playing with generating fast scanners using lex, so SA >> performance may have improved dramatically since I last looked at it. > > No, spamassassin is still slow. It probably always will be. I looked into > various ways to speed it up, but came up short. > > If it's speed you're after, bayes is definitely faster (unless you're > using CRM114 ;-) > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From mhammond at skippinet.com.au Tue Feb 25 08:49:27 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 24 16:51:02 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: Message-ID: <003e01c2dc4e$9b551740$530f8490@eden> I've been thinking about this a little, as I see no good reason why Outlook and pop3proxy could not share the same database on Windows. > Ok, I think there will be issues with this. For example, > I'll have to run > four (at least) smtpproxy processes, all sharing the same > database with > pop3proxy. The pop3proxy uses asyncore to get around the > problem of having to > run multiple processes or threads with the requisite > synchronisation problems. > The chances are very good that this will kill somebody sooner > or later. I've 3 queries here: * Why will asyncore eventually kill someone? asyncore is complex to use, but I see no reason to believe it unreliable or unable to scale. 
* Why are threads, as opposed to asyncore, not suitable for a personal pop or smtp server? I would have thought that the maximum number of connections that need to be supported would be only "a few", and therefore OK to implement using threads. * Is there some reason you believe the new bsddb can not be reliably used by multiple processes? The only time I can see a problem is when one of the processes is doing a full retrain. Even if one process was "untraining" while another was training, I don't see a real problem. > in the same way. Either that, or rearchitect things from the > ground up, so > the spambayes core (classification, tokenization, training, message > management) are a real server process that's there to serve > any spambayes running on the system. I'm still confused as to why multiple processes hitting the same db is a problem. Mark. From tim at fourstonesExpressions.com Mon Feb 24 15:54:13 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 24 16:54:50 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <003e01c2dc4e$9b551740$530f8490@eden> Message-ID: <2YIGCB7ZU43D92VYT54PMLJ74DADB2Y.3e5a9485@myst> 2/24/2003 3:49:27 PM, "Mark Hammond" wrote: >I've 3 queries here: >* Why will asyncore eventually kill someone? asyncore is complex to use, >but I see no reason to believe it unreliable or unable to scale. No... that's not what I was saying. The current smtpproxy doesn't use asyncore. I'd rather not have to run a separate process for each smtp server I'm proxying. I'd rather not have to run a separate process for smtp proxy and pop3 proxy. > >* Why are threads, as opposed to asyncore, not suitable for a personal pop >or smtp server? I would have thought that the maximum number of connections >that need to be supported would be only "a few", and therefore OK to >implement using threads. I'll have to leave this one to Richie... 
I wondered the same thing, but he assured me that there are valid reasons to use asyncore over threads...

>
>
>I'm still confused as to why multiple processes hitting the same db is a
>problem.

I could be totally confused about this. I just get a bit iffy when files are being shared/updated by several processes, without locking, transaction control, etc. etc.

>
>Mark.
>
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

From skip at pobox.com Mon Feb 24 16:36:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Feb 24 17:36:42 2003
Subject: [Spambayes] some preliminary timings
Message-ID: <15962.40531.704297.626336@montanaro.dyndns.org>

I just ran some simple(-minded) performance tests on my machine (Apple TiPowerbook, 800MHz, Mac OS 10.2.4). Using a simple filter loop from the interpreter:

    >>> from spambayes import mboxutils
    >>> fp = open("/Users/skip/tmp/newham.clean.save")
    >>> mbox = mailbox.PortableUnixMailbox (fp, mboxutils.get_message )
    >>> i = 0
    >>> t = time.clock()
    >>> for msg in mbox:
    ...     x = h.filter(msg)
    ...     i += 1
    ... t = time.clock()-t
    >>> t = time.clock()-t
    >>> i
    13389
    >>> t
    291.63
    >>> i/t
    45.910914514967594

You can see I was able to filter roughly 46 messages per cpu second, or 0.02 cpu seconds per message.

I then wrote the first ten messages of the above mailbox to files:

    >>> fp = open("/Users/skip/tmp/newham.clean.save")
    >>> msgs = []
    >>> mbox = mailbox.PortableUnixMailbox (fp, mboxutils.get_message )
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> msgs.append(mbox.next())
    >>> i = 1
    >>> for m in msgs:
    ...     f = open("msg%02d" % i, "w")
    ...     f.write(m.as_string())
    ...     i += 1
    ...

ran a simple hammiefilter loop from the shell:

    % for f in msg??
    ; do
    > time hammiefilter.py -d ~/hammie.db < $f > /dev/null
    > done

and got

    real 0m0.464s   user 0m0.240s   sys 0m0.170s
    real 0m0.441s   user 0m0.260s   sys 0m0.130s
    real 0m0.442s   user 0m0.240s   sys 0m0.150s
    real 0m0.443s   user 0m0.300s   sys 0m0.100s
    real 0m0.580s   user 0m0.370s   sys 0m0.150s
    real 0m0.535s   user 0m0.370s   sys 0m0.090s
    real 0m0.501s   user 0m0.280s   sys 0m0.140s
    real 0m0.504s   user 0m0.340s   sys 0m0.110s
    real 0m0.638s   user 0m0.410s   sys 0m0.180s
    real 0m0.450s   user 0m0.290s   sys 0m0.100s

for an average wallclock time of 0.5 seconds per message. Considering just user+sys times (to more accurately compare what time.clock() returns) brings the per-message time down to 0.44 seconds.

Such a huge difference between hammiefilter and a raw filter loop suggests I may have done something wrong. Still, perhaps opening the db file for each message and all the imports hammiefilter has to do simply kills the performance.

In an attempt to minimize the effect of byte compiling hammiefilter.py each time, I imported it once from the interpreter, then changed the loop to

    % for f in msg?? ; do
    > time python hammiefilter.pyc -d ~/hammie.db < $f > /dev/null
    > done

This generated

    real 0m0.474s   user 0m0.230s   sys 0m0.160s
    real 0m0.468s   user 0m0.240s   sys 0m0.150s
    real 0m0.465s   user 0m0.250s   sys 0m0.100s
    real 0m0.460s   user 0m0.270s   sys 0m0.110s
    real 0m0.602s   user 0m0.370s   sys 0m0.160s
    real 0m0.556s   user 0m0.340s   sys 0m0.120s
    real 0m0.518s   user 0m0.320s   sys 0m0.110s
    real 0m0.522s   user 0m0.310s   sys 0m0.110s
    real 0m0.652s   user 0m0.410s   sys 0m0.150s
    real 0m0.473s   user 0m0.300s   sys 0m0.100s

for an average wallclock time of 0.52 seconds and average user+sys time of 0.43 seconds, that is, not really much of an improvement at all.

This was with a 2.3a1+ version of Python.
Skip From nas at python.ca Mon Feb 24 16:01:44 2003 From: nas at python.ca (Neil Schemenauer) Date: Mon Feb 24 18:53:06 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.40531.704297.626336@montanaro.dyndns.org> References: <15962.40531.704297.626336@montanaro.dyndns.org> Message-ID: <20030225000144.GA13259@glacier.arctrix.com> Skip Montanaro wrote: > Such a huge difference between hammiefilter and a raw filter loop suggests I > may have done something wrong. Still, perhaps opening the db file for each > message and all the imports hammiefilter has to do simply kills the > performance. Yes. Try "strace python hammiefilter.py". I count 848 open() system calls. 702 of them return ENOENT. A (relatively) small sample: stat64("spambayes/shelve", 0xbfffcd9c) = -1 ENOENT (No such file or directory) open("spambayes/shelve.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("spambayes/shelvemodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("spambayes/shelve.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("spambayes/shelve.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) stat64("shelve", 0xbfffcd9c) = -1 ENOENT (No such file or directory) open("shelve.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("shelvemodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("shelve.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("shelve.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) stat64("/usr/local/lib/python23.zip/shelve", 0xbfffcd9c) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python23.zip/shelve.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python23.zip/shelvemodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python23.zip/shelve.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file 
or directory) open("/usr/local/lib/python23.zip/shelve.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) stat64("/usr/local/lib/python2.3/shelve", 0xbfffcd9c) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python2.3/shelve.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python2.3/shelvemodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) open("/usr/local/lib/python2.3/shelve.py", O_RDONLY|O_LARGEFILE) = 5 fstat64(5, {st_mode=S_IFREG|0644, st_size=4739, ...}) = 0 open("/usr/local/lib/python2.3/shelve.pyc", O_RDONLY|O_LARGEFILE) = 6 fstat64(6, {st_mode=S_IFREG|0664, st_size=8662, ...}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4001e000 read(6, ";\362\r\nN\231\35>c\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0s\355"..., 4096) = 4096 Ouch. Having a longer sys.path makes things worse. Neil From T.A.Meyer at massey.ac.nz Tue Feb 25 13:06:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Feb 24 19:09:04 2003 Subject: [Spambayes] SMTPProxy [Was Training] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD6D@its-xchg4.massey.ac.nz> > The current smtpproxy doesn't use > asyncore. I'd rather not have to run a separate process for > each smtp server I'm proxying. I would think that if smtpproxy does get used, then the multiple accounts capability would just be copied from pop3proxy. So it would use whatever pop3proxy does. > I'd rather not have to run a separate process > for smtp proxy and pop3 proxy. Well, as it is, it's not a separate process, it's a separate thread. [asyncore vs threads] > I'll have to leave this one to Richie... I wondered the same > thing, but he > assured me that there are valid reasons to use asyncore over > threads... smtpproxy is only using threads at the moment because it was a 30 second solution to using the same database without looking at the asyncore stuff, or modifying pop3proxy too much. 
I have no particular attachment to the threads :) [Mark] > >I'm still confused as to why multiple processes hitting the > >same db is a problem. [TimS] > I could be totally confused about this. I just get a bit > iffy when files are > being shared/updated by several processes, without locking, > transaction control, etc. etc. I can't see any problems with the sharing, other than two 'users' of the db trying to change the same message at the same time, but that could easily be fixed (in the thread version at least; my asyncore knowledge is very limited) with a couple of signals. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Feb 25 13:09:14 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Feb 24 19:10:03 2003 Subject: [Spambayes] SMTPProxy [Was Training] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D522@its-xchg4.massey.ac.nz> > I've been thinking about this a little, as I see no good > reason why Outlook > and pop3proxy could not share the same database on Windows. I think this would be a very good thing. Easy migration to & from the plugin, easy sharing your database between (for example) work with Outlook and home with pop3proxy. Have you started working on this, or is it all ideas at the moment? =Tony Meyer From tim at fourstonesExpressions.com Mon Feb 24 18:17:13 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Feb 24 19:17:49 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D522@its-xchg4.massey.ac.nz> Message-ID: 2/24/2003 6:09:14 PM, "Meyer, Tony" wrote: >> I've been thinking about this a little, as I see no good >> reason why Outlook >> and pop3proxy could not share the same database on Windows. > >I think this would be a very good thing. Easy migration to & from the plugin, easy sharing your database between (for example) work with Outlook and home with pop3proxy. > >Have you started working on this, or is it all ideas at the moment?
This is all a loosely defined set of ideas right now, documented mostly as a bunch of objections to the current design, all of which are completely valid... I sent a synopsis mail to Neale Pickett a couple of weeks ago. You can look back and find it; if you can't, I still have it somewhere and I'll dig it up. It'll be a few hours before I can get to it... Mark posted some great ideas along these lines about a week ago, and I'll dig that post up as well... > >=Tony Meyer > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Tue Feb 25 12:01:40 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 24 20:03:42 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D522@its-xchg4.massey.ac.nz> Message-ID: <007401c2dc69$74f31640$530f8490@eden> > > I've been thinking about this a little, as I see no good > > reason why Outlook > > and pop3proxy could not share the same database on Windows. > > I think this would be a very good thing. Easy migration to & > from the plugin, easy sharing your database between (for > example) work with Outlook and home with pop3proxy. And for people who run 2 mailers - eg, I run Outlook for my real mail, but use Mozilla for a free account. However, I really do this just to play with Mozilla mail, and I can't imagine too many people using multiple clients. > Have you started working on this, or is it all ideas at the moment? Just ideas, and a little looking over the bsddb docs. I found: * bsddb fully supports multiple databases in one file. I believe that dbmstorage.py should default to a named database in the file, thereby leaving other named databases in the same file for other applications.
As per that mail Tim mentioned, I also believe moving some kind of training memory into the core storage code would be a good thing.

* bsddb appears to fully support concurrent access. Locking is supported, but may not be necessary. If we assume that all "incremental training" and scoring functions can happen without any locks, the very worst thing that can happen is that during the scoring of a mail, the counts for the words in that mail have changed. I don't believe we are that sensitive - if 'nspam' or 'nham' for one or 2 words is "off by one" from what the score would have been had locking been in place, I doubt there will be any effect. Full retrains may require thought as there is a large window of time in which the database is useless.

* We use a bsddb hash, which will make cursors "unreliable" when not locked. By "unreliable", it means valid keys may be missed, but corrupt data will never be returned. Again, I don't see this as a problem, as the only place we iterate over the database is in test or "support" code. We could lock these regions if necessary, but I doubt production code ever hits it.

* I guess we would have to perform a database sync after each train. My experience shows that this is fairly cheap, and given that a train operation always is in response to a user action, a little bit of perf lost here is OK (as opposed to a perf hit for scoring, which is more critical IMO)

Mark.

From mhammond at skippinet.com.au Tue Feb 25 12:06:49 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Feb 24 20:08:24 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD6D@its-xchg4.massey.ac.nz> Message-ID: <007801c2dc6a$2d767fe0$530f8490@eden>

> [asyncore vs threads]
> > I'll have to leave this one to Richie... I wondered the same thing, but he
> > assured me that there are valid reasons to use asyncore over threads...
>
> smtpproxy is only using threads at the moment because it was a 30 second
> solution to using the same database without looking at the asyncore stuff,
> or modifying pop3proxy too much. I have no particular attachment to the
> threads :)

I do. They are *much* simpler than an asyncore model. A "thread-per-connection" model scales incredibly poorly, while an asyncore approach should be able to handle a huge number of connections. Thus, in many situations, the simplicity of threads is outweighed by the scalability of an async model.

I guess it gets down to crystal ball gazing. If we see pop3proxy running for many users, then threads are a poor choice. If it will always remain a "personal" server running on the localhost, you may find threads work just fine while leaving the code that bit more maintainable in the longer term.

Mark.

From skip at pobox.com Mon Feb 24 19:28:44 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 24 20:29:22 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <20030225000144.GA13259@glacier.arctrix.com> References: <15962.40531.704297.626336@montanaro.dyndns.org> <20030225000144.GA13259@glacier.arctrix.com> Message-ID: <15962.50892.168765.309671@montanaro.dyndns.org>

>> Such a huge difference between hammiefilter and a raw filter loop
>> suggests I may have done something wrong. Still, perhaps opening the
>> db file for each message and all the imports hammiefilter has to do
>> simply kills the performance.

Neil> Yes. Try "strace python hammiefilter.py". I count 848 open()
Neil> system calls. 702 of them return ENOENT. A (relatively) small
Neil> sample:

...

Making a simple pass through sys.path deleting non-existent directories doesn't help in my case (path length decreased from six directories to five).

Hmmm... It would be kind of interesting to override __import__ to look up modules in a saved dictionary. At program exit:

    locations = {}
    for m in sys.modules:
        if hasattr(sys.modules[m], "__file__"):
            f = sys.modules[m].__file__
            if f.endswith(".pyc"):
                locations[m] = f
    import cPickle
    cPickle.dump(locations, open("hf.pickle", "w"))

then at program startup:

    if os.path.exists("hf.pickle"):
        import cPickle, marshal
        _locations = cPickle.load(open("hf.pickle"))
        def hf_import(name, globals=None, locals=None, fromlist=None,
                      locations=_locations, impt=__import__):
            if name in locations and name not in sys.modules:
                # we know where to find this module already
                ... magic here ...
                return mod
            return impt(name, globals, locals, fromlist)
        import __builtin__
        __builtin__.__import__ = hf_import

I fiddled around a bit but couldn't come up with the "... magic here ..." part.
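For illustration only, here is one way the "... magic here ..." step might look on a current Python 3 interpreter, using importlib (which did not exist in this form when the message was written). It loads a plain module straight from a remembered file path, so no sys.path search is made for it. The names record_locations and load_from_location are hypothetical, and the sketch assumes top-level .py modules only, not packages or extension modules.

```python
import importlib.util
import sys


def record_locations(modules=None):
    """Map module name -> source path for loaded modules that have a .py __file__."""
    modules = sys.modules if modules is None else modules
    locations = {}
    for name, mod in list(modules.items()):
        path = getattr(mod, "__file__", None)
        if path and path.endswith(".py"):
            locations[name] = path
    return locations


def load_from_location(name, locations):
    """Load `name` directly from its recorded path, skipping the sys.path search."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.spec_from_file_location(name, locations[name])
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module      # register first, as the normal import machinery does
    spec.loader.exec_module(module)
    return module
```

The dictionary returned by record_locations could be pickled at exit and reloaded at startup, as in the outline above; whether that saves measurable time would have to be checked against the timings in this thread.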
Skip From tshumway at jdiworks.net Mon Feb 24 18:03:09 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Mon Feb 24 21:00:05 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <007801c2dc6a$2d767fe0$530f8490@eden> References: <007801c2dc6a$2d767fe0$530f8490@eden> Message-ID: <200302241803.09051.tshumway@jdiworks.net> On Monday 24 February 2003 17:06, Mark Hammond wrote: > > [asyncore vs threads] > I guess it gets down to crystal ball gazing. If we see pop3proxy running > for many users, then threads are a poor choice. If it will always remain a > "personal" server running on the localhost, you may find threads work just > fine while leaving the code that bit more maintainable in the longer term. Having worked with the async model a lot recently, I don't think that it is that difficult to code. Breaking your code up into bite sized pieces also makes it easier to test (each piece). RE: crystal ball gazing. I am actively working now to make spambayes applicable in a pop-toaster environment with at least 1000 users per server. No crystal ball necessary. (I am also highly motivated, because if it doesn't work, I need to start looking for a real job soon. And I *don't* want to do that.) IMAP is another reason to go async. The protocol seems designed for the reactor pattern: pipelining, untagged server notifications etc. Also, "native" support for folders in IMAP makes a lot of sense for spambayes, both on the client side and on the server side. -- Terrel From barry at python.org Mon Feb 24 21:05:00 2003 From: barry at python.org (Barry A. Warsaw) Date: Mon Feb 24 21:06:00 2003 Subject: [Spambayes] SMTPProxy [Was Training] References: <003e01c2dc4e$9b551740$530f8490@eden> Message-ID: <15962.53068.15809.541055@gargle.gargle.HOWL> >>>>> "MH" == Mark Hammond writes: MH> * Is there some reason you believe the new bsddb can not be MH> reliably used by multiple processes? 
The only time I can see MH> a problem is when one of the processes is doing a full MH> retrain. Even if one process was "untraining" while another MH> was training, I don't see a real problem. I've been prototyping some Mailman code that backs the member database to a bsddb database (either bsddb3 for Python 2.x or bsddb for Python 2.3). The trickiest part for me was getting things to play nice with multiple processes, especially in the environment initialization code. What ended up working for me was to "join" an existing environment if the directory existed, otherwise create it the first time. I had too many deadlock problems when always creating the database even when db_deadlock gave no indication of problems. Once I solved that, it /appears/ to be reliable. That, and my experience with the BerkeleyDB based ZODB storages (which have different locking constraints and no multiprocess access), leads me to feel pretty confident about bsddb. -Barry From barry at python.org Mon Feb 24 21:07:04 2003 From: barry at python.org (Barry A. Warsaw) Date: Mon Feb 24 21:08:04 2003 Subject: [Spambayes] SMTPProxy [Was Training] References: <003e01c2dc4e$9b551740$530f8490@eden> <2YIGCB7ZU43D92VYT54PMLJ74DADB2Y.3e5a9485@myst> Message-ID: <15962.53192.769272.262922@gargle.gargle.HOWL> >>>>> "TS" == Tim Stone writes: TS> I could be totally confused about this. I just get a bit iffy TS> when files are being shared/updated by several processes, TS> without locking, transaction control, etc. etc. You definitely want locking, unless you have application level locks you can trust. Transactions are probably a good thing, but if they seem like overkill, you might investigate BDB's concurrent database stuff (I haven't played with it).
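A minimal sketch of the application-level locking mentioned above, assuming a Unix system (it relies on fcntl.flock) and using shelve as a stand-in for the bsddb store. The names exclusive_lock and bump_count and the separate lock-file path are hypothetical, not part of spambayes. The lock is advisory, so it only protects processes that all take it.

```python
import fcntl
import os
import shelve
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an advisory exclusive lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def bump_count(db_path, lock_path, token):
    """Increment the stored count for token; the read-modify-write runs under the lock."""
    with exclusive_lock(lock_path):
        with shelve.open(db_path) as db:
            db[token] = db.get(token, 0) + 1
            return db[token]
```

Taking the lock around the whole read-modify-write is the simplest choice; it serializes trainers, which matches the view in this thread that training is rare and user-driven while scoring is the hot path.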
-Barry From nas at python.ca Mon Feb 24 18:39:31 2003 From: nas at python.ca (Neil Schemenauer) Date: Mon Feb 24 21:30:55 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.50892.168765.309671@montanaro.dyndns.org> References: <15962.40531.704297.626336@montanaro.dyndns.org> <20030225000144.GA13259@glacier.arctrix.com> <15962.50892.168765.309671@montanaro.dyndns.org> Message-ID: <20030225023930.GB13860@glacier.arctrix.com> Skip Montanaro wrote: > Making a simple pass through sys.path deleting non-existent directories > doesn't help in my case (path length decreased from six directories to > five). If you have 2.3 you could try putting everything in a ZIP file and making that the only item in the path. I couldn't get it working after a few minutes of noodling so I gave up. :-) > Hmmm... It would be kind of interesting to override __import__ to look up > modules in a saved dictionary. I think effbot's squeeze might work. It's been a while since I looked at it though so maybe that's not how it works. A crazier idea is to use the ELF dumping hack (unexec) that emacs uses to dump the whole Python image after doing all the imports. There's a copy in the Goo sources. Neil From tim.one at comcast.net Mon Feb 24 21:48:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Feb 24 21:49:46 2003 Subject: [Spambayes] Has anyone benchmarked performance? In-Reply-To: <15962.32213.187634.284763@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > I'm curious to know if anyone had done any comparative performance > tests. Not directly. Last time I measured spambayes, it scored 80 msgs/second, wall-clock time, across very large test runs, on an 866MHz P3. That included file I/O, tokenization, scoring, and all the hair the test framework imposed to compute statistics and print out oddball msgs, etc. This was using an in-memory dict; I'd be surprised if spambayes weren't I/O-bound when using a disk-based database.
> Here at Northwestern, the folks who administer the mail servers have > done some experimenting with SpamAssassin (versions and configuration > unknown to me) and have, at times, been disappointed in its performance. Disappointment depends on expectation . The computations SB does are very simple compared to those SA does. Note that SB is trying to solve an easier problem, though (separating an individual's notions of ham from spam, not trying to identify spam in an absolute sense). From anthony at interlink.com.au Tue Feb 25 13:54:16 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Feb 24 21:57:18 2003 Subject: [Spambayes] Filtering unusual words In-Reply-To: <3E59DBEF.2030204@ix.de> Message-ID: <200302250254.h1P2sGv17395@localhost.localdomain> >>> Bert Ungerer wrote > Dear Spambayes developers: > > I read the interesting articles in the Linux Journal. If I understood it > correctly filtering and training of unusual words is critical. > > Most of spam that I receive contains several unique artificial words. > How do you plan to deal with that kind of spam? Unique words are generally referred to as "hapaxes" (see the glossary at http://spambayes.sourceforge.net/docs.html). These are going to be ignored when the message gets scored - but this will make little or no difference overall. There's _so_ many other clues in a typical spam message that it doesn't matter. There's also the aside that a bunch of these "random words" are actually not that random. I have a quite strong spam clue in my training data that's my email address, base64'd. It occurs in a wide variety of spam, as tracking data. Anthony -- Anthony Baxter It's never too late to have a happy childhood. 
From anthony at interlink.com.au Tue Feb 25 13:58:10 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Feb 24 22:01:07 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.40531.704297.626336@montanaro.dyndns.org> Message-ID: <200302250258.h1P2wA417450@localhost.localdomain> >>> Skip Montanaro wrote > % for f in msg?? ; do > > time hammiefilter.py -d ~/hammie.db < $f > /dev/null > > done How long does it take to do for f in 1 2 3 4 5 6 7 8 9 10 ; do time python -c "import spambayes" > /dev/null done -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Tue Feb 25 13:59:39 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Feb 24 22:02:08 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <200302241803.09051.tshumway@jdiworks.net> Message-ID: <200302250259.h1P2xd517470@localhost.localdomain> >>> Terrel Shumway wrote > IMAP is another reason to go async. The protocol seems designed for the > reactor pattern: pipelining, untagged server notifications etc. Also, > "native" support for folders in IMAP makes a lot of sense for spambayes, both > on the client side and on the server side. Note that the only IMAP library I'm aware of for python is not set up in any way to work in an async environment, though. From skip at pobox.com Mon Feb 24 21:21:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 24 22:22:10 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <200302250258.h1P2wA417450@localhost.localdomain> References: <15962.40531.704297.626336@montanaro.dyndns.org> <200302250258.h1P2wA417450@localhost.localdomain> Message-ID: <15962.57656.720693.505520@montanaro.dyndns.org> Anthony> How long does it take to do Anthony> for f in 1 2 3 4 5 6 7 8 9 10 ; do Anthony> time python -c "import spambayes" > /dev/null Anthony> done Wall-clock averaged 0.12 seconds. user+sys averaged 0.08 seconds. 
I guess that would represent a lower bound (import most of the necessary machinery but don't mess with databases or process real files). Skip From skip at pobox.com Mon Feb 24 21:53:24 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 24 22:54:01 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.57656.720693.505520@montanaro.dyndns.org> References: <15962.40531.704297.626336@montanaro.dyndns.org> <200302250258.h1P2wA417450@localhost.localdomain> <15962.57656.720693.505520@montanaro.dyndns.org> Message-ID: <15962.59572.830415.117330@montanaro.dyndns.org> Dumping all the Python code into a zip file didn't help either. Using the locations of the various Python modules saved at exit in an earlier run, I dumped all of them into a zip file: >>> z = zipfile.PyZipFile("hf.zip", mode="w") >>> for key in loc: ... z.writepy(loc[key][:-1]) ... >>> z.close() Adding hf.zip to PYTHONPATH I could see stuff getting loaded from there: % PYTHONPATH=`pwd`/hf.zip /usr/bin/time python -v hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null # installing zipimport hook import zipimport # builtin # installed zipimport hook # zipimport: found 56 names in /Users/skip/src/spambayes/hf.zip import posix # builtin import stat # loaded from Zip /Users/skip/src/spambayes/hf.zip/stat.pyo import posixpath # loaded from Zip /Users/skip/src/spambayes/hf.zip/posixpath.pyo import UserDict # loaded from Zip /Users/skip/src/spambayes/hf.zip/UserDict.pyo ... yet the performance was no better: % /usr/bin/time python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null 0.41 real 0.26 user 0.14 sys % PYTHONPATH=`pwd`/hf.zip /usr/bin/time python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null 0.44 real 0.27 user 0.10 sys I then tried ktrace (Mac OS X's equivalent to strace(1)). Executing ktrace python hammiefilter.pyc -d ~/hammie.db < msg01 > /dev/null yielded some interesting results. 
If I search the output for "errno 2 No such file or directory" I get 1056 hits, 226 of which are for attempts to open files in the nonexistent file /Users/skip/local/lib/python23.zip. That seems to be some side effect of the new zip importer stuff. If I then run with PYTHONPATH referencing my stash of .pyo files in hf.zip I see 832 "no such file" responses, and only 86 occurrences of the nonexistent python23.zip. Creating an empty /Users/skip/local/lib/python2.3/ sitecustomize.py file brought the "no such file" lines down to 805. Another thing which might be useful is to change the order in which Python tries module file extensions. Since most modules are written in Python, fewer failed stat() calls would be made if files ending in ".py" were considered before files ending in ".so" and "module.so". That's outside the realm of spambayes though. Skip From skip at pobox.com Mon Feb 24 22:34:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 24 23:35:15 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.57656.720693.505520@montanaro.dyndns.org> References: <15962.40531.704297.626336@montanaro.dyndns.org> <200302250258.h1P2wA417450@localhost.localdomain> <15962.57656.720693.505520@montanaro.dyndns.org> Message-ID: <15962.62049.646522.20765@montanaro.dyndns.org> More whacks at the failed stat() problem... Switching the search order dropped the number of failed stat() calls from 805 to 710. Removing the current directory (I was in my spambayes directory when running these tests) reduced it further to 610. Removing my lib-tk directory took it to 545. Somewhere along the way, the number of searches for the nonexistent python23.zip file crept up from 86 to 91. How can I get it to stop looking for python23.zip? Creating it and stuffing the standard modules into it doesn't help, since I already have everything I need in hf.zip. 
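As a rough model of why trimming sys.path reduces the failed lookups counted above: in the strace output earlier in the thread, each directory costs one stat() for a possible package plus one open() per candidate suffix before the search moves on. The sketch below encodes just that model. The helper names are hypothetical, the suffix tuple is copied from that trace, and current Python versions cache directory listings, so this does not describe their behaviour.

```python
import os

# Candidate suffixes tried per directory in the strace sample (Python 2.3 era).
SUFFIXES_2_3 = (".so", "module.so", ".py", ".pyc")


def prune_missing(path_entries):
    """Drop entries that do not exist on disk; '' (the current directory) is kept."""
    return [p for p in path_entries if p == "" or os.path.exists(p)]


def worst_case_probes(path_entries, suffixes=SUFFIXES_2_3):
    """Upper bound on probes for one module that is found in none of the entries."""
    # One stat() for a package directory plus one open() per suffix, per entry.
    return len(path_entries) * (1 + len(suffixes))
```

Under this model a six-entry path costs at most 6 * 5 = 30 probes per module that is not found, and each entry removed saves at most 5, which is consistent with the modest reductions reported in this thread.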
Skip From tim_one at email.msn.com Mon Feb 24 23:48:13 2003 From: tim_one at email.msn.com (Tim Peters) Date: Mon Feb 24 23:49:43 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: <15962.62049.646522.20765@montanaro.dyndns.org> Message-ID: [Skip Montanaro, trying to cut futile import attempts via .zip files] I expect we'd have better luck outwitting new-in-2.3 internals on the Python-Dev list, yes? From anthony at interlink.com.au Tue Feb 25 17:27:24 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Feb 25 01:30:24 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: Message-ID: <200302250627.h1P6RPQ20171@localhost.localdomain> >>> "Tim Peters" wrote > [Skip Montanaro, trying to cut futile import attempts via .zip files] > > I expect we'd have better luck outwitting new-in-2.3 internals on the > Python-Dev list, yes? Bah. That's just what they _want_ you to think. From whisper at oz.net Tue Feb 25 00:39:12 2003 From: whisper at oz.net (David LeBlanc) Date: Tue Feb 25 03:39:43 2003 Subject: [Spambayes] Spontaneous loss of db again Message-ID: Rebooted system and OL plugin lost the training db again, but not before running a few "mail runs" ok, so I don't know specifically when/what caused it to lose it. Any ideas on what causes this? David LeBlanc Seattle, WA USA From mhammond at skippinet.com.au Tue Feb 25 22:59:02 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Feb 25 07:00:08 2003 Subject: [Spambayes] FW: thank you! Message-ID: <000201c2dcc5$4aa91c20$530f8490@eden> Forwarded with permission - the credit must be shared :) Maybe we need a "testimonials" page <0.1 wink> Mark. -----Original Message----- From: John Klassa (klassa) [mailto:klassa@cisco.com] Sent: Sunday, 23 February 2003 12:59 AM To: Mark Hammond Subject: thank you! Thank you thank you thank you for putting together the Outlook add-in for SpamBayes! I've been trying to figure out what to do about spam ever since switching to Outlook. 
I used to use fetchmail and procmail, with SpamAssassin, to get my mail into MH folders, which I happily read with exmh. Due to the increasing number of HTML and RichText messages I was receiving at work (along with the fact that my boss specifically requested that I switch), I switched to Outlook. I've been in spam hell ever since. We use IMAP, and I've never found a good way to get rid of spam in an IMAP environment. Whenever I'd use an external process to filter the mail, Outlook would complain that it'd dropped a connection to my mail server. It sucked. Anyway, now, life is good again. Mail comes in, SpamBayes kills the spam and the day is much better. :) Thank you thank you thank you! -- John Klassa / klassa@cisco.com There are 10 kinds of people in the world: those who understand binary and those who don't. From skip at pobox.com Tue Feb 25 07:21:10 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 08:21:21 2003 Subject: [Spambayes] some preliminary timings In-Reply-To: References: <15962.62049.646522.20765@montanaro.dyndns.org> Message-ID: <15963.28102.214040.802210@montanaro.dyndns.org> Tim> [Skip Montanaro, trying to cut futile import attempts via .zip Tim> files] Tim> I expect we'd have better luck outwitting new-in-2.3 internals on Tim> the Python-Dev list, yes? Yeah, but the original context was trying to speed up hammiefilter. I just sort of wandered across the line into the more general topic. Skip From barry at python.org Tue Feb 25 08:37:35 2003 From: barry at python.org (Barry A. Warsaw) Date: Tue Feb 25 08:38:00 2003 Subject: [Spambayes] SMTPProxy [Was Training] References: <200302241803.09051.tshumway@jdiworks.net> <200302250259.h1P2xd517470@localhost.localdomain> Message-ID: <15963.29087.412526.418404@gargle.gargle.HOWL> >>>>> "AB" == Anthony Baxter writes: AB> Note that the only IMAP library I'm aware of for python is not AB> set up in any way to work in an async environment, though.
Could it be made to work in a Twisted environment? -Barry From skip at pobox.com Tue Feb 25 09:59:45 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 10:59:56 2003 Subject: [Spambayes] Re: some preliminary timings Message-ID: <15963.37617.384031.567676@montanaro.dyndns.org> Executive summary for python-dev folks seeing this for the first time: This thread started at http://mail.python.org/pipermail/spambayes/2003-February/003520.html Running in a single interpreter loop, I can score roughly 46 messages per second. Running from the shell using hammiefilter.py (which takes a msg on stdin and spits a scored message to stdout) performance drops to roughly 2 messages per second. Neil Schemenauer noted all the failed open() calls during import lookup, which got me started trying to whittle them down. Two more things to try before abandoning this quixotic adventure... It appears $prefix/python23.zip is left in sys.path even if it doesn't exist (Just van Rossum explained to me in a bug report I filed that nonexistent directories might actually be URLs or other weird hacks which import hooks could make use of), so I went with the flow and created it, populating it with the contents of $prefix/python2.3. My average wall-clock time went from 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to 0.41 seconds. A modest improvement. One more little tweak. I moved the lib-dynload directory to the front of sys.path (obviously only safe if nothing there appears earlier in sys.path). Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds, though the total number of system calls as measured by ktrace went from 3454 to 3042. Hammiefilter itself really does very little. Looking at the last ktrace/kdump output, I see 3042 system calls. The hammie.db file isn't opened until line 2717.
All the rest before that is startup stuff, the largest chunk of which are nami operations (731) and open (557) calls, most of them involving nonexistent files (as evidenced by seeing only 164 calls to close()). In contrast, only 278 system calls appear to be directly related to manipulating the hammie database. This is still somewhat off-topic for this list (except for the fact that my intention was to get hammiefilter to run faster), so I'll cc python-dev to keep Tim happy, and perhaps mildly irritate Guido by discussing specific apps on python-dev. Skip From francois.granger at free.fr Tue Feb 25 17:11:30 2003 From: francois.granger at free.fr (François Granger) Date: Tue Feb 25 11:11:35 2003 Subject: [Spambayes] FW: thank you! In-Reply-To: <000201c2dcc5$4aa91c20$530f8490@eden> Message-ID: on 25/02/03 12:59, Mark Hammond at mhammond@skippinet.com.au wrote: > Forwarded with permission - the credit must be shared :) > > Maybe we need a "testimonials" page <0.1 wink> The message enclosed is a good one to start it. -- Mail is a means of communication. People should ask themselves about the political implications of the choices (or non-choices) of their tools and technologies. For clean mail: -- From rruth at computer.org Tue Feb 25 08:11:23 2003 From: rruth at computer.org (rruth@computer.org) Date: Tue Feb 25 11:12:02 2003 Subject: [Spambayes] Message Causes Spambayes to Crash Message-ID: <200302251611.h1PGBN4q003187@runner.pacbell.net> The main MySQL e-mail list (digest version) always causes Spambayes to crash, although it looks like the fault is in Generator.py. As soon as I delete the MySQL digest message, Spambayes runs normally.
With a MySQL digest message in my inbox (and not deleted): /usr/bin/python $HOME/spambayes/mboxtrain.py -d $HOME/.hammiedb -g $HOME/Mail/NewSoftware -g $HOME/Mail/ham-train -s $HOME/Mail/spam -g $HOME/Mail/inbox Training ham (/home/richard/Mail/NewSoftware): Reading as MH mailbox Trained 26 out of 26 messages Training ham (/home/richard/Mail/ham-train): Reading as MH mailbox Trained 0 out of 0 messages Training ham (/home/richard/Mail/inbox): Reading as MH mailbox Traceback (most recent call last): File "/home/richard/spambayes/mboxtrain.py", line 278, in ? main() File "/home/richard/spambayes/mboxtrain.py", line 265, in main train(h, g, False, force) File "/home/richard/spambayes/mboxtrain.py", line 207, in train mhdir_train(h, path, is_spam, force) File "/home/richard/spambayes/mboxtrain.py", line 190, in mhdir_train f.write(msg.as_string()) File "/usr/lib/python2.2/site-packages/email/Message.py", line 107, in as_string g.flatten(self, unixfrom=unixfrom) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 100, in flatten self._write(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 128, in _write self._dispatch(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 154, in _dispatch meth(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 243, in _handle_multipart g.flatten(part, unixfrom=False) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 100, in flatten self._write(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 128, in _write self._dispatch(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 154, in _dispatch meth(msg) File "/usr/lib/python2.2/site-packages/email/Generator.py", line 212, in _handle_text raise TypeError, 'string payload expected: %s' % type(payload) TypeError: string payload expected: And after I delete the MySQL digest message: /usr/bin/python $HOME/spambayes/mboxtrain.py -d $HOME/.hammiedb -g $HOME/Mail/NewSoftware -g 
$HOME/Mail/ham-train -s $HOME/Mail/spam -g $HOME/Mail/inbox Training ham (/home/richard/Mail/NewSoftware): Reading as MH mailbox Trained 26 out of 26 messages Training ham (/home/richard/Mail/ham-train): Reading as MH mailbox Trained 0 out of 0 messages Training ham (/home/richard/Mail/inbox): Reading as MH mailbox Trained 21 out of 21 messages Training spam (/home/richard/Mail/spam): Reading as MH mailbox Trained 12 out of 12 messages Any idea on how to fix this problem with MySQL list digest messages? Richard rruth@computer.org From rruth at computer.org Tue Feb 25 08:18:15 2003 From: rruth at computer.org (rruth@computer.org) Date: Tue Feb 25 11:18:51 2003 Subject: [Spambayes] Processing 'deleted' messages Message-ID: <200302251618.h1PGIFoB003209@runner.pacbell.net> Is there an option in mboxtrain.py to have Spambayes process 'deleted' messages. In my case these message files start with a comma (ex: ,118). This would be great for recreating the database. Richard rruth@computer.org From skip at pobox.com Tue Feb 25 10:23:26 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 11:24:24 2003 Subject: [Spambayes] Message Causes Spambayes to Crash In-Reply-To: <200302251611.h1PGBN4q003187@runner.pacbell.net> References: <200302251611.h1PGBN4q003187@runner.pacbell.net> Message-ID: <15963.39038.645077.328697@montanaro.dyndns.org> Richard> The main MySQL e-mail list (digest version) always causes Richard> Spambayes to crash. ... Richard> Any idea on how to fix this problem with MySQL list digest Richard> messages? Please file a bug report and attach a sample digest message and traceback here: http://sf.net/tracker/?group_id=61702&atid=498103 Thanks, Skip From guido at python.org Tue Feb 25 12:18:09 2003 From: guido at python.org (Guido van Rossum) Date: Tue Feb 25 12:23:28 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Your message of "Tue, 25 Feb 2003 09:59:45 CST." 
<15963.37617.384031.567676@montanaro.dyndns.org> References: <15963.37617.384031.567676@montanaro.dyndns.org> Message-ID: <200302251718.h1PHIBe07657@odiug.zope.com> > Executive summary for python-dev folks seeing this for the first time: > > This thread started at > > http://mail.python.org/pipermail/spambayes/2003-February/003520.html > > Running in a single interpreter loop, I can score roughly 46 > messages per second. Running from the shell using hammiefilter.py > (which takes a msg on stdin and spits a scored message to stdout) > performance drops to roughly 2 messages per second. Neil > Schmenenauer noted all the failed open() calls during import > lookup, which got me started trying to whittle them down. > > Two more things to try before abandoning this quixotic adventure... > > It appears $prefix/python23.zip is left in sys.path even if it doesn't exist > (Just van Rossum explained to me in a bug report I filed that nonexistent > directories might actually be URLs or other weird hacks which import hooks > could make use of), so I went with the flow and created it, populating it > with the contents of $prefix/python2.3. My averate wallclock time went from > 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to > 0.41 seconds. A modest improvement. > > One more little tweak. I moved the lib-dynload directory to the front of > sys.path (obviously only safe if nothing there appears earlier in sys.path). > Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds, > though the total number of system calls as measured by ktrace went from 3454 > to 3042. > > Hammiefilter itself really does very little. Looking at the last > ktrace/kdump output, I see 3042 system calls. The hammie.db file isn't > opened until line 2717. All the rest before that is startup stuff, the > largest chunk of which are nami operations (731) and open (557) calls, most > of them involving nonexistent files (as evidenced by seeing only 164 calls > to close()). 
In contrast, only 278 system calls appear to be directly > related to manipulating the hammie database. > > This is still somewhat off-topic for this list (except for the fact that my > intention was to get hammiefilter to run faster), so I'll cc python-dev to > keep Tim happy, and perhaps mildly irritate Guido by discussing specific > apps on python-dev. Far from it, I wish spambayes well (and wish I could still be involved) :-). The issue seems to be that a moderately sized application takes a long time to start, right? How much of the user+sys time was user, how much was sys? Have you used python -v to see which modules it imports? Long ago I knew Hammie; I believe it reads a possibly large database. How much time does opening +closing the database take? (I presume that the 46 messages/second was not opening the database afresh for each message.) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Tue Feb 25 09:47:46 2003 From: nas at python.ca (Neil Schemenauer) Date: Tue Feb 25 12:39:15 2003 Subject: [Spambayes] Re: some preliminary timings In-Reply-To: <15963.37617.384031.567676@montanaro.dyndns.org> References: <15963.37617.384031.567676@montanaro.dyndns.org> Message-ID: <20030225174745.GA15650@glacier.arctrix.com> Skip Montanaro wrote: > My averate wallclock time went from 0.5 seconds to 0.47 seconds and > user+sys times went from 0.43 seconds to 0.41 seconds. A modest > improvement. Of course imports are the only part of startup cost. Damn that Amdahl guy. 
:-)

Neil

From neale at woozle.org Tue Feb 25 10:40:16 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Feb 25 13:40:43 2003
Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings
In-Reply-To: <200302251718.h1PHIBe07657@odiug.zope.com> (Guido van Rossum's message of "Tue, 25 Feb 2003 12:18:09 -0500")
References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com>
Message-ID:

Guido van Rossum writes:

> The issue seems to be that a moderately sized application takes a long
> time to start, right? How much of the user+sys time was user, how
> much was sys? Have you used python -v to see which modules it
> imports?
>
> Long ago I knew Hammie; I believe it reads a possibly large database.
> How much time does opening +closing the database take? (I presume
> that the 46 messages/second was not opening the database afresh for
> each message.)

Hammie's since been modified to use a Berkeley database (bsddb3), so
there's very little penalty associated with the database at startup time
AFAICT. The constant pickling and unpickling of objects may incur some
penalty, but I don't think it would account for such a drastic slowdown.

Experience (and Tim ;) has taught me not to trust intuition, though. I
have very little experience performance tuning Python apps thus far, so
I need to defer to someone else to devise an adequate test of the speed
hit from pickling. Surely someone's considered using the profiler,
right?

Neale

From guido at python.org Tue Feb 25 14:21:33 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Feb 25 14:27:29 2003
Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings
In-Reply-To: Your message of "Tue, 25 Feb 2003 10:40:16 PST."
References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com>
Message-ID: <200302251921.h1PJLX108316@odiug.zope.com>

> > Long ago I knew Hammie; I believe it reads a possibly large database.
> > How much time does opening +closing the database take? (I presume
> > that the 46 messages/second was not opening the database afresh for
> > each message.)
>
> Hammie's since been modified to use a Berkeley database (bsddb3), so
> there's very little penalty associated with the database at startup time
> AFAICT. The constant pickling and unpickling of objects may incur some
> penalty, but I don't think it would account for such a drastic slowdown.
>
> Experience (and Tim ;) has taught me not to trust intuition, though. I
> have very little experience performance tuning Python apps thus far, so
> I need to defer to someone else to devise an adequate test of the speed
> hit from pickling. Surely someone's considered using the profiler,
> right?

Profiler, schmofiler. I use this to time specific operations:

    t0 = time.clock()
    t1 = time.clock()
    print t1-t0

On Unix, clock() measures CPU time. If real time is more important
than CPU time, use time.time(). On Windows, clock() is a real time
timer that's more precise than time(), so you should always use
clock() there.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue Feb 25 15:31:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Feb 25 16:32:25 2003
Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings
In-Reply-To: <200302251718.h1PHIBe07657@odiug.zope.com>
References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com>
Message-ID: <15963.57541.244513.169979@montanaro.dyndns.org>

    Guido> The issue seems to be that a moderately sized application takes a
    Guido> long time to start, right? How much of the user+sys time was
    Guido> user, how much was sys? Have you used python -v to see which
    Guido> modules it imports?

Actually, hammiefilter is a rather small application, so it seems its
runtime is completely dominated by startup costs.
It reads a single message on stdin, scores it and writes it out with one
or two new headers on stdout. (Perhaps the architecture is just wrong and
we should be shipping it off to a long-running process for scoring.)
About one-third of user+sys is sys time.

    Guido> Long ago I knew Hammie; I believe it reads a possibly large
    Guido> database. How much time does opening +closing the database take?
    Guido> (I presume that the 46 messages/second was not opening the
    Guido> database afresh for each message.)

The database is now a bsddb hash file. It's no longer a pickle. As I
indicated, relatively few system calls (slightly less than 10% of the
total) seem to be involved in opening and probing the database.

It's probably somewhat invalid to equate number of system calls with
application runtime. I redumped my last ktrace file just now with
timestamps. Here are some computed intervals:

    interval                                    time
    --------                                    ----
    start -> open hammiefilter.pyc              0.071
    open hammiefilter.pyc -> open hammie.db     0.516
    open hammie.db -> close hammie.db           0.084
    close hammie.db -> program end              0.011

The first interval is pure system startup - load interpreter executable,
link in shared libraries, etc. The second interval is application
startup - import modules, execute module-level code, etc. The third
interval is where the application actually does useful work. The last
interval is application shutdown.

While application startup is not exclusively importing modules, from the
looks of things it's a fair chunk.

Skip

From popiel at wolfskeep.com Tue Feb 25 14:10:42 2003
From: popiel at wolfskeep.com (T.
Alexander Popiel) Date: Tue Feb 25 17:11:17 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Message from Skip Montanaro <15963.57541.244513.169979@montanaro.dyndns.org> References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> Message-ID: <20030225221042.D1D932DDC2@cashew.wolfskeep.com> In message: <15963.57541.244513.169979@montanaro.dyndns.org> Skip Montanaro writes: > >It's probably somewhat invalid to equate number of system calls with >application runtime. I redumped my last ktrace file just now with >timestamps. Here are some computed intervals: > > interval time > -------- ---- > start -> open hammiefilter.pyc 0.071 > open hammiefilter.pyc -> open hammie.db 0.516 > open hammie.db -> close hammie.db 0.084 > close hammie.db -> program end 0.011 This is good info. Can you add in the time intervals between loading each of the modules? That might point out which modules are actually expensive (or if it's none in particular). - Alex From skip at pobox.com Tue Feb 25 16:19:46 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 17:20:27 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <20030225221042.D1D932DDC2@cashew.wolfskeep.com> References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> <20030225221042.D1D932DDC2@cashew.wolfskeep.com> Message-ID: <15963.60418.896367.294833@montanaro.dyndns.org> >> interval time >> -------- ---- >> start -> open hammiefilter.pyc 0.071 >> open hammiefilter.pyc -> open hammie.db 0.516 >> open hammie.db -> close hammie.db 0.084 >> close hammie.db -> program end 0.011 Alex> This is good info. Can you add in the time intervals between Alex> loading each of the modules? That might point out which modules Alex> are actually expensive (or if it's none in particular). 
That would be a bit tedious to do manually for the dozens of modules which
are loaded. I'll see what I can come up with though.

Skip

From skip at pobox.com Tue Feb 25 16:52:01 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Feb 25 17:52:42 2003
Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings
In-Reply-To: <20030225221042.D1D932DDC2@cashew.wolfskeep.com>
References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> <20030225221042.D1D932DDC2@cashew.wolfskeep.com>
Message-ID: <15963.62353.100645.550074@montanaro.dyndns.org>

    Alex> This is good info. Can you add in the time intervals between
    Alex> loading each of the modules? That might point out which modules
    Alex> are actually expensive (or if it's none in particular).

Okay, here's a bit more information. I instrumented hammiefilter.py with
code like

    marker = 0
    import os
    file("os%d"%marker,"w"); os.unlink("os%d"%marker); marker+=1
    import sys
    file("sys%d"%marker,"w"); os.unlink("sys%d"%marker); marker+=1
    import getopt
    file("getopt%d"%marker,"w"); os.unlink("getopt%d"%marker); marker+=1
    from spambayes import hammie, Options, mboxutils
    file("hammie%d"%marker,"w"); os.unlink("hammie%d"%marker); marker+=1

then scored a single message under ktrace control and dumped the ktrace
data with timestamps. (This could just as easily have been done with
time.clock() or time.time() calls, but after a while of staring at ktrace
results, this seemed just as easy.)

The instrumentation gave me a larger number of smaller intervals with
these meanings:

    interval                                    time
    start through first import (os)             0.166
    import sys                                < 0.001
    import getopt                               0.055
    import hammie, Options, mboxutils           0.660 (!!!)
    to start of HammieFilter class defn       < 0.001
    to start of main()                        < 0.001
    create HammieFilter instance                0.005
    parse cmd line options                    < 0.001
    get msg from stdin                          0.006
    score msg                                   0.224
    write scored msg to stdout                  0.002

Focusing on the hammie-related imports, I split that import into three
lines, reinstrumented and ran it again. Those individual imports then
expanded to

    import hammie                               0.340
    import Options                            < 0.001
    import mboxutils                          < 0.001

(As you can see, the times are only relative (large vs small) and don't
seem to be all that reproducible across individual runs.)

One more marker insertion pass, this time in hammie.py, yielded these
intervals from that file:

    import mboxutils                            0.215
    import storage                              0.072
    import options                            < 0.001
    import tokenize                             0.052
    define Hammie class                       < 0.001
    define open function                      < 0.001

It appears something in the mboxutils import is the culprit. I'm about to
go home for the day though, so I'll let others pick up from there.

Skip

From guido at python.org Tue Feb 25 18:14:13 2003
From: guido at python.org (Guido van Rossum)
Date: Tue Feb 25 18:14:32 2003
Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings
In-Reply-To: Your message of "Tue, 25 Feb 2003 16:52:01 CST." <15963.62353.100645.550074@montanaro.dyndns.org>
References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> <20030225221042.D1D932DDC2@cashew.wolfskeep.com> <15963.62353.100645.550074@montanaro.dyndns.org>
Message-ID: <200302252314.h1PNEDS18031@odiug.zope.com>

Note that spambayes/mboxutils.py imports email.Message, which
effectively imports the entire email package. That's a lot of code
(one file per class).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From popiel at wolfskeep.com Tue Feb 25 16:09:18 2003
From: popiel at wolfskeep.com (T.
Alexander Popiel) Date: Tue Feb 25 19:09:53 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Message from Guido van Rossum <200302252314.h1PNEDS18031@odiug.zope.com> References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> <20030225221042.D1D932DDC2@cashew.wolfskeep.com> <15963.62353.100645.550074@montanaro.dyndns.org> <200302252314.h1PNEDS18031@odiug.zope.com> Message-ID: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> In message: <200302252314.h1PNEDS18031@odiug.zope.com> Guido van Rossum writes: >Note that spambayes/mboxutils.py imports email.Message, which >effectively imports the entire email package. That's a lot of code >(one file per class). This highlights another recent question: do we want to stop using the email package in favor of our own (presumably lightweight) message parser? Personally, I don't care if the throughput of hammie is under 10 messages per second... my mail feed isn't that dense (and I'll be very disurbed if it becomes so), and people who do have denser feeds should probably have a daemon process for filtering instead of firing off a new process for each message. - Alex From tim at fourstonesExpressions.com Tue Feb 25 18:19:18 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Feb 25 19:19:54 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> Message-ID: 2/25/2003 6:09:18 PM, "T. Alexander Popiel" wrote: >In message: <200302252314.h1PNEDS18031@odiug.zope.com> > Guido van Rossum writes: > >>Note that spambayes/mboxutils.py imports email.Message, which >>effectively imports the entire email package. That's a lot of code >>(one file per class). 
> >This highlights another recent question: do we want to stop using >the email package in favor of our own (presumably lightweight) >message parser? I'm beginning to definitely be of the opinion that our own parser is preferable. Our parsing requirements are VERY light, and the heavier the parser, the more easily spammers can do something to break it. > >Personally, I don't care if the throughput of hammie is under >10 messages per second... my mail feed isn't that dense (and I'll >be very disurbed if it becomes so), and people who do have denser >feeds should probably have a daemon process for filtering instead >of firing off a new process for each message. > >- Alex > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From anthony at interlink.com.au Wed Feb 26 13:29:40 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Feb 25 21:31:41 2003 Subject: [Spambayes] SMTPProxy [Was Training] In-Reply-To: <15963.29087.412526.418404@gargle.gargle.HOWL> Message-ID: <200302260229.h1Q2Ter30591@localhost.localdomain> >>> Barry A. Warsaw wrote > > >>>>> "AB" == Anthony Baxter writes: > > AB> Note that the only IMAP library I'm aware of for python is not > AB> set up in any way to work in an async environment, though. > > Could it be made to work in a Twisted environment? This is the standard python imaplib I'm talking about, by the way. It might be possible, but I'm not sure how you'd go about it - I suspect that you could do something with just replacing the _command and _command_complete methods, but the imaplib code is pretty funky, so I know I don't want to try it :) -- Anthony Baxter It's never too late to have a happy childhood. 
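
The import-cost question in this thread (how much of startup goes to pulling in the email package) can also be measured without ktrace. The following is only a minimal sketch, assuming a modern Python 3 interpreter rather than the Python 2.2/2.3 used in the thread; `time_import` is a hypothetical helper, not part of spambayes, and the module names are just examples.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name.

    A module already present in sys.modules is returned from the cache,
    so only the first import in a fresh interpreter shows the real cost.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Example modules only; substitute whatever the application imports.
for name in ("getopt", "email.message", "email.parser"):
    cached = name in sys.modules
    elapsed = time_import(name)
    print("%-15s %.4f s%s" % (name, elapsed, " (already cached)" if cached else ""))
```

Running this in a fresh process per module gives per-import figures comparable in spirit to the marker-file intervals reported earlier; the absolute numbers will differ by machine and Python version.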
From anthony at interlink.com.au Wed Feb 26 13:33:42 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Feb 25 21:35:41 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> Message-ID: <200302260233.h1Q2Xgl30644@localhost.localdomain> >>> "T. Alexander Popiel" wrote > This highlights another recent question: do we want to stop using > the email package in favor of our own (presumably lightweight) > message parser? If performance is an issue, the effort would be better spent on making a long-lived server process to do the message scoring. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From skip at pobox.com Tue Feb 25 20:36:56 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 21:37:10 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> References: <15963.37617.384031.567676@montanaro.dyndns.org> <200302251718.h1PHIBe07657@odiug.zope.com> <15963.57541.244513.169979@montanaro.dyndns.org> <20030225221042.D1D932DDC2@cashew.wolfskeep.com> <15963.62353.100645.550074@montanaro.dyndns.org> <200302252314.h1PNEDS18031@odiug.zope.com> <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> Message-ID: <15964.10312.714638.211286@montanaro.dyndns.org> Alex> This highlights another recent question: do we want to stop using Alex> the email package in favor of our own (presumably lightweight) Alex> message parser? It's just a Simple Matter of Programming. ;-) Alex> Personally, I don't care if the throughput of hammie is under 10 Alex> messages per second... my mail feed isn't that dense (and I'll be Alex> very disurbed if it becomes so), and people who do have denser Alex> feeds should probably have a daemon process for filtering instead Alex> of firing off a new process for each message. 
My original motivation in looking at this is that here at Northwestern University the group I'm in (which manages the four large mail servers) has been asked to look at possible *server side* mechanisms for filtering spam. I think it's mostly politics that client side mechanisms aren't of interest, but in part it's because client support is less than server support in this environment. That being the case, getting scoring time down to where we can process a dozen or two messages per second would be desirable, even if the available cpu power on the thousands of clients far outweighs the cpu power available on the servers, and also ignoring issues of training. Skip From skip at pobox.com Tue Feb 25 20:43:12 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 21:43:20 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <200302260233.h1Q2Xgl30644@localhost.localdomain> References: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> <200302260233.h1Q2Xgl30644@localhost.localdomain> Message-ID: <15964.10688.956080.785719@montanaro.dyndns.org> >> This highlights another recent question: do we want to stop using the >> email package in favor of our own (presumably lightweight) message >> parser? Anthony> If performance is an issue, the effort would be better spent on Anthony> making a long-lived server process to do the message scoring. I thought that was the hammie{cli,srv}.py pair. Is that code still being maintained? 
Skip From skip at pobox.com Tue Feb 25 21:35:33 2003 From: skip at pobox.com (Skip Montanaro) Date: Tue Feb 25 22:35:43 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <15964.10688.956080.785719@montanaro.dyndns.org> References: <20030226000918.70B6F2DDC2@cashew.wolfskeep.com> <200302260233.h1Q2Xgl30644@localhost.localdomain> <15964.10688.956080.785719@montanaro.dyndns.org> Message-ID: <15964.13829.411801.395793@montanaro.dyndns.org> Anthony> If performance is an issue, the effort would be better spent on Anthony> making a long-lived server process to do the message scoring. Skip> I thought that was the hammie{cli,srv}.py pair. Is that code Skip> still being maintained? In answer to my question, it appears they are not currently maintained. They required a bit of work to get running and the performance was less than stellar, when compared to hammiefilter.py. Round-trip wallclock times for hammiecli.py look to be about 1 second per message, while total user+sys time for both hammiesrv.py and hammiecli.py to process 10 messages was 3.8 seconds. I suspect much of the overhead could be eliminated by replacing xmlrpc with a simple bytecount/bytes protocol running over a raw socket. Skip From T.A.Meyer at massey.ac.nz Wed Feb 26 16:46:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Feb 25 22:47:38 2003 Subject: [Spambayes] SMTPProxy [Was Training] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD6F@its-xchg4.massey.ac.nz> Re: asyncore vs threads Well, since Mark mostly uses the Outlook plugin anyway, and he was the only one who piped up with an attachment to threads, I reworked the smtpproxy to use asyncore and not threads [sorry Mark ;)]. It does do away with the smtps.py file, which is kinda nice since that code needed a bit of work. Again, this is all very alpha, but if anyone wants to try it out, a working (from my testing, at least) async smtpproxy can be got from: . 
There are files included: * ui.html and ui_html.py - these just alter the ui to allow the user to find a message by id. * Corpus.py - fixes a minor bug * Options.py - adds the obvious options * pop3proxy.py - changes to allow finding a message, plus stripping incoming ids (an option), plus launching the smtpproxy. * smtpproxy.py - the main code. Comments == good. =Tony Meyer From noreply at sourceforge.net Tue Feb 25 18:44:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Feb 25 23:56:01 2003 Subject: [Spambayes] [ spambayes-Bugs-693371 ] Reconfiguring Outlook mail support semi-breaks spambayes Message-ID: Bugs item #693371, was opened at 2003-02-26 13:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693371&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Reconfiguring Outlook mail support semi-breaks spambayes Initial Comment: If you "Reconfigure Mail Support" and switch between "corporate" and "internet only" mode (in either direction), the EntryIDs of our folders all change. SpamBayes then starts in an enabled state, but is silently doing nothing. Even if we can't re-locate the folders, we should report failure to start somehow. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693371&group_id=61702 From noreply at sourceforge.net Tue Feb 25 19:12:20 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Feb 25 23:56:10 2003 Subject: [Spambayes] [ spambayes-Bugs-693371 ] Reconfiguring Outlook mail support semi-breaks spambayes Message-ID: Bugs item #693371, was opened at 2003-02-26 13:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693371&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Reconfiguring Outlook mail support semi-breaks spambayes Initial Comment: If you "Reconfigure Mail Support" and switch between "corporate" and "internet only" mode (in either direction), the EntryIDs of our folders all change. SpamBayes then starts in an enabled state, but is silently doing nothing. Even if we can't re-locate the folders, we should report failure to start somehow. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-26 14:12 Message: Logged In: YES user_id=14198 I should note that all the builtin rules also break in this case, and need the folders re-specified. The builtin rules however display a dialog as the rule fails (ie, as a mail matches the condition). We could maybe take the same approach: * If filtering is enabled when we start, but we can not locate the "watch" folder, silently assume the Inbox. * Watch these messages. As soon as a Spam or Unsure message is received, display the message indicating why we couldn't move it. * The "silent inbox" assumption will then hopefully be noticed by the user as they re-configure the dialogs. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693371&group_id=61702 From noreply at sourceforge.net Tue Feb 25 19:27:04 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Feb 25 23:56:17 2003 Subject: [Spambayes] [ spambayes-Bugs-693387 ] user-composed messages are filtered Message-ID: Bugs item #693387, was opened at 2003-02-26 14:27 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693387&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: user-composed messages are filtered Initial Comment: Messages composed by the user (eg, dragged back from "Drafts") or otherwise ending up there via external programs (I actually saw this with a Quicken generated mail) get filtered. They usually end up as "maybe", but should be ignored. 
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693387&group_id=61702

From noreply at sourceforge.net Tue Feb 25 21:03:20 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Feb 25 23:56:26 2003
Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py
Message-ID:

Bugs item #693423, was opened at 2003-02-26 00:02
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: David Shaw (dshaw)
Assigned to: Nobody/Anonymous (nobody)
Summary: email message generates error in pop3proxy.py

Initial Comment:
Hi all,

A friend of mine had a cache file in his "unknown" folder that caused the
"review" web page in pop3proxy.py to generate the following traceback:

Traceback (most recent call last):
  File "spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "pop3proxy.py", line 929, in onReview
    judgement = judgement.split(';')[0].strip()
  File "pop3proxy.py", line 815, in _makeMessageInfo
    print type(text)
AttributeError: 'list' object has no attribute 'replace'

He sent me the offending message, and I replicated the problem:

    msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r")
    message = mbox.get_message(msg)
    part = typed_subpart_iterator(message, 'text', 'plain').next()
    text = part.get_payload()
    >>> text
    []

So, instead of text, the payload is a list containing a single email
message instance.
Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From mhammond at skippinet.com.au Wed Feb 26 16:06:00 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 00:07:08 2003 Subject: [Spambayes] spambayes/msgs.py used? Message-ID: While working out how to tackle my message-database, I bumped into msgs.py. As far as I can see, it is completely unused. The revision history just lists checkins by Anthony Baxter as part of larger checkins, and indicates that the file was created on the reorg-branch branch. No .py file in the project appears to import this module - does anyone know its status and/or its history? Mark. From tim.one at comcast.net Wed Feb 26 00:21:18 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 26 00:21:48 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Message-ID: [Tim Stone] > ... > I'm beginning to definitely be of the opinion that our own parser is > preferable. Our parsing requirements are VERY light, To the contrary, I can't think of a parsing ability of the Python email pkg that spambayes *doesn't* use, from making sense of arbitrarily nested MIME structure, to identifying the charsets in use. I suspect you're just thinking of body parsing, where we do very little -- but the email pkg does very little there too (beyond magically identifying the text portions for us, and magically decoding base64 and quoted-printable sections). > and the heavier the parser, the more easily spammers can do something > to break it. They're not having much success so far . 
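
The two crashes reported in this thread (the mboxtrain.py TypeError on a digest and the pop3proxy.py AttributeError) both involve a payload that is a list rather than a string. The sketch below illustrates that behaviour of the email package; it assumes a modern Python 3 standard library rather than the Python 2.2 email package used at the time, the sample message is made up, and `text_parts` is a hypothetical helper, not spambayes code.

```python
import email

# Made-up digest-like message: one plain text part plus an attached message.
RAW = """\
From: a@example.com
Subject: digest
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

hello
--XX
Content-Type: message/rfc822

Subject: inner

inner body
--XX--
"""


def text_parts(msg):
    """Yield the decoded text of every text/plain leaf part in msg."""
    for part in msg.walk():
        if part.is_multipart():
            # Containers (multipart/*, message/rfc822) hold a list of
            # sub-messages, so get_payload() is not a string here.
            continue
        if part.get_content_type() == "text/plain":
            raw = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "ascii"
            yield raw.decode(charset, "replace")


msg = email.message_from_string(RAW)
print(type(msg.get_payload()))  # a list for the multipart container
texts = list(text_parts(msg))
print(len(texts))
```

Checking `is_multipart()` before treating a payload as text is one way to avoid calling string methods on a list; whether that is the right fix for either bug report is left to the tracker discussion.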
From tim.one at comcast.net Wed Feb 26 00:26:00 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 26 00:26:31 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: <15964.10312.714638.211286@montanaro.dyndns.org> Message-ID: [Skip Montanaro] > My original motivation in looking at this is that here at Northwestern > University the group I'm in (which manages the four large mail > servers) has been asked to look at possible *server side* mechanisms > for filtering spam. Servers exist to run daemons: run a classifier service, and your speed problems vanish. They surely don't fire up a fresh SpamAssassin for each msg now -- or do they? If so, they hired you to pull their heads out of their butts . From mhammond at skippinet.com.au Wed Feb 26 16:38:19 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 00:39:07 2003 Subject: [Spambayes] Adding a message database Message-ID: I've been harping on about this for a while, and recently started seeing other people with the same need. If I glanced the checkin message correctly, TimS also wants one for his Notes work. Currently, core spambayes maintains a database of wordinfos. I would like spambayes to assist in managing a database of message_ids, mapped to how they were previously trained. While spambayes does not need any such concept to perform basic scoring, it seems that many applications using spambayes do. I see 2 strategies: 1) sub-class Classifier, adding this capability. This keeps it out of the core, but then makes it hard to use the existing sub-classes - eg, who does "DBStorage" derive from? We will either need to multiply out all the base classes, or use mixins. 2) Add the basic support to classifier, but in a non-intrusive way, allowing it to be left unused by an application. I believe that modifying Classifier to use a "Message object" is too intrusive. 
Specifically, for (2), I would change learn to: def learn(self, wordstream, is_spam, msg_id = None): ... self._add_msg(wordstream, is_spam) if msg_id is not None: self._add_msgid(msg_id, is_spam) Thus, if msg_id is never passed to learn (as no current application will), the new "_add_msgid()" function will never be called. Similarly for unlearn. _add_msgid() and _remove_msgid() would maintain a new dictionary-like object for the database, but it would initially be set to None, and demand-loaded first time it is actually needed, and saved only when non-None. The storage.py related sub-classes then get to implement a DB behind this. Pickles can save to a discrete file, while bsddb can use a multiple-databases-in-one-file approach. For existing applications, no attempt is ever made to load a "message database". Only when the app starts passing message IDs will the database be used, so this is completely up to the author of the app. It should also not break any existing clients or code. Comments? Mark. From tim at fourstonesExpressions.com Wed Feb 26 06:25:28 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 07:25:33 2003 Subject: [Spambayes] Adding a message database In-Reply-To: Message-ID: <64FNJ06SSOQM760D0C9CA75DBB03Z.3e5cb238@myst> 2/25/2003 11:38:19 PM, "Mark Hammond" wrote: >I've been harping on about this for a while, and recently started seeing >other people with the same need. If I glanced the checkin message >correctly, TimS also wants one for his Notes work. Yes, I do maintain a pickle of message ids and how they have been trained. It currently has four possible values: 'never classified', 'classified', 'spam', 'ham'. Spam and Ham values are set when a message is trained as such. Never classified value is set upon first time initialization, due to some quirks in how Notes makes its mail database available to the outside world. All of this enables proper (re)training. 
>
>Currently, core spambayes maintains a database of wordinfos. I would like
>spambayes to assist in managing a database of message_ids, mapped to how
>they were previously trained. While spambayes does not need any such
>concept to perform basic scoring, it seems that many applications using
>spambayes do.

I think this is a wonderful idea.

>2) Add the basic support to classifier, but in a non-intrusive way, allowing
>it to be left unused by an application. I believe that modifying Classifier
>to use a "Message object" is too intrusive.

Better idea than strategy 1, IMO.

>
>Specifically, for (2), I would change learn to:

learn *could* be altered to manage unlearning as well. This removes a headache for a lot of code. Just learn-and-move-on. Something to this effect:

>    def learn(self, wordstream, is_spam, msg_id = None):
>...
>      if msg_id is not None:
          trng = self._get_msgid(msg_id)
          if trng:
              # undo the earlier training only when the class has changed
              if trng == 'spam' and not is_spam:
                  self.unlearn(wordstream, True)
              elif trng == 'ham' and is_spam:
                  self.unlearn(wordstream, False)
              self._update_msgid(msg_id, is_spam)    # for crud purity
          else:
>            self._add_msgid(msg_id, is_spam)
>      self._add_msg(wordstream, is_spam)
>
>Comments?

Let us make it so!

c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org

From mhammond at skippinet.com.au Wed Feb 26 23:40:01 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 07:40:42 2003 Subject: [Spambayes] Adding a message database In-Reply-To: <64FNJ06SSOQM760D0C9CA75DBB03Z.3e5cb238@myst> Message-ID:

> Yes, I do maintain a pickle of message ids and how they have been
> trained. It currently has four possible values: 'never classified',
> 'classified', 'spam', 'ham'. Spam and Ham values are set when a
> message is trained as such. Never classified value is set upon
> first time initialization, due to some quirks in
> how Notes makes its mail database available to the outside world.

You didn't explain "classified" - how does this differ from "ham" or "spam"?
> >Specifically, for (2), I would change learn to: > > learn *could* be altered to manage unlearning as well. This removes a > headache for a lot of code. Just learn-and-move-on. Something to this > effect: This does sound appealing. I have a vague objection I can't qualify, so if no one else can either, I can live with it . Note however that I am proposing a "tri-state" value - "spam", "ham" or None. How your "classified" fits into this may be a problem. Mark. From tim at fourstonesExpressions.com Wed Feb 26 07:16:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 08:17:35 2003 Subject: [Spambayes] Adding a message database In-Reply-To: Message-ID: <1VIE73XTLK0GFCAZYEIGMIRNNJVTED.3e5cbe16@myst> 2/26/2003 6:40:01 AM, "Mark Hammond" wrote: >> Yes, I do maintain a pickle of message ids and how they have been >> trained. It currently has four possible values: 'never classified', >> 'classified', 'spam', 'ham'. Spam and Ham values are set when a >> message is trained as such. Never classified value is set upon >> first time initialization, due to some quirks in >> how Notes makes its mail database available to the outside world. > >You didn't explain "classified" - how does this differ from "ham" or "spam"? In Notes, I have no way to determine if I've ever looked at a message before. If I don't keep track of messages that have already been classified, I'll end up classifying the entire inbox each time I run the filter. This is managed differently in the other parts of Spambayes, because the presence/absence of X-Spambayes-Classification header carries that information. In notes, I cannot add headers... > >> >Specifically, for (2), I would change learn to: >> >> learn *could* be altered to manage unlearning as well. This removes a >> headache for a lot of code. Just learn-and-move-on. Something to this >> effect: > >This does sound appealing. 
I have a vague objection I can't qualify, so if >no one else can either, I can live with it . I'm kinda with you on the vague objection, but I see this code springing up all over the place... just seems like a valid/useful thing to do. > >Note however that I am proposing a "tri-state" value - "spam", "ham" or >None. How your "classified" fits into this may be a problem. We can cross this bridge if we get to it... :) > >Mark. > > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From spambayes at rodland.no Wed Feb 26 14:38:53 2003 From: spambayes at rodland.no (Fredrik Rodland) Date: Wed Feb 26 08:39:03 2003 Subject: [Spambayes] porting database In-Reply-To: <1VIE73XTLK0GFCAZYEIGMIRNNJVTED.3e5cbe16@myst> Message-ID: Is it possible to port the gathered data trained on outlook to be used in procmail? the case: I'm using outlook as a MUA for most of my mails, but have a shell account where I do a lot of initial filtering (virus-checks, spam-checks, etc). So I've trained Spambayes from outlook, and everything looks fine. What I want is to port/move these files to my linux-server (on a regular basis) and include the spambayes rules for procmail. PS! I searched Google for "porting database", without getting any relevant results. Fredrik -- Fredrik Rødland Stocknet Mob : +47 992 19 817 Technical Architect http://www.stocknet.com Fax : +47 910 73 621 From tim at fourstonesExpressions.com Wed Feb 26 07:41:47 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 08:41:51 2003 Subject: [Spambayes] porting database In-Reply-To: Message-ID: 2/26/2003 7:38:53 AM, "Fredrik Rodland" wrote: >Is it possible to port the gathered data trained on outlook to be used in >procmail? > >the case: >I'm using outlook as a MUA for most of my mails, but have a shell account >where I do a lot of initial filtering (virus-checks, spam-checks, etc). > >So I've trained Spambayes from outlook, and everything looks fine.
> >What I want is to port/move these files to my linux-server (on a regular >basis) and include the spambayes rules for procmail. Use dbExpImp.py to export/import the database. - TimS > >PS! I searched Google for "porting database", without getting any relevant >results. > >Fredrik > > >-- >Fredrik Rødland Stocknet Mob : +47 992 19 817 >Technical Architect http://www.stocknet.com Fax : +47 910 73 621 > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tim at fourstonesExpressions.com Wed Feb 26 07:53:05 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 08:53:09 2003 Subject: [Spambayes] porting database In-Reply-To: Message-ID: <944WDCQP4ZM72TVQ74PM7197LFZUC.3e5cc6c1@myst> 2/26/2003 7:48:04 AM, "Fredrik Rodland" wrote: >great - thanx. > >Could you also point me to the correct files to import/export to accomplish >what I described? > >I have the following files after training: > >C:\Programfiler\_UTIL\spambayes-1.0a2\Outlook2000 > default_bayes_database.pck default_bayes_database.pck is the only one you need to export/import. In fact, it's the only one that you *can* export/import . - TimS > default_configuration.pck > default_message_database.pck > >F > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Feb 26 07:58:41 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 26 08:58:43 2003 Subject: [Spambayes] Adding a message database In-Reply-To: References: Message-ID: <15964.51217.18675.935909@montanaro.dyndns.org> Mark> I would like spambayes to assist in managing a database of Mark> message_ids, mapped to how they were previously trained. ... much stuff elided ... I understand what you want to do, but not why. Can you provide some motivation?
Thx, Skip From tim at fourstonesExpressions.com Wed Feb 26 08:10:43 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 09:10:48 2003 Subject: [Spambayes] Adding a message database In-Reply-To: <15964.51217.18675.935909@montanaro.dyndns.org> Message-ID: <04JE82TOFCF0RP81UNJ86RQYS63YIH.3e5ccae3@myst> 2/26/2003 7:58:41 AM, Skip Montanaro wrote: > Mark> I would like spambayes to assist in managing a database of > Mark> message_ids, mapped to how they were previously trained. > ... much stuff elided ... > >I understand what you want to do, but not why. Can you provide some >motivation? Let me speak from recent experience. I just wrote a Lotus Notes integration, and in the process discovered that Notes provides me with almost no facility to remember ANYTHING about a message. It only provides me with a unique message id. I was thus forced to implement a message database, so I could remember what had/had not happened to that message. If I hadn't done that, I couldn't have properly retrained a message, and would have had to classify every message in the inbox every time I executed the filter. This pattern was somewhat similar to what happens in the pop3proxy. Messages are given an id, and are managed by that id. Fortunately in that case, information can be embedded in headers. ***BUT*** headers may not be a good place in which to store that information. Particularly, how the message was trained is currently remembered by what subdirectory (Corpus) the message lives in. This idea works for pop3proxy, not for Notes, and not for Outlook. In Outlook, how a message was trained is currently remembered by what Outlook folder the message lives in. But if I read the continual posts correctly, this is an ongoing source of aches, pains, and nausea for Mark, who has to handle all of the combinations of user interactions with trained mail to correctly untrain/retrain. Then there's hammiefilter... etc. etc. 
All of this adds up (in my mind) to a ton of code rolling around in the system simply to manage untrain/retrain which could be nicely abstracted into the learn method provided that information about a message could be persisted. We will undoubtedly encounter more mail systems that could benefit from Spambayes. My Notes work is my most recent example. I feel that we should provide as much facility as we can to make these integrations as easy as possible, and this is one such facility. Thus endeth my apologetic - TimS c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Wed Feb 26 03:58:43 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:13:56 2003 Subject: [Spambayes] [ spambayes-Feature Requests-676401 ] Outlook: Storage in default user directory Message-ID: Feature Requests item #676401, was opened at 2003-01-29 09:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702 Category: None Group: None >Status: Closed Priority: 5 Submitted By: Tony Meyer (anadelonbrin) >Assigned to: Mark Hammond (mhammond) Summary: Outlook: Storage in default user directory Initial Comment: Follows from comments in spambayes list from Piers Haken and Mark Hammond. It would be nice if the plugin stored the pck and ini files in a more appropriate folder than the outlook root folder - as Piers commented, the user might not have write access there. The folder SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA) would probably be the best place. The pck's are created by the plugin and so are easy; how the default .ini file gets there is another issue. 
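A minimal sketch of how a plugin might pick such a per-user directory follows. The SHGetSpecialFolderPath call is the one named in the request and needs pywin32 on Windows; the `default_data_directory` helper, the "SpamBayes" sub-folder name and the non-Windows fallback are assumptions made for this example, not what the Outlook plugin actually does.

```python
import os


def default_data_directory(app_name="SpamBayes"):
    """Return a per-user directory path for the .pck/.ini files.

    Only computes the path; it does not create the directory.
    """
    try:
        # pywin32 (Windows only): the call suggested in the request.
        from win32com.shell import shell, shellcon
        base = shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA)
    except ImportError:
        # Assumption of this sketch: without pywin32, fall back to the
        # APPDATA environment variable, then to the home directory.
        base = os.environ.get("APPDATA") or os.path.expanduser("~")
    return os.path.join(base, app_name)


data_dir = default_data_directory()
```

Because the helper only computes a path, the caller still has to create the directory and decide how a default .ini file gets copied into it, which is the open issue noted in the request.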
---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-26 22:58 Message: Logged In: YES user_id=14198 All done :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:03:13 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:04 2003 Subject: [Spambayes] [ spambayes-Bugs-673388 ] pop3proxy storage Message-ID: Bugs item #673388, was opened at 2003-01-24 09:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702 >Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage Initial Comment: I had a look in the pop3proxy folders, and I found these strange files. They are missing the header and maybe part of the message.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:03:12 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:12 2003 Subject: [Spambayes] [ spambayes-Bugs-673390 ] pop3proxy storage 2nd file Message-ID: Bugs item #673390, was opened at 2003-01-24 09:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 >Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage 2nd file Initial Comment: Other file missing header ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:04:00 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:20 2003 Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py Message-ID: Bugs item #693423, was opened at 2003-02-26 16:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 >Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Nobody/Anonymous (nobody) Summary: email message generates error in pop3proxy.py Initial Comment: Hi all, A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback: Traceback (most recent call last): File "spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 929, in onReview judgement = judgement.split(';')[0].strip() File "pop3proxy.py", 
line 815, in _makeMessageInfo print type(text) AttributeError: 'list' object has no attribute 'replace' He sent me the offending message, and I replicated the problem: msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r") message = mbox.get_message(msg) part = typed_subpart_iterator(message, 'text', 'plain').next() text = part.get_payload() >>> text [] So, instead of text, the payload is a list containing a single email message instance. Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:04:31 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:27 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete as spam marks as read Message-ID: Feature Requests item #680629, was opened at 2003-02-05 13:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 >Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin: Delete as spam marks as read Initial Comment: Personally I think it would be nice if the "delete as spam" button marked the mail item as read. Note that I'm not saying that mail that is filtered as spam should be marked as read - it shouldn't (by default). If others agree, this would be a nice addition. Perhaps as an option in the prefs. 
---------------------------------------------------------------------- Comment By: Piers Haken (piersh) Date: 2003-02-07 22:38 Message: Logged In: YES user_id=10551 i don't care if you do this or not (since spambayes catches all my spam ;-) ), but please don't mark any automatically- filtered spam as 'read' - it would be a pain to check for FPs if you did. thx. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-07 22:05 Message: Logged In: YES user_id=14198 Fair enough :) ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-02-07 20:09 Message: Logged In: YES user_id=113328 I'd like the "Mark as read" option. Most unsures and false negatives which are spam, I can identify by subject, and hence I don't open (and I don't use the preview pane). But it's not crucial - Ctrl-Q does a very quick "Mark as read" anyway... ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 14:17 Message: Logged In: YES user_id=552329 Agreed that it is not necessary. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-05 14:11 Message: Logged In: YES user_id=14198 Yep, I see that makring as read could be useful in that they have been reviewed, but then I would expect Outlook's normal mechanism to still work and mark it read. I have my preview pane mark as read after 2 seconds :) Re the INI file - my problem is that the GUI needs to modify these options, and I don't see how it is trivial to keep the fairly "free-form" INI file format supported by configparser, while only writing out certain elements and not others and also keeping comments etc intact. I'll make a deal - help me with the options problem, and I will give you 5 free option . Let's take it to email... 
---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 13:56 Message: Logged In: YES user_id=552329 My reasoning was that if the user manually selects to delete it as spam, then it is as good as read. Those that are moving via the filter have not been read. Personally I still wade through the filtered spam to check it for false positives, and mark the messages as read as I go (so that the 'unread' display is the number of messages I haven't checked). If I choose delete as spam, I then have to go to the spam folder and mark it as read. In any case, no big deal if you disagree, it was just a thought :) Re: the ini file: looking at the ini, it doesn't seem to have anything that couldn't be in the GUI. Most of it would probably fit in the "advanced" dialog. It would probably be good if the ini was only for 'beta' options - anything that is for public use should be in the GUI. And if a 'beta' option moves to 'public', then it doesn't matter (much) if it breaks, because those using beta options should be upgrading anyway. Moving the existing settings (most of which should be exposed I think) would mean breaking existing code, but maybe just this once? Maybe this discussion should move to the list? (maybe I should have posted this there originally?) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-05 13:41 Message: Logged In: YES user_id=14198 I'm not too sure this should happen unless the filter also marks the items as read - otherwise you still end up with many spam in the spam folder unread, and only the ones you move manually marked as read. I'm also kinda stuck about what to do with "options". Currently, options managed by the GUI are in a pickle, while other options are in the .ini file. 
I don't object to having new, outlook specific options in the INI file, but I do object to all our existing code breaking should we decide later to move this option into the GUI. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-05 13:31 Message: Logged In: YES user_id=552329 And who else to decide on this, but Mark :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:05:31 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:34 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690914 ] Un-classify an email message Message-ID: Feature Requests item #690914, was opened at 2003-02-22 08:44 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 Category: None Group: None >Status: Closed Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Nobody/Anonymous (nobody) Summary: Un-classify an email message Initial Comment: I'm wondering if the math for adding a message classified as ham or spam is reversible. If it is, it would be a nice feature to reverse the equations and "subtract" the effect of a certain email. For example, if I mis-classify a message, it would be nice to un-classify it. I'm guessing that even if the math is close (due to hysteresis effect??) it might be a useful feature. But then, I haven't looked at the code, sorry. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-02-26 23:05 Message: Logged In: YES user_id=14198 This is supported now. Currently the core does not record info about how a message has been trained, but assuming the app knows this, an unlearn() method exists which does undo all the maths.
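To show why the training maths can be undone exactly, here is a toy sketch. It is not the real spambayes Classifier: `ToyClassifier`, its attribute names and the `trained` dictionary (a stand-in for the message database discussed on the list) are all hypothetical. Training only adds integer counts, so unlearning subtracts the same counts; when a msg_id is supplied, learn() also undoes any earlier, different training for that message.

```python
from collections import Counter


class ToyClassifier:
    """Toy word-count store; NOT the real spambayes Classifier."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spamcount = Counter()
        self.hamcount = Counter()
        self.trained = {}        # hypothetical message db: msg_id -> is_spam

    def _add_msg(self, words, is_spam):
        counts = self.spamcount if is_spam else self.hamcount
        counts.update(set(words))            # each word counts once per message
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def _remove_msg(self, words, is_spam):
        counts = self.spamcount if is_spam else self.hamcount
        for word in set(words):
            counts[word] -= 1
            if counts[word] == 0:
                del counts[word]             # keep the store free of zeros
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def learn(self, words, is_spam, msg_id=None):
        # `words` must be re-iterable (a list, not a one-shot generator).
        if msg_id is not None:
            previous = self.trained.get(msg_id)
            if previous == is_spam:
                return                       # already trained this way
            if previous is not None:
                self._remove_msg(words, previous)   # undo the old training
            self.trained[msg_id] = is_spam
        self._add_msg(words, is_spam)

    def unlearn(self, words, is_spam, msg_id=None):
        self._remove_msg(words, is_spam)
        if msg_id is not None:
            self.trained.pop(msg_id, None)


c = ToyClassifier()
words = ["cheap", "pills", "now"]
c.learn(words, True, msg_id="m1")    # first trained as spam
c.learn(words, False, msg_id="m1")   # retrained as ham: spam counts are undone
```

An application that never passes msg_id gets plain learn/unlearn behaviour and the `trained` dictionary stays empty.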
---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-22 09:39 Message: Logged In: YES user_id=645698 The pop3proxy does not currently support this behavior, though it could in theory. It throws away trained messages at the moment. This may change in the not-so-distant future. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2003-02-22 09:28 Message: Logged In: YES user_id=31435 The math is exactly reversible, and the underlying classifier has both learn() and unlearn() methods. Whether you can get at them easily depends on the client you're using; for example, it's very easy from the project's Outlook client. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690914&group_id=61702 From noreply at sourceforge.net Wed Feb 26 04:05:48 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:14:44 2003 Subject: [Spambayes] [ spambayes-Feature Requests-690928 ] turn off saving messages in popproxy Message-ID: Feature Requests item #690928, was opened at 2003-02-22 09:00 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 >Category: pop3proxy Group: None Status: Open Priority: 5 Submitted By: Carl Nygard (cnygard) Assigned to: Tim Stone (timstone4) Summary: turn off saving messages in popproxy Initial Comment: It would be nice to be able to turn off saving message for training, and just let the settings chug. I'm guessing that the messages will just pile up if I don't go in and at least discard the messages every day. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-22 14:28 Message: Logged In: YES user_id=645698 Messages are auto-deleted after 7 days, by default. 
This is not well documented, however. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=690928&group_id=61702 From jeremy at alum.mit.edu Wed Feb 26 10:26:17 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Feb 26 10:26:58 2003 Subject: [Spambayes] message lost in the system? Message-ID: <1046273177.29711.14.camel@localhost.localdomain> I've been using a new MUA + spambayes setup for several weeks now. Last night at least two messages got lost in the system, and I'm at a loss to explain how. Luckily, there are copies of them sitting in the pop3proxy's pop3proxy-unknown-cache directory. I'm using Evolution 1.2.1 and have it configured to use pop3proxy for mail coming from my ISP's pop server. I read a few messages using the ISP's web interface and was surprised to discover that they did not show up in Evolution when I downloaded the messages. I've got two pieces of software that I don't trust completely, but Evolution has been around longer so I trust it a bit more :-). Has anyone else seen problems with the pop proxy losing mail? I don't know that the proxy is to blame, but it would be my first guess. One interesting thing is that the messages that got lost were the last messages to arrive in a big batch. Is it possible that the proxy somehow failed to deliver the last few messages? I also wonder what the cache directories are for and what it means that the lost messages were found in the "unknown" directory. Jeremy From Paul.Moore at atosorigin.com Wed Feb 26 15:28:50 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Wed Feb 26 10:30:14 2003 Subject: [Spambayes] message lost in the system? Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D93B@UKDCX001.uk.int.atosorigin.com> From: Jeremy Hylton [mailto:jeremy@alum.mit.edu] > I'm using Evolution 1.2.1 and have it configured to use pop3proxy for > mail coming from my ISP's pop server. 
I read a few messages using the > ISP's web interface and was surprised to discover that they did not show > up in Evolution when I downloaded the messages. Could it be that reading via the web interface marked the messages as read, and you have your client set not to re-download read mails? Paul. From tim at fourstonesExpressions.com Wed Feb 26 09:44:55 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 10:45:00 2003 Subject: [Spambayes] message lost in the system? In-Reply-To: <1046273177.29711.14.camel@localhost.localdomain> Message-ID: 2/26/2003 9:26:17 AM, Jeremy Hylton wrote: >I've been using a new MUA + spambayes setup for several weeks now. Last >night at least two messages got lost in the system, and I'm at a loss to >explain how. Luckily, there are copies of them sitting in the >pop3proxy's pop3proxy-unknown-cache directory. > >I'm using Evolution 1.2.1 and have it configured to use pop3proxy for >mail coming from my ISP's pop server. I read a few messages using the >ISP's web interface and was surprised to discover that they did not show >up in Evolution when I downloaded the messages. > >I've got two pieces of software that I don't trust completely, but >Evolution has been around longer so I trust it a bit more :-). Has >anyone else seen problems with the pop proxy losing mail? I don't know >that the proxy is to blame, but it would be my first guess. No, but I haven't been looking, either. Setting verbose: True in your Options [global] section will produce a file named _pop3proxy.log with details of what happened during the execution of the pop3proxy. It may get a bit large, but if you enable that, and have it running when a message gets lost, it may contain some vital clues as to what happened. Can you do this, and open a bug against pop3proxy? > >One interesting thing is that the messages that got lost were the last >messages to arrive in a big batch. 
Is it possible that the proxy >somehow failed to deliver the last few messages? I would tend to point the finger at the proxy, but it's not at all obvious how this could have happened. > >I also wonder what the cache directories are for and what it means that >the lost messages were found in the "unknown" directory. The pop3proxy caches incoming mail so it can later be instructed on how to train those messages. See the Review page on the pop3proxy user interface (http://localhost:8880). These messages are kept for 7 days, then discarded if you do nothing else with them. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Wed Feb 26 07:45:49 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:48:15 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete as spam marks as read Message-ID: Feature Requests item #680629, was opened at 2003-02-04 20:30 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin: Delete as spam marks as read Initial Comment: Personally I think it would be nice if the "delete as spam" button marked the mail item as read. Note that I'm not saying that mail that is filtered as spam should be marked as read - it shouldn't (by default). If others agree, this would be a nice addition. Perhaps as an option in the prefs. ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-26 09:45 Message: Logged In: YES user_id=645698 This is an interesting thread. I think it should move to the main list. Pop3proxy has a very similar configuration function, which manages options into bayescustomize.ini (by default). This is another area that we should solve the problem once... 
---------------------------------------------------------------------- Comment By: Piers Haken (piersh) Date: 2003-02-07 05:38 Message: Logged In: YES user_id=10551 i don't care if you do this or not (since spambayes catches all my spam ;-) ), but please don't mark any automatically- filtered spam as 'read' - it would be a pain to check for FPs if you did. thx. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-07 05:05 Message: Logged In: YES user_id=14198 Fair enough :) ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-02-07 03:09 Message: Logged In: YES user_id=113328 I'd like the "Mark as read" option. Most unsures and false negatives which are spam, I can identify by subject, and hence I don't open (and I don't use the preview pane). But it's not crucial - Ctrl-Q does a very quick "Mark as read" anyway... ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 21:17 Message: Logged In: YES user_id=552329 Agreed that it is not necessary. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-04 21:11 Message: Logged In: YES user_id=14198 Yep, I see that makring as read could be useful in that they have been reviewed, but then I would expect Outlook's normal mechanism to still work and mark it read. I have my preview pane mark as read after 2 seconds :) Re the INI file - my problem is that the GUI needs to modify these options, and I don't see how it is trivial to keep the fairly "free-form" INI file format supported by configparser, while only writing out certain elements and not others and also keeping comments etc intact. I'll make a deal - help me with the options problem, and I will give you 5 free option . Let's take it to email... 
---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 20:56 Message: Logged In: YES user_id=552329 My reasoning was that if the user manually selects to delete it as spam, then it is as good as read. Those that are moving via the filter have not been read. Personally I still wade through the filtered spam to check it for false positives, and mark the messages as read as I go (so that the 'unread' display is the number of messages I haven't checked). If I choose delete as spam, I then have to go to the spam folder and mark it as read. In any case, no big deal if you disagree, it was just a thought :) Re: the ini file: looking at the ini, it doesn't seem to have anything that couldn't be in the GUI. Most of it would probably fit in the "advanced" dialog. It would probably be good if the ini was only for 'beta' options - anything that is for public use should be in the GUI. And if a 'beta' option moves to 'public', then it doesn't matter (much) if it breaks, because those using beta options should be upgrading anyway. Moving the existing settings (most of which should be exposed I think) would mean breaking existing code, but maybe just this once? Maybe this discussion should move to the list? (maybe I should have posted this there originally?) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-02-04 20:41 Message: Logged In: YES user_id=14198 I'm not too sure this should happen unless the filter also marks the items as read - otherwise you still end up with many spam in the spam folder unread, and only the ones you move manually marked as read. I'm also kinda stuck about what to do with "options". Currently, options managed by the GUI are in a pickle, while other options are in the .ini file. 
I don't object to having new, outlook specific options in the INI file, but I do object to all our existing code breaking should we decide later to move this option into the GUI. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-02-04 20:31 Message: Logged In: YES user_id=552329 And who else to decide on this, but Mark :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=680629&group_id=61702 From jeremy at alum.mit.edu Wed Feb 26 10:46:31 2003 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed Feb 26 10:48:23 2003 Subject: [Spambayes] message lost in the system? In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D93B@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB880113D93B@UKDCX001.uk.int.atosorigin.com> Message-ID: <1046274390.29711.17.camel@localhost.localdomain> On Wed, 2003-02-26 at 10:28, Moore, Paul wrote: > From: Jeremy Hylton [mailto:jeremy@alum.mit.edu] > > I'm using Evolution 1.2.1 and have it configured to use pop3proxy for > > mail coming from my ISP's pop server. I read a few messages using the > > ISP's web interface and was surprised to discover that they did not show > > up in Evolution when I downloaded the messages. > > Could it be that reading via the web interface marked the messages as read, > and you have your client set not to re-download read mails? No. I don't have the client configured this way. I read a bunch of messages through the web and all but a few of those are now sitting in my inbox. 
Jeremy From noreply at sourceforge.net Wed Feb 26 07:59:42 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:56:18 2003 Subject: [Spambayes] [ spambayes-Bugs-673390 ] pop3proxy storage 2nd file Message-ID: Bugs item #673390, was opened at 2003-01-23 16:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage 2nd file Initial Comment: Other file missing header ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-26 09:59 Message: Logged In: YES user_id=645698 Francois, are you attaching files to these reports? If so, I'm not seeing them... ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 From noreply at sourceforge.net Wed Feb 26 08:00:12 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 10:56:26 2003 Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py Message-ID: Bugs item #693423, was opened at 2003-02-25 23:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) >Assigned to: Tim Stone (timstone4) Summary: email message generates error in pop3proxy.py Initial Comment: Hi all, A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback: Traceback (most recent call last): File "spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 929, in 
onReview judgement = judgement.split(';')[0].strip() File "pop3proxy.py", line 815, in _makeMessageInfo print type(text) AttributeError: 'list' object has no attribute 'replace' He sent me the offending message, and I replicated the problem: msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r") message = mbox.get_message(msg) part = typed_subpart_iterator(message, 'text', 'plain').next() text = part.get_payload() >>> text [] So, instead of text, the payload is a list containing a single email message instance. Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From noreply at sourceforge.net Wed Feb 26 08:36:01 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 11:31:42 2003 Subject: [Spambayes] [ spambayes-Bugs-673390 ] pop3proxy storage 2nd file Message-ID: Bugs item #673390, was opened at 2003-01-23 23:04 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 Category: pop3proxy Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: François Granger (fgranger) Assigned to: Nobody/Anonymous (nobody) Summary: pop3proxy storage 2nd file Initial Comment: Other file missing header ---------------------------------------------------------------------- >Comment By: François Granger (fgranger) Date: 2003-02-26 17:36 Message: Logged In: YES user_id=86948 That is really old stuff (2003-01-23). It should be closed by now, I think. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-26 16:59 Message: Logged In: YES user_id=645698 Francois, are you attaching files to these reports? If so, I'm not seeing them... 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702 From popiel at wolfskeep.com Wed Feb 26 10:03:25 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Wed Feb 26 13:03:29 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Message from Tim Peters References: Message-ID: <20030226180325.C449B2DDDB@cashew.wolfskeep.com> In message: Tim Peters writes: > >> and the heavier the parser, the more easily spammers can do something >> to break it. > >They're not having much success so far. Multipart MIME messages where the subheaders for a given section are completely missing (and there is no blank line after the boundary line) seem to break it quite nicely. See attached message for an example. - Alex -------------- next part -------------- A non-text attachment was scrubbed... Name: busted Type: application/unknown Size: 642 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030226/5b7c3a07/busted.bin From popiel at wolfskeep.com Wed Feb 26 10:06:36 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Wed Feb 26 13:06:39 2003 Subject: [Spambayes] spambayes/msgs.py used? In-Reply-To: Message from "Mark Hammond" References: Message-ID: <20030226180636.94A332DDDB@cashew.wolfskeep.com> In message: "Mark Hammond" writes: >While working out how to tackle my message-database, I bumped into msgs.py. >As far as I can see, it is completely unused. The revision history just >lists checkins by Anthony Baxter as part of larger checkins, and indicates >that the file was created on the reorg-branch branch. > >No .py file in the project appears to import this module - does anyone know >its status and/or its history? It's used in the testing tools such as timcv.py. It probably ought to be moved into the testtools tree.
- Alex From tim at fourstonesExpressions.com Wed Feb 26 12:47:03 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 13:47:08 2003 Subject: [Spambayes] A data point Message-ID: I recently integrated Spambayes into my Lotus Notes client, and the results have been (predictably) spectacular. For the first time in two years, I'm spam-free on my work email address. Just as a data point, I've trained on 144 spam and 36 ham right now (about a day's worth), and the fp/fn rate is negligible already. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From kjellqvist at nordkalak.se Wed Feb 26 21:10:12 2003 From: kjellqvist at nordkalak.se (Göran Källqvist) Date: Wed Feb 26 15:58:35 2003 Subject: [Spambayes] Start pop3proxy at startup Message-ID: <200302262110.12535.kjellqvist@nordkalak.se> Hi! Can you give me any advice on where and how I should start pop3proxy at bootup? Or do I have to start it manually before I use my e-mail client? Greetings Göran Källqvist, Sweden From shahsam at eecs.umich.edu Wed Feb 26 15:34:03 2003 From: shahsam at eecs.umich.edu (Sam Shah) Date: Wed Feb 26 15:58:54 2003 Subject: [Spambayes] bug: long lines trashed w/ mboxtrain.py Message-ID: <20030226203403.GA4998@eecs.umich.edu> Consider a simple mbox like (excuse the long To line): From test@umich.edu Fri Jan 24 17:18:05 2003 From: "Test" To: "Someone Test #A" ,,"Someone Test #B" , "Someone Test #C" , "Someone Test #D" Subject: testing Date: Fri, 24 Jan 2003 17:21:35 -0500 Status: RO Content-Length: 5 Lines: 1 Test We run mboxtrain.py on it, twice: [sligo tmp/spambayes-1.0a2 ]$ ./mboxtrain.py -d test -g msg Training ham (msg): Reading as Unix mbox Trained 1 out of 1 messages [sligo tmp/spambayes-1.0a2 ]$ ./mboxtrain.py -d test -g msg Training ham (msg): Reading as Unix mbox Trained 0 out of 1 messages We get the following. Notice the To line is somehow munged.
From test@umich.edu Fri Jan 24 17:18:05 2003 From: "Test" To: "Someone Test #A" ,,"Someone Test #B" "Someone Test #D" Subject: testing Date: Fri, 24 Jan 2003 17:21:35 -0500 Status: RO Content-Length: 5 Lines: 1 X-Spambayes-Trained: ham Test I tried looking at the code for mboxtrain, but I couldn't find an obvious problem. This occurs with Spambayes v1.0a2 running on Python 2.2.2. Thanks, Sam From papaDoc at videotron.ca Wed Feb 26 16:18:37 2003 From: papaDoc at videotron.ca (papaDoc) Date: Wed Feb 26 16:19:39 2003 Subject: [Spambayes] Start pop3proxy at startup In-Reply-To: <200302262110.12535.kjellqvist@nordkalak.se> References: <200302262110.12535.kjellqvist@nordkalak.se> Message-ID: <3E5D2F2D.7030302@videotron.ca> Hi, Which OS are you using? >Hi! >Can you give me any advice on where and how I should start pop3proxy at >bootup? Or do I have to start it manually before I use my e-mail client? > > Remi From T.A.Meyer at massey.ac.nz Thu Feb 27 10:48:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 16:49:12 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete asspam marks as read Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD71@its-xchg4.massey.ac.nz> [TimS] > This is an interesting thread. I think it should move to the > main list. > Pop3proxy has a very similar configuration function, which manages > options into bayescustomize.ini (by default). This is > another area that we should solve the problem once... Are you talking about the .ini file comments, and not the 'mark as read' ones? If so, good, 'cause I was going to bring this up shortly anyway ;). Mark & I had some discussion about this off the list, and I implemented some things (which have been working for me for the last couple of weeks at least). The way it works (on my system) now is: * All spambayes options are stored in the ini.
Some of these are accessible from the Outlook GUI (in the Advanced Dialog), and Outlook makes sure that these are saved if modified. * All Outlook options are stored in a pickle. All of these are accessible from the Outlook GUI, and Outlook also makes sure that these are saved if modified. A couple of points: * It certainly would be nice if all the non-core stuff (like [pop3proxy]) was not in the main options, so that those systems that didn't use it didn't have to load it. (For example, pop3proxy and ui could be stored in a pickle/separate ini file). However, this would break existing systems, so is bad, plus those that use those systems might not like it as much. Plus those without any GUI would still need the options in the ini file. So probably -1 for this, as abstractly nice as it would be. * The current function in pop3proxy that updates the ini file is nice and small, but wipes any comments in the file, which it would be nice to keep. I have a replacement for this that *my* Outlook plugin uses. I have been thinking about abstracting this out, and making it a function of (probably) the Options class. Pop3proxy could then use this function, so that there is only one set of code, and comments can be saved. Thoughts? =Tony Meyer From tim at fourstonesExpressions.com Wed Feb 26 15:42:02 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 16:54:35 2003 Subject: [Spambayes] bug: long lines trashed w/ mboxtrain.py In-Reply-To: <20030226203403.GA4998@eecs.umich.edu> Message-ID: Sam, this is clearly a bug. Please file a bug report, and attach your mail to it. We'll be able to track progress on it much better that way. Thanks.
- TimS 2/26/2003 2:34:03 PM, Sam Shah wrote: >Consider a simple mbox like (excuse long to line): > > From test@umich.edu Fri Jan 24 17:18:05 2003 > From: "Test" > To: "Someone Test #A" , ,"Someone Test #B" , "Someone Test #C" , "Someone Test #D" > Subject: testing > Date: Fri, 24 Jan 2003 17:21:35 -0500 > Status: RO > Content-Length: 5 > Lines: 1 > > Test > >We run mboxtrain.py on it, twice: > > [sligo tmp/spambayes-1.0a2 ]$ ./mboxtrain.py -d test -g msg > Training ham (msg): > Reading as Unix mbox > Trained 1 out of 1 messages > [sligo tmp/spambayes-1.0a2 ]$ ./mboxtrain.py -d test -g msg > Training ham (msg): > Reading as Unix mbox > Trained 0 out of 1 messages > >We get the following. Notice the To line is somehow munged. > > From test@umich.edu Fri Jan 24 17:18:05 2003 > From: "Test" > To: "Someone Test #A" > ,,"Someone Test #B" > "Someone Test #D" > Subject: testing > Date: Fri, 24 Jan 2003 17:21:35 -0500 > Status: RO > Content-Length: 5 > Lines: 1 > X-Spambayes-Trained: ham > > Test > >I tried looking at the code for mboxtrain, but I couldn't find an >obvious problem. This occurs with Spambayes v1.0a2 running on >Python 2.2.2. 
> >Thanks, >Sam > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From noreply at sourceforge.net Wed Feb 26 13:58:49 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 17:01:55 2003 Subject: [Spambayes] [ spambayes-Bugs-693935 ] mboxtrain.py trashes long lines Message-ID: Bugs item #693935, was opened at 2003-02-26 16:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sam Shah (sshah) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py trashes long lines Initial Comment: mboxtrain.py (v1.0a2) trashes long lines if it is run multiple times on a mbox. Instructions for reproducing the bug, as well as a sample message that gets corrupted, are in the attached file. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 From tim at fourstonesExpressions.com Wed Feb 26 16:04:44 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 17:04:49 2003 Subject: [Spambayes] [ spambayes-Feature Requests-680629 ] Outlook plugin: Delete asspam marks as read In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD71@its-xchg4.massey.ac.nz> Message-ID: <53EZTC93YIG1YNH31JFQOCATR4A931.3e5d39fc@myst> 2/26/2003 3:48:33 PM, "Meyer, Tony" wrote: >[TimS] >> This is an interesting thread. I think it should move to the >> main list. >> Pop3proxy has a very similar configuration function, which manages >> options into bayescustomize.ini (by default). This is >> another area that we should solve the problem once... 
> >Are you talking about the .ini file comments, and not the 'mark as read' ones? Yes. > >A couple of points: >* It certainly would be nice if all the non-core stuff (like [pop3proxy]) was not in the main options, so that those systems that didn't use it didn't have to load it. (For example, pop3proxy and ui could be stored in a pickle/separate ini file). I see no use for a pickle here. Why keep an ini file and a pickle in sync? Just use the ini file as the persistence. ConfigParser does this now anyway. > However, this would break existing systems, Breaking existing systems isn't bad if we fix them... > so is bad, plus those that use those systems might not like it as much. Plus those without any GUI would still need the options in the ini file. So probably -1 for this, as abstractly nice as it would be. >* The current function in pop3proxy that updates the ini file is nice and small, but wipes any comments in the file, which it would be nice to keep. I have a replacement for this that *my* Outlook plugin uses. I have been thinking about abstracting this out, and making it a function of (probably) the Options class. Pop3proxy could then use this function, so that there is only one set of code, and comments can be saved. This sounds good to me. I wonder if the Options file is really what we need to carry forward, though. A simple ini file is really more useful and maintainable (imo), and the options module could easily be modified to support the same syntax that it supports now, but read from an ini file. The syntax is the only value that it adds now anyway, and that's what should be preserved. Dumb beats smart in this instance, for sure. Let's keep it simple. 
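For reference, using the ini file itself as the persistence layer might look roughly like the sketch below. It uses the standard-library `configparser` module (the modern Python 3 spelling of ConfigParser); the helper names `round_trip` and `read_cutoffs` are hypothetical, and the sample values are illustrative only. The sketch also shows the catch raised in this thread: when the parser rewrites the file, comments are not carried over.

```python
import configparser
import io

# Illustrative sample only; the values are assumptions for the demo.
SAMPLE = """\
# cutoffs tuned by hand
[Categorization]
ham_cutoff = 0.10
spam_cutoff = 0.90
"""


def round_trip(text, section, option, value):
    """Parse `text`, change one option and serialise it again."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    parser.set(section, option, value)   # values are stored as strings
    buf = io.StringIO()
    parser.write(buf)                    # comments are lost here
    return buf.getvalue()


def read_cutoffs(text):
    """Read the two cutoffs back as floats via the typed getter."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return (parser.getfloat("Categorization", "ham_cutoff"),
            parser.getfloat("Categorization", "spam_cutoff"))


result = round_trip(SAMPLE, "Categorization", "spam_cutoff", "0.80")
cutoffs = read_cutoffs(result)
```

The typed getters (`getint`, `getfloat`, `getboolean`) cover the common conversions, so the string-only storage is mostly a matter of where the conversion happens.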
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From T.A.Meyer at massey.ac.nz Thu Feb 27 11:05:53 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 17:06:29 2003 Subject: [Spambayes] Outlook Plugin Installer Comments Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD72@its-xchg4.massey.ac.nz> I finally got round to unregistering my CVS Outlook plugin and using the installer. Comments: * I get a *lot* of "warning: use func(*args, **kwargs) instead of apply(func, args, kwargs)" comments in the trace. Only a warning, so everything still works, but it fills the trace up. * I can now see the plugin in Outlook's Options->Other->Advanced->COM Addins preferences. I think this is still listed as a bug/feature request, so it can be removed if so. * When getting a folder list, I get the following trace: --- Traceback (most recent call last): File "E:\src\spambayes\Outlook2000\dialogs\FolderSelector.py", line 310, in OnInitDialog File "E:\src\spambayes\Outlook2000\dialogs\FolderSelector.py", line 347, in _UpdateStatus File "E:\src\Installer\iu.py", line 296, in importHook ImportError: No module named timer --- Everything still works though. I get a lot of these, but all for the same module. * I got one instance of "warning: raising a string exception is deprecated". I'm not sure when this arrived - during training I think. * I'm now using db instead of a pickle, and it works fine (and *much* quicker). I presume this is because the installed version was built with Python 2.3. What does this mean for the bug that currently causes problems for those of us with 2.2? (How soon is 2.3 due out?). 
All in all, very nice, well done :) =Tony Meyer From mhammond at skippinet.com.au Thu Feb 27 09:11:55 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 17:12:30 2003 Subject: [Spambayes] Adding a message database In-Reply-To: <15964.51217.18675.935909@montanaro.dyndns.org> Message-ID: > Mark> I would like spambayes to assist in managing a database of > Mark> message_ids, mapped to how they were previously trained. > ... much stuff elided ... > > I understand what you want to do, but not why. Can you provide some > motivation? I simply want a memory of how a specific message was trained, for the following reasons: * Accidental attempt to train the same message, in the same way, multiple times. * Accidental attempt to train the same message as ham and spam. It really is as simple as that. Note that we *could* store this information with the message itself - but this would mean that a simple train operation *modifies* messages. This allows us to keep training as readonly wrt the messages, while maintaining integrity. Mark. From tim at fourstonesExpressions.com Wed Feb 26 16:14:17 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 17:14:21 2003 Subject: [Spambayes] Adding a message database In-Reply-To: Message-ID: 2/26/2003 4:11:55 PM, "Mark Hammond" wrote: >It really is as simple as that. Note that we *could* store this information >with the message itself - but this would mean that a simple train operation >*modifies* messages. This allows us to keep training as readonly wrt the >messages, while maintaining integrity. In reality, with some systems (Notes) you *can't* store this information with the message. 
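The message database discussed here can be pictured as a mapping from message id to the label it was last trained with, kept outside the messages themselves. The following is a minimal sketch under stated assumptions, not the design that was adopted: the class name `TrainingLog` and the return values are hypothetical, and a plain dict stands in for whatever persistent store (pickle, bsddb, ...) would really be used.

```python
HAM, SPAM = "ham", "spam"


class TrainingLog:
    """Remembers how each message id was trained (hypothetical sketch)."""

    def __init__(self, store=None):
        # Any dict-like object works; a real system would persist it.
        self._store = {} if store is None else store

    def record(self, message_id, label):
        """Note that `message_id` is being trained as `label`.

        Returns what the caller should do with the classifier:
          "train"   - first time this message is seen
          "skip"    - already trained the same way, do nothing
          "retrain" - previously trained the other way; untrain the
                      old label before training the new one
        """
        previous = self._store.get(message_id)
        if previous == label:
            return "skip"
        self._store[message_id] = label
        if previous is None:
            return "train"
        return "retrain"


log = TrainingLog()
first = log.record("<1046274390.29711@example>", SPAM)   # "train"
again = log.record("<1046274390.29711@example>", SPAM)   # "skip"
flip = log.record("<1046274390.29711@example>", HAM)     # "retrain"
```

Keeping this record separate means training never has to modify a message, which also suits stores where the message cannot be annotated at all.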
- TimS c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From mhammond at skippinet.com.au Thu Feb 27 09:19:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 17:20:04 2003 Subject: [Spambayes] RE: Outlook Plugin Installer Comments In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD72@its-xchg4.massey.ac.nz> Message-ID: > * I get a *lot* of "warning: use func(*args, **kwargs) instead of > apply(func, args, kwargs)" comments in the trace. Only a > warning, so everything still works, but it fills the trace up. Me too. I believe that will be an apply() inside the McMillan Installer, combined with a "feature" I added to the 2.3 alpha causing these errors to get spewed when they shouldn't (thanks Just!) > * I can now see the plugin in Outlook's > Options->Other->Advanced->COM Addins preferences. I think this > is still listed as a bug/feature request, so it can be removed if so. Cool - I didn't notice that :) I'll check out the others too. > * I'm now using db instead of a pickle, and it works fine (and > *much* quicker). I presume this is because the installed version > was built with Python 2.3. Correct. > What does this mean for the bug that > currently causes problems for those of us with 2.2? What bug is that? > (How soon is 2.3 due out?). An alpha is out today, and basically what I built the installer with. I intend doing a new win32all, including a "proper" 2.3 version, very soon. Mark. From mhammond at skippinet.com.au Thu Feb 27 09:19:08 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 17:20:15 2003 Subject: [Spambayes] porting database In-Reply-To: Message-ID: > Is it possible to port the gathered data trained on outlook to be used in > procmail? > > the case: > I'm using outlook as a MUA for most of my mails, but have a shell account > where I do a lot of initial filtering (virus-checks, spam-checks, etc). As I understand things, both pickles and bsddbs are portable across platforms. 
Have you tried simply copying the database from Outlook to wherever the other system is expecting it? Mark. From T.A.Meyer at massey.ac.nz Thu Feb 27 11:21:37 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 17:22:15 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD73@its-xchg4.massey.ac.nz> [Tony] > >* It certainly would be nice if all the non-core stuff (like > [pop3proxy]) was > not in the main options, so that those systems that didn't > use it didn't have > to load it. [TimS] > I see no use for a pickle here. Why keep an ini file and a > pickle in sync? > Just use the ini file as the persistence. ConfigParser does > this now anyway. I meant that it would be nice to separate the core spambayes options from the implementation specific (hammie, pop3proxy, ui, Outlook) ones. These could be kept in a separate ini file (or pickle, whichever you like). But still -1 for the reasons outlined. > Breaking existing systems isn't bad if we fix them... :) If it comes to it, I must remember you are the one to commit the changes so that people scream at you when the bit we missed breaks ;) [New update ini function] > This sounds good to me. I wonder if the Options file is > really what we need to carry forward, though. > A simple ini file is really more useful and > maintainable (imo), and the options module could easily be > modified to support the same syntax that it supports now, > but read from an ini file. The syntax > is the only value that it adds now anyway, and that's what should be > preserved. Well, the options module is pretty much a module with an embedded ini file, isn't it? I presume that back when things started there was a reason to have all the defaults in a module rather than provide a default ini (but I can't be bothered looking through the list history). Why do you think that an ini would be more maintainable? 
=Tony Meyer From tim at fourstonesExpressions.com Wed Feb 26 16:29:16 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 17:29:20 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD73@its-xchg4.massey.ac.nz> Message-ID: <54JIHCH652XZULH1ZYW944WRPQNRQ4Z.3e5d3fbc@myst> 2/26/2003 4:21:37 PM, "Meyer, Tony" wrote: >> Breaking existing systems isn't bad if we fix them... >:) If it comes to it, I must remember you are the one to commit the changes so that people scream at you when the bit we missed breaks ;) And they'll scream... > >[New update ini function] >> This sounds good to me. I wonder if the Options file is >> really what we need to carry forward, though. >> A simple ini file is really more useful and >> maintainable (imo), and the options module could easily be >> modified to support the same syntax that it supports now, >> but read from an ini file. The syntax >> is the only value that it adds now anyway, and that's what should be >> preserved. > >Well, the options module is pretty much a module with an embedded ini file, isn't it? I presume that back when things started there was a reason to have all the defaults in a module rather than provide a default ini (but I can't be bothered looking through the list history). Why do you think that an ini would be more maintainable? The ini file is a standard format, and we can probably even find a client that maintains the file for us. It's recognizable to most everyone, and doesn't require that a particular parameter be specified, then 'cracked' in a separate place. The downside is that the ConfigParser doesn't recognize anything but string values. But the options file is not as easily maintainable using a GUI client, because it's parsed at import time. 
So you gotta go through some gyrations to make the GUI show the proper values, and write the options file (the option module won't write itself, and probably can't). Then if you manage to make a mistake in the options file, stuff doesn't run anymore, cause the import will raise an exception. So... the option file is broken, but you have to use the import file to bring up the GUI to fix the problem, etc. etc.... These are the problems I encountered when I wrote the option configurator for pop3proxy. It was ugly. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From skip at pobox.com Wed Feb 26 16:29:28 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 26 17:29:37 2003 Subject: [Spambayes] porting database In-Reply-To: References: Message-ID: <15965.16328.761223.307753@montanaro.dyndns.org> Mark> As I understand things, both pickles and bsddbs are portable Mark> across platforms. Have you tried simply copying the database from Mark> Outlook to wherever the other system is expecting it? Pickles, yes, bsddbs, not necessarily: % file hammie.db hammie.db: Berkeley DB (Hash, version 7, big-endian) I don't know if the Berkeley DB code can automatically handle differences in endianness, but the "big-endian" bit gives me pause. Also, note that the Berkeley DB library versions need to be close enough on the two machines that they read and write files of the same version number. Skip From tim at fourstonesExpressions.com Wed Feb 26 16:31:20 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 17:31:23 2003 Subject: [Spambayes] porting database In-Reply-To: <15965.16328.761223.307753@montanaro.dyndns.org> Message-ID: 2/26/2003 4:29:28 PM, Skip Montanaro wrote: > > Mark> As I understand things, both pickles and bsddbs are portable > Mark> across platforms. Have you tried simply copying the database from > Mark> Outlook to wherever the other system is expecting it?
> >Pickles, yes, bsddbs, not necessarily: > > % file hammie.db > hammie.db: Berkeley DB (Hash, version 7, big-endian) > >I don't know if the Berkeley DB code can automatically handle differences in >endianness, but the "big-endian" bit gives me pause. Also, note that the >Berkeley DB library versions need to be close enough on the two machines >that they read and write files of the same version number. dbExpImp.py handles all these problems. - TimS > >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From tshumway at jdiworks.net Wed Feb 26 14:38:28 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Wed Feb 26 17:34:50 2003 Subject: [Spambayes] Adding a message database In-Reply-To: References: Message-ID: <1046299108.2374.44.camel@juniper.localnet> On Wed, 2003-02-26 at 14:11, Mark Hammond wrote: > It really is as simple as that. Note that we *could* store this information > with the message itself - but this would mean that a simple train operation > *modifies* messages. This allows us to keep training as readonly wrt the > messages, while maintaining integrity. Another reason to not store it in the message is that this would be "inband signaling". Anything we store in the message itself is going to be a target of spoofing by spammers. (Maybe this spoofing has already been addressed. I am just barely starting to "lurk actively".) -- Terrel From skip at pobox.com Wed Feb 26 16:37:16 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 26 17:37:31 2003 Subject: [Spambayes] porting database In-Reply-To: References: <15965.16328.761223.307753@montanaro.dyndns.org> Message-ID: <15965.16796.566283.867633@montanaro.dyndns.org> >> Pickles, yes, bsddbs, not necessarily: ... Tim> dbExpImp.py handles all these problems. 
- TimS As does pickle2db.py and db2pickle.py in the Tools/scripts directory of the Python distribution (meant more for transporting various db files between machines, not specific to spambayes, and not meant to convert between spambayes pickles and spambayes db files). Skip From mhammond at skippinet.com.au Thu Feb 27 09:36:37 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 17:37:41 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD73@its-xchg4.massey.ac.nz> Message-ID: I'm a little confused :) First, a little history: when I picked up spambayes, it indirectly used the Python ConfigParser module for options. This module handles "ini files on steriods", and is quite nice - except that the module does not support *writing* to the file. For the Outlook plugin, I needed to persist a number of new options that were being configured in the GUI - such as the list of folders to watch and filter to. However, I could see no reasonable way to *write* a config file, while still maintaining the INI-like file as the user intends - ie, with comments, etc. So I simply punted, and kept the INI-like file for user-maintained options, and a pickle for the GUI maintained ones. Tony then started discussing further options, and I pointed out that I hated the 2-file scheme I hacked together. I didn't want the code to know *where* an option was stored, as that would make future changes to the GUI very painful (as the location of where the option is stored would need to change depending on it is was a "hidden" option or not) So: > [TimS] > > I see no use for a pickle here. Why keep an ini file and a > > pickle in sync? > > Just use the ini file as the persistence. ConfigParser does *writing* the INI file is an issue, especially if you want to keep user comments, and I believe you do. > [New update ini function] > > This sounds good to me. 
I wonder if the Options file is > > really what we need to carry forward, though. > > A simple ini file is really more useful and > > maintainable (imo), and the options module could easily be > > modified to support the same syntax that it supports now, > > but read from an ini file. The syntax > > is the only value that it adds now anyway, and that's what should be > > preserved. I'm a little confused by this. As far as I can tell, we already *do* have an INI file, but for reasons of OS politics , it is called an "options" file. But from my POV, if we can write entries to an options file, I would need nothing else. I note that Tony has done some work in this regard, but I just haven't got to it yet. Mark. From T.A.Meyer at massey.ac.nz Thu Feb 27 11:37:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 17:38:24 2003 Subject: [Spambayes] RE: Outlook Plugin Installer Comments Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD74@its-xchg4.massey.ac.nz> [Tony] > > What does this mean for the bug that > > currently causes problems for those of us with 2.2? [Mark] > What bug is that? Ack. I was sure that one was submitted, but I can't see it there (sorry). It was discussed in the list (by me, mid Feb under "Outlook Plugin crashing Outlook", and a bit later by Gabriel Mino under "Outlook plugin"). Basically, I have bsddb3 (4.1.3), and so manager.py decides to use it. This crashes Outlook (not the plugin) in a major way. Changing back to the pickle fixes things. [Tony] > > (How soon is 2.3 due out?). [Mark] > An alpha is out today, and basically what I built the > installer with. I > intend doing a new win32all, including a "proper" 2.3 > version, very soon. Cool - I guess I'll update to that so that I can use the db rather than pickle then. I don't close Outlook down very often, but the speediness is nice. 
=Tony Meyer From noreply at sourceforge.net Wed Feb 26 14:34:55 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 17:40:05 2003 Subject: [Spambayes] [ spambayes-Bugs-693935 ] mboxtrain.py trashes long lines Message-ID: Bugs item #693935, was opened at 2003-02-26 15:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sam Shah (sshah) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py trashes long lines Initial Comment: mboxtrain.py (v1.0a2) trashes long lines if it is run multiple times on a mbox. Instructions for reproducing the bug, as well as a sample message that gets corrupted, are in the attached file. ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2003-02-26 16:34 Message: Logged In: YES user_id=44345 The bug (if it's not a feature) is actually in the email package, not mboxtrain. See the attached interpreter transcript. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 From noreply at sourceforge.net Wed Feb 26 14:42:52 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 17:40:14 2003 Subject: [Spambayes] [ spambayes-Bugs-693935 ] mboxtrain.py trashes long lines Message-ID: Bugs item #693935, was opened at 2003-02-26 16:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sam Shah (sshah) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py trashes long lines Initial Comment: mboxtrain.py (v1.0a2) trashes long lines if it is run multiple times on a mbox. 
Instructions for reproducing the bug, as well as a sample message that gets corrupted, are in the attached file. ---------------------------------------------------------------------- >Comment By: Sam Shah (sshah) Date: 2003-02-26 17:42 Message: Logged In: YES user_id=224899 That transcript is okay; it just rewrites the header. All recipients are there. If you notice in the report I sent in, recipient C is gone from the message, which is obviously a bug. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-02-26 17:34 Message: Logged In: YES user_id=44345 The bug (if it's not a feature) is actually in the email package, not mboxtrain. See the attached interpreter transcript. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 From noreply at sourceforge.net Wed Feb 26 14:49:14 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Feb 26 17:40:23 2003 Subject: [Spambayes] [ spambayes-Bugs-693935 ] mboxtrain.py trashes long lines Message-ID: Bugs item #693935, was opened at 2003-02-26 16:58 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sam Shah (sshah) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py trashes long lines Initial Comment: mboxtrain.py (v1.0a2) trashes long lines if it is run multiple times on a mbox. Instructions for reproducing the bug, as well as a sample message that gets corrupted, are in the attached file. ---------------------------------------------------------------------- >Comment By: Sam Shah (sshah) Date: 2003-02-26 17:49 Message: Logged In: YES user_id=224899 Okay, it's clearly a bug in the email package. 
If you look at the attached transcript, you see that recipient C is missing after the second parsing. ---------------------------------------------------------------------- Comment By: Sam Shah (sshah) Date: 2003-02-26 17:42 Message: Logged In: YES user_id=224899 That transcript is okay; it just rewrites the header. All recipients are there. If you notice in the report I sent in, recipient C is gone from the message, which is obviously a bug. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2003-02-26 17:34 Message: Logged In: YES user_id=44345 The bug (if it's not a feature) is actually in the email package, not mboxtrain. See the attached interpreter transcript. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693935&group_id=61702 From skip at pobox.com Wed Feb 26 16:44:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 26 17:44:37 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F1318CD73@its-xchg4.massey.ac.nz> Message-ID: <15965.17227.342456.940679@montanaro.dyndns.org> Mark> First, a little history: when I picked up spambayes, it indirectly Mark> used the Python ConfigParser module for options. This module Mark> handles "ini files on steriods", and is quite nice - except that Mark> the module does not support *writing* to the file. That's why it's called "Config *Parser*". It parses files. You want the ConfigWriter module, which I'm afraid doesn't exist. Even if it did, you'd have the problem that the current options file flattens out the option hierarchy a bit, so that it's not Options.options.Tokenizer.basic_header_tokenize but Options.options.basic_header_tokenize To feed the current options to some sort of config file writer you'd probably have to correct that. 
Skip From T.A.Meyer at massey.ac.nz Thu Feb 27 11:44:39 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 17:45:28 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D542@its-xchg4.massey.ac.nz> > I'm a little confused :) Aren't we all ;) Thanks for the history though, it cleared up a couple of things (things I probably should have asked last time we discussed this). > *writing* the INI file is an issue, especially if you want to > keep user comments, and I believe you do. Not any more :) (well, there is one bit of my writing function that could be nicer, but I'll get to it). > I'm a little confused by this. As far as I can tell, we > already *do* have an INI file, but for reasons of OS politics > , it is called an "options" file. That's what I thought, and was trying to say :) > But from my POV, if we can write entries to an options file, > I would need nothing else. So, for Outlook, would you move the options that are stored in the pickle to the ini under [Outlook] or something like that? I didn't do this because some options, particularly folder ids, are not at all suitable for hand-modifying, and would look odd in the ini. > I note that Tony has done some > work in this regard, but I just haven't got to it yet. :) No worries. =Tony Meyer From tshumway at jdiworks.net Wed Feb 26 14:50:17 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Wed Feb 26 17:46:38 2003 Subject: [Spambayes] Adding a message database In-Reply-To: References: Message-ID: <1046299817.2381.47.camel@juniper.localnet> On Wed, 2003-02-26 at 14:11, Mark Hammond wrote: > It really is as simple as that. Note that we *could* store this information > with the message itself - but this would mean that a simple train operation > *modifies* messages. This allows us to keep training as readonly wrt the > messages, while maintaining integrity. 
Another reason to not store it in the message is that this would be "inband signaling". Anything we store in the message itself is going to be a target of spoofing by spammers. (Maybe this spoofing has already been addressed. I am just barely starting to "lurk actively".) -- Terrel From T.A.Meyer at massey.ac.nz Thu Feb 27 11:46:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Feb 26 17:46:49 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D543@its-xchg4.massey.ac.nz> > That's why it's called "Config *Parser*". It parses files. > You want the ConfigWriter module, which I'm afraid doesn't exist. LOL. I guess this is where I should abstract my Outlook ini writer to. I'll get to this at some point today or tomorrow. =Tony Meyer. From tshumway at jdiworks.net Wed Feb 26 14:54:28 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Wed Feb 26 17:50:56 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: References: Message-ID: <1046300068.2381.52.camel@juniper.localnet> On Wed, 2003-02-26 at 14:36, Mark Hammond wrote: > I'm a little confused :) > > First, a little history: when I picked up spambayes, it indirectly used the > Python ConfigParser module for options. This module handles "ini files on > steriods", and is quite nice - except that the module does not support > *writing* to the file. > > For the Outlook plugin, I needed to persist a number of new options that > were being configured in the GUI - such as the list of folders to watch and > filter to. However, I could see no reasonable way to *write* a config file, > while still maintaining the INI-like file as the user intends - ie, with > comments, etc. What happened to WritePrivateProfileString() ? 
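One way around the comment-preservation problem quoted above is to edit the options file line by line instead of re-serializing it. What follows is a minimal sketch, assuming a plain layout of `[section]` headers and `key = value` or `key: value` lines with no continuation lines. `set_option` is a hypothetical helper, not part of spambayes, ConfigParser, or the Win32 API.

```python
import re

# "[section]" header lines.
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
# "key = value" or "key: value"; comment and indented lines do not match.
OPTION_RE = re.compile(r"^(?P<key>[^\s:=#;][^:=]*?)\s*[:=]\s*(?P<value>.*)$")


def set_option(text, section, key, value):
    """Return `text` with `key` set to `value` inside `[section]`.

    Comments, blank lines and every other option are copied through
    untouched. A missing key or section is appended.
    """
    out = []
    current = None   # name of the section we are currently inside
    done = False     # True once the option has been written

    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            # Leaving the target section without having seen the key.
            if current == section and not done:
                out.append("%s = %s" % (key, value))
                done = True
            current = header.group("name")
            out.append(line)
            continue
        option = OPTION_RE.match(line)
        if (not done and current == section and option
                and option.group("key").strip() == key):
            out.append("%s = %s" % (key, value))
            done = True
            continue
        out.append(line)

    if not done:
        if current != section:
            out.append("[%s]" % section)
        out.append("%s = %s" % (key, value))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = (
        "[Categorization]\n"
        "# thresholds tuned by hand\n"
        "ham_cutoff = 0.10\n"
        "spam_cutoff = 0.90\n"
    )
    print(set_option(original, "Categorization", "spam_cutoff", "0.95"))
```

The trade-off is that this approach only understands the simple subset of the file format; anything fancier (continuation lines, interpolation) would still need a real parser.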
From mhammond at skippinet.com.au Thu Feb 27 10:01:22 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Feb 26 18:02:36 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1046300068.2381.52.camel@juniper.localnet> Message-ID: > What happened to WritePrivateProfileString() ? 1) Windows only 2) A Config file is really an INI file on steriods. From what I can see, data valid in an INI file is only a subset of what OptionParser can handle. Mark. From tim.one at comcast.net Wed Feb 26 18:07:11 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Feb 26 18:07:46 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CD73@its-xchg4.massey.ac.nz> Message-ID: [Meyer, Tony] > ... > Well, the options module is pretty much a module with an embedded > ini file, isn't it? Right. > I presume that back when things started there was a reason to have all > the defaults in a module rather than provide a default ini (but I can't be > bothered looking through the list history). Nope, there was a distinct default .ini file at the start. Skip insisted on running the code from bizarre directories, though, and rather than add 12 gimmicks to specify an "ini file search path", I folded the file into the module instead. From barry at python.org Wed Feb 26 18:07:24 2003 From: barry at python.org (Barry A. Warsaw) Date: Wed Feb 26 18:07:57 2003 Subject: [Spambayes] bug: long lines trashed w/ mboxtrain.py References: <20030226203403.GA4998@eecs.umich.edu> Message-ID: <15965.18604.314130.826203@gargle.gargle.HOWL> mimelib'ers, for reference see http://mail.python.org/pipermail/spambayes/2003-February/003602.html >>>>> "SS" == Sam Shah writes: SS> I tried looking at the code for mboxtrain, but I couldn't find SS> an obvious problem. This occurs with Spambayes v1.0a2 running SS> on Python 2.2.2. 
I think this is a deficiency in the email package, which is why I'm CC'ing the mimelib-devel list. This may be a contributing factor to the persistent "extra whitespace in subject headers" problem Mailman is seeing, but I'm not yet sure. Here's the dealie: RFC 2822 describes ascii headers, their max length, how long lines should be split, etc. We're going to ignore RFC 2047 encoded non-ascii headers since I think email handles these basically correctly. §2.1.1 of RFC 2822 says that lines SHOULD be no more than 78 characters, and MUST be no more than 998 characters, excluding the CRLF. When lines are longer than this, §2.2.3 says that lines should be split at the highest level "syntactic break", which isn't really specified, but is different for different headers. E.g. for Received headers or Content-Type, you can imagine splitting first on semi-colons, while for Subject headers you'd split at any folding whitespace. For recipient headers (To, CC), you'd want to split at the commas first, then at fws. email.Header.Header has a method called _ascii_split() which embodies the splitting policy. It's currently hard-coded to first try to split on semis, then to split on fws. It's got no idea it should even try to split on commas, as ought to be the case with Sam's example. In theory, I think the solution should be to be able to pass something like an AsciiSplittingPolicy instance to the Header class so that the application can choose what "highest level syntactic breaks" mean. Then we might have some common instances such as SplitOnCommasThenFoldingWhitespace, etc. (with more reasonable names, perhaps). We might go further and provide a default mapping of headers to splitters for when no splitter is specified. It's probably going to be painful to implement this though. ;) When this message gets archived, I'll submit a bug report on the mimelib project and work on a fix as time allows.
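The splitting policy described above can be sketched as a small function that tries the highest-level separator first and falls back to plain whitespace only for pieces that are still too long. This is an illustration only: `fold_header`, `_split` and the `splitchars` argument are hypothetical names, not the email package's `_ascii_split()`, and runs of whitespace in the value are normalized to single spaces for simplicity.

```python
def _split(text, splitchars, limit):
    """Yield pieces of `text`, splitting on splitchars[0] first.

    A piece is split further (on the next separator) only when it is
    still longer than `limit`.
    """
    if len(text) <= limit or not splitchars:
        yield text
        return
    sep, rest = splitchars[0], splitchars[1:]
    if sep == " ":
        parts = text.split()
    else:
        raw = text.split(sep)
        # Keep the separator attached to the piece it terminates.
        parts = [p.strip() + sep for p in raw[:-1]] + [raw[-1].strip()]
    for part in parts:
        if part:
            for piece in _split(part, rest, limit):
                yield piece


def fold_header(name, value, maxlen=78, splitchars=", "):
    """Fold `name: value` into lines of at most `maxlen` characters.

    `splitchars` lists separators from highest to lowest level, e.g.
    ", " for To/CC or "; " for Received and Content-Type. Continuation
    lines start with one space, so unfolding restores the value.
    """
    lines = []
    current = name + ":"
    for piece in _split(value.strip(), splitchars, maxlen - 1):
        if len(current) + 1 + len(piece) <= maxlen:
            current = current + " " + piece
        else:
            lines.append(current)
            current = " " + piece
    lines.append(current)
    return "\n".join(lines)


if __name__ == "__main__":
    recipients = ", ".join("user%02d@example.com" % i for i in range(12))
    print(fold_header("To", recipients))
```

With a long recipient list, every line break lands after a comma, so no address is dropped or torn in half. A piece with no usable separator is emitted over-length rather than broken, which stays within the 998-character MUST for any realistic address.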
-Barry From skip at pobox.com Wed Feb 26 17:21:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Feb 26 18:21:54 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D543@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318D543@its-xchg4.massey.ac.nz> Message-ID: <15965.19463.562722.698074@montanaro.dyndns.org> Tony> I guess this is where I should abstract my Outlook ini writer to. Tony> I'll get to this at some point today or tomorrow. Please consider submitting it for consideration to the standard Python library. Skip From tshumway at jdiworks.net Wed Feb 26 15:27:11 2003 From: tshumway at jdiworks.net (Terrel Shumway) Date: Wed Feb 26 18:23:32 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: References: Message-ID: <1046302031.2468.75.camel@juniper.localnet> On Wed, 2003-02-26 at 15:01, Mark Hammond wrote: > > What happened to WritePrivateProfileString() ? > > 1) Windows only Outlook plugin is also windows-only. (Unless I missed something. I understand M$ is porting Exchange Server to BSD (MacOS X)) > 2) A Config file is really an INI file on steriods. From what I can see, > data valid in an INI file is only a subset of what OptionParser can handle. extra features: 1) use rfc822 style "key: value" lines with " continuation lines" 2) automagic interpolation of %(value)s Although the text in the docs "name=value" is also accepted seems to even disparage the "INI-compatibility", it seems that WritePrivateProfileString is not going to add anything that ConfigParser cannot handle. If we do use the rfc822 syntax in the file, there may be problems, but the name of the file "bayescustomize.ini" is a clear signal about the format. Anyone who bothers with the detail that spambayes currently happens to use ConfigParser to read it, and then actually adds rfc822 syntax is really asking for trouble anyway. 
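The two extra features listed above can be seen in a few lines. This is a sketch assuming Python 3's `configparser` rather than the older `ConfigParser` module under discussion; the `[Storage]` section and its option names are made up for illustration, while the `[Categorization]` cutoffs mirror the bayescustomize.ini options quoted elsewhere on the list.

```python
import configparser

SAMPLE = """\
[Categorization]
ham_cutoff = 0.10
spam_cutoff: 0.90

[Storage]
base_dir: /home/user/.spambayes
db_file: %(base_dir)s/hammie.db
description: first line
    continued on an indented line
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Both "name = value" and rfc822-style "name: value" are accepted.
ham = parser.getfloat("Categorization", "ham_cutoff")
spam = parser.getfloat("Categorization", "spam_cutoff")

# %(base_dir)s is interpolated from the same section when the value is read.
db_file = parser.get("Storage", "db_file")

# An indented line continues the previous value.
description = parser.get("Storage", "description")

if __name__ == "__main__":
    print(ham, spam)
    print(db_file)
    print(description)
```

A file that sticks to `name = value` lines and avoids `%` in values uses none of these extras, so it stays readable by plain INI tools as well.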
-- Terrel From tim at fourstonesExpressions.com Wed Feb 26 17:39:43 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Feb 26 18:39:48 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1046302031.2468.75.camel@juniper.localnet> Message-ID: 2/26/2003 5:27:11 PM, Terrel Shumway wrote: >On Wed, 2003-02-26 at 15:01, Mark Hammond wrote: >> > What happened to WritePrivateProfileString() ? >> >> 1) Windows only >Outlook plugin is also windows-only. (Unless I missed something. I >understand M$ is porting Exchange Server to BSD (MacOS X)) Right, but this portion of the system is used by all platforms. If it's not portable, it's not in. - TimS c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Wed Feb 26 18:31:52 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Wed Feb 26 21:31:57 2003 Subject: [Spambayes] Adding a message database In-Reply-To: Message from "Mark Hammond" References: Message-ID: <20030227023152.B3C0A2DDDB@cashew.wolfskeep.com> In message: "Mark Hammond" writes: >> Mark> I would like spambayes to assist in managing a database of >> Mark> message_ids, mapped to how they were previously trained. +2 (counting the second coder in the house who mentioned this to me a few days ago), as long as the option to not keep this database is maintained. - Alex From whisper at oz.net Wed Feb 26 23:44:05 2003 From: whisper at oz.net (David LeBlanc) Date: Thu Feb 27 02:44:15 2003 Subject: [Spambayes] OT: FYI: Microsoft virus email Message-ID: I don't think this is really from Microsoft! 
I received this HTML email with the following headers: Return-Path: Delivered-To: alias-oznet-whisper@oz.net Received: (qmail 22407 invoked from network); 27 Feb 2003 02:03:05 -0000 Received: from 203-114-128-8.inspire.net.nz (HELO mail1.inspire.net.nz) (203.114.128.8) by smtp4.sea.theriver.com with SMTP; 27 Feb 2003 02:03:05 -0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by mail1.inspire.net.nz (Postfix) with ESMTP id AC79B1ECD4E; Thu, 27 Feb 2003 15:02:50 +1300 (NZDT) Received: from mail1.inspire.net.nz ([127.0.0.1]) by localhost (mail1 [127.0.0.1:10024]) (amavisd-new) with ESMTP id 25347-09; Thu, 27 Feb 2003 15:02:45 +1300 (NZDT) Received: from DdgHKS (203-114-134-163.inspire.net.nz [203.114.134.163]) by mail1.inspire.net.nz (Postfix) with SMTP id 982221ECD0E; Thu, 27 Feb 2003 15:01:27 +1300 (NZDT) From: "Microsoft Internet Security Section" To: "Microsoft User" <> SUBJECT: Microsoft Security Update Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="vaqcugFTSNuqXUJnWHrD" Message-Id: <20030227020127.982221ECD0E@mail1.inspire.net.nz> Date: Thu, 27 Feb 2003 15:01:27 +1300 (NZDT) X-Virus-Scanned: by amavisd-new -----Original Message----- From: Microsoft Internet Security Section [mailto:tibhro-116500@OxiZlcPnb.com] Sent: Wednesday, February 26, 2003 18:01 To: Microsoft User Subject: Microsoft Security Update Microsoft User this is the latest version of security update, the "February 2003, Cumulative Patch" update which eliminates all known security vulnerabilities affecting Internet Explorer, Outlook and Outlook Express as well as five newly discovered vulnerabilities. Install now to protect your computer from these vulnerabilities, the most serious of which could allow an attacker to run executable on your system. This update includes the functionality of all previously released patches. 
System requirements Win 9x/Me/2000/NT/XP This update applies to Microsoft Internet Explorer, version 4.01 and later Microsoft Outlook, version 8.00 and later Microsoft Outlook Express, version 4.01 and later Recommendation Customers should install the patch at the earliest opportunity. How to install Run attached file. Click Yes on displayed dialog box. How to use You don't need to do anything after installing this item. Microsoft Product Support Services and Knowledge Base articles can be found on the Microsoft Technical Support web site. For security-related information about Microsoft products, please visit the Microsoft Security Advisor web site, or Contact us. Please do not reply to this message. It was sent from an unmonitored e-mail address and we are unable to respond to any replies. Thank you for using Microsoft products. With friendly greetings, Microsoft Internet Security Section ---------------------------------------------------------------------------- ---- )2003 Microsoft Corporation. All rights reserved. The names of the actual companies and products mentioned herein may be the trademarks of their respective owners. ________________________________________________ I didn't think anyone would like the attachment ;-) David LeBlanc Seattle, WA USA From anthony at interlink.com.au Thu Feb 27 19:09:28 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Feb 27 03:11:28 2003 Subject: [Spambayes] Re: [Python-Dev] Re: some preliminary timings In-Reply-To: Message-ID: <200302270809.h1R89Si10301@localhost.localdomain> >>> Tim Peters wrote > To the contrary, I can't think of a parsing ability of the Python email pkg > that spambayes *doesn't* use, from making sense of arbitrarily nested MIME > structure, to identifying the charsets in use. 
I suspect you're just > thinking of body parsing, where we do very little -- but the email pkg does > very little there too (beyond magically identifying the text portions for > us, and magically decoding base64 and quoted-printable sections). Something that occurred to me this afternoon - when the email parser's slackarse mode repairs/avoids a bit of MIME bustage in a message, it should note that in some way in the parsed Message object - SB can then use this parse warnings as clues... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim at fourstonesExpressions.com Thu Feb 27 07:52:53 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 27 08:53:19 2003 Subject: [Spambayes] Digest messages Message-ID: We have an open bug report on pop3proxy crashing on a mail with mimetype multipart/digest. When one of these messages rolls around, email.Message.get_payload method returns a list of message objects rather than a simple string. This makes perfect sense. My question is: how should we treat these for classification and training? Is the entire message spam? Do we classify each message in the digest? It doesn't seem as if we can treat digest messages in the same way we treat 'regular' mail. Does the Outlook plugin or hammiefilter handle these messages? For the time being, I think all a spammer would have to do to break us is to send their spam as a digest. I'm not sure what mailers handle digest messages, though... argh. Questions, questions. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From popiel at wolfskeep.com Thu Feb 27 06:36:43 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 27 09:37:09 2003 Subject: [Spambayes] OT: FYI: Microsoft virus email In-Reply-To: Message from "David LeBlanc" References: Message-ID: <20030227143644.065892DE8B@cashew.wolfskeep.com> In message: "David LeBlanc" writes: >I don't think this is really from Microsoft! 
This is true. Microsoft has stated on many occasions that they _NEVER_ distribute patches via email, and any patch not downloaded directly from their site is suspect. That doesn't keep the foolish from installing trojans like this, though... - Alex From Michael.Phillips at ieee.org Thu Feb 27 00:57:37 2003 From: Michael.Phillips at ieee.org (Michael Phillips) Date: Thu Feb 27 10:03:45 2003 Subject: [Spambayes] Question regarding use of Outlook Message-ID: <3E5DB6E1.3E569ED1@ieee.org> Are their relatively safe ways to run Outlook 2000? I consider OE to be somewhat safer since it is not as "capable". By this I mean, can I keep it from downloading graphics by default when viewing mail, keep it from automagically and insanely running/decompressing attached files without asking, when turning off these "features" not disabling the accessing of attachments if manually desired, etc...? I use OE because I am stuck with WinX at work currently although am working to resolve the issues. http://wecanstopspam.org TIA From tim at fourstonesExpressions.com Thu Feb 27 09:17:36 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Feb 27 10:17:40 2003 Subject: [Spambayes] Question regarding use of Outlook In-Reply-To: <3E5DB6E1.3E569ED1@ieee.org> Message-ID: I would refer you to several sites for more information on this. http://www.microsoft.com/office/previous/outlook/downloads/OutlkSec.doc http://www.tcd.ie/IS_Services/help/014secu.shtml http://www.georgetown.edu/email/outlook-2000/outlook-2000.security.html I'm sure there are many online forums, newsgroups, and mailing lists that pertain to this subject. One of the better known ones is http://www.outlookexchange.com/articles/home/outlooknewsgroupinfo.asp Hope this helps, and if we can help you stop spam, please don't hesitate to call! - TimS 2/27/2003 12:57:37 AM, Michael Phillips wrote: >Are their relatively safe ways to run Outlook 2000? 
I consider OE to be >somewhat safer since it is not as "capable". > >By this I mean, can I keep it from downloading graphics by default when >viewing mail, keep it from automagically and insanely >running/decompressing attached files without asking, when turning off >these "features" not disabling the accessing of attachments if manually >desired, etc...? > >I use OE because I am stuck with WinX at work currently although am >working to resolve the issues. > > http://wecanstopspam.org > >TIA > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org From papaDoc at videotron.ca Thu Feb 27 11:49:28 2003 From: papaDoc at videotron.ca (papaDoc) Date: Thu Feb 27 11:49:33 2003 Subject: [Spambayes] Start pop3proxy at startup In-Reply-To: <200302271734.17922.kjellqvist@nordkalak.se> References: <200302262110.12535.kjellqvist@nordkalak.se> <3E5D2F2D.7030302@videotron.ca> <200302271734.17922.kjellqvist@nordkalak.se> Message-ID: <3E5E4198.5090900@videotron.ca> Hi Göran, Usually, if you want something to be launched at bootup, you put a script for it in /etc/init.d/, but for that you need to be root. You can then register it with the commands "chkconfig --add your_little_script" and "chkconfig --level 345 your_little_script on" (if I remember correctly). Take a look at the other scripts found in the directory given above to create your little script so that it can take the start, stop and restart arguments. >>Hi! >>Can you give me any advice of where and how I should start pop3proxy at >>bootup. Or do I have to start it manually before I use my e-mail client? >> >> >>Sorry, I forgot to write that I'm running Linux.
>> Remi From tim.one at comcast.net Thu Feb 27 11:49:34 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Feb 27 11:50:08 2003 Subject: [Spambayes] Question regarding use of Outlook In-Reply-To: <3E5DB6E1.3E569ED1@ieee.org> Message-ID: [Michael Phillips] > Are their relatively safe ways to run Outlook 2000? If you install all the Office service packs, OL2000 is so "safe" it becomes nearly unusable <0.5 wink>. > ... > when turning off these "features" not disabling the accessing of > attachments if manually desired, etc...? Nope: if you install all the service packs, a large list of attachment types becomes completely inaccessible, whether automatically or manually. Outlook 2002 has a section in the registry you can fiddle to get at attachments again, but OL2000 does not. From skip at pobox.com Thu Feb 27 11:51:50 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 27 12:52:01 2003 Subject: [Spambayes] PickleRPC + hammiesrv/hammiecli Message-ID: <15966.20534.829789.858689@montanaro.dyndns.org> In an attempt to make the hammiesrv/hammiecli pair run faster, I wrote a simple PickleRPC module which works more-or-less like a combination of xmlrpclib and SimpleXMLRPCServer but uses raw sockets to connect (no HTTP) and uses an ASCII byte count followed by a pickle as the serialization format. Using this combination, running ten messages through the scoring roundtrip consumes 0.289s/msg of user+sys time on my Mac (including both client and server times) and 0.464s/msg of wallclock time (just measured on the client). This compares favorably with these numbers for hammiefilter: 0.54s/msg wallclock and 0.483s/msg user+sys. 
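The wire format described above (an ASCII byte count followed by a pickle, over a raw socket) can be sketched in a few lines. This is not the PickleRPC module itself: the newline that terminates the count and the helper names `send_obj` and `recv_obj` are assumptions made for the illustration. Unpickling data from an untrusted peer can execute arbitrary code, so framing like this only suits trusted local connections.

```python
import pickle
import socket


def send_obj(sock, obj):
    """Send `obj` as an ASCII decimal length, a newline, then its pickle."""
    payload = pickle.dumps(obj)
    header = str(len(payload)).encode("ascii") + b"\n"
    sock.sendall(header + payload)


def _recv_exact(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_obj(sock):
    """Read one length-prefixed pickle from `sock` and return the object."""
    digits = b""
    while True:
        byte = _recv_exact(sock, 1)
        if byte == b"\n":
            break
        digits += byte
    return pickle.loads(_recv_exact(sock, int(digits)))


if __name__ == "__main__":
    # A connected pair of sockets stands in for client and server.
    client, server = socket.socketpair()
    send_obj(client, ("score", "Subject: hello\n\nbody text"))
    method, message = recv_obj(server)
    send_obj(server, 0.5)  # placeholder reply, not a real classifier score
    print(method, recv_obj(client))
    client.close()
    server.close()
```

Most of the saving over an XML-RPC style transport comes from skipping HTTP headers and XML encoding; the length prefix is what lets the receiver know where one message ends without closing the connection.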
If you'd like to play around with the combination or see how simple an RPC protocol can be, the three files are at http://manatee.mojam.com/~skip/python/PickleRPC.py http://manatee.mojam.com/~skip/python/hammiesrv.py http://manatee.mojam.com/~skip/python/hammiecli.py Skip From popiel at wolfskeep.com Thu Feb 27 16:07:00 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Feb 27 19:07:04 2003 Subject: [Spambayes] Testing incremental training Message-ID: <20030228000700.AD9662DE8B@cashew.wolfskeep.com> Okay, I just checked in a bunch of stuff for testing incremental training. It's not complete (there's a TODO file listing a bunch of stuff I would like to have time to do on it), but it's there. I'll post some results graphs when I get a bit more coherent. The niftiest thing about what I wrote is that it gets the training regime to use out of a command-line specified file... so it's really easy to define and test multiple training regimes, and see which one does best. Now, for some sleep... - Alex From bill at parducci.net Thu Feb 27 17:26:27 2003 From: bill at parducci.net (bill parducci) Date: Thu Feb 27 20:26:31 2003 Subject: [Spambayes] spambayes question Message-ID: <3E5EBAC3.3030705@parducci.net> i was trying to setup spambayes-1.0a2 and have been unable to manually perform the training: # ./hammiefilter.py -n Created new database in /home/bill/.hammiedb # ./mboxtrain.py -d /var/spool/mail/bill -s /home/bill/mail/spam Traceback (most recent call last): File "./mboxtrain.py", line 278, in ? 
main() File "./mboxtrain.py", line 261, in main h = hammie.open(pck, usedb, "c") File "./spambayes/hammie.py", line 260, in open b = storage.DBDictClassifier(filename, mode) File "./spambayes/storage.py", line 140, in __init__ self.load() File "./spambayes/storage.py", line 148, in load self.dbm = dbmstorage.open(self.db_name, self.mode) File "./spambayes/dbmstorage.py", line 54, in open return f(*args) File "./spambayes/dbmstorage.py", line 36, in open_best return f(*args) File "./spambayes/dbmstorage.py", line 17, in open_dbhash return bsddb.hashopen(*args) at first blush, it looks like a database communication/authorization issue. i am running: python-2.2.2-5 python mail version 2.4.3 on linux (rh8) is this version of the code in a state whereby it can installed using the instructions in HAMMIE.txt? is there something else that i should be looking at? thanks b From skip at pobox.com Thu Feb 27 19:31:39 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 27 20:32:28 2003 Subject: [Spambayes] spambayes question In-Reply-To: <3E5EBAC3.3030705@parducci.net> References: <3E5EBAC3.3030705@parducci.net> Message-ID: <15966.48123.90227.395894@montanaro.dyndns.org> bill> # ./mboxtrain.py -d /var/spool/mail/bill -s /home/bill/mail/spam bill> Traceback (most recent call last): bill> File "./mboxtrain.py", line 278, in ? 
bill> main() bill> File "./mboxtrain.py", line 261, in main bill> h = hammie.open(pck, usedb, "c") bill> File "./spambayes/hammie.py", line 260, in open bill> b = storage.DBDictClassifier(filename, mode) bill> File "./spambayes/storage.py", line 140, in __init__ bill> self.load() bill> File "./spambayes/storage.py", line 148, in load bill> self.dbm = dbmstorage.open(self.db_name, self.mode) bill> File "./spambayes/dbmstorage.py", line 54, in open bill> return f(*args) bill> File "./spambayes/dbmstorage.py", line 36, in open_best bill> return f(*args) bill> File "./spambayes/dbmstorage.py", line 17, in open_dbhash bill> return bsddb.hashopen(*args) What's the actual except and error message? It's clear there's a problem opening the database, but there should be an indication of some sort what the problem was. What happens if you delete the file hammiefilter created and just let mboxtrain create the db file? Skip From bill at parducci.net Thu Feb 27 17:40:36 2003 From: bill at parducci.net (bill parducci) Date: Thu Feb 27 20:40:39 2003 Subject: [Spambayes] spambayes question In-Reply-To: <15966.48123.90227.395894@montanaro.dyndns.org> References: <3E5EBAC3.3030705@parducci.net> <15966.48123.90227.395894@montanaro.dyndns.org> Message-ID: <3E5EBE14.40805@parducci.net> oops, to the list this time... b Skip Montanaro wrote: [...] > What's the actual except and error message? It's clear there's a problem > opening the database, but there should be an indication of some sort what > the problem was. > > What happens if you delete the file hammiefilter created and just let > mboxtrain create the db file? $ rm .hammiedb $ spambayes/mboxtrain.py -d /var/spool/mail/bill -s /home/bill/mail/spam Traceback (most recent call last): File "spambayes/mboxtrain.py", line 278, in ? 
main() File "spambayes/mboxtrain.py", line 261, in main h = hammie.open(pck, usedb, "c") File "./spambayes/hammie.py", line 260, in open spambayes.hammiebulk.main() File "./spambayes/storage.py", line 140, in __init__ File "./spambayes/storage.py", line 148, in load File "./spambayes/dbmstorage.py", line 54, in open File "./spambayes/dbmstorage.py", line 36, in open_best File "./spambayes/dbmstorage.py", line 17, in open_dbhash bsddb.error: (22, 'Invalid argument') From T.A.Meyer at massey.ac.nz Fri Feb 28 15:02:51 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Feb 27 21:03:31 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D54B@its-xchg4.massey.ac.nz> [Tony] > I guess this is where I should abstract my Outlook ini writer to. > I'll get to this at some point today or tomorrow. Whew. Multiple config files certainly makes updating tricky :) It is nearly done, though - just lots and lots of testing remains, and then pop3proxy can use it (AFAIK, none of the other systems need to update the config files, although Outlook probably will at some point). [Skip] > Please consider submitting it for consideration to the standard Python > library. Well, they're welcome to it if they want it. How does one do this, exactly? =Tony Meyer From skip at pobox.com Thu Feb 27 21:02:05 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 27 22:02:18 2003 Subject: [Spambayes] Storing Options [was Outlook feature request] In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D54B@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318D54B@its-xchg4.massey.ac.nz> Message-ID: <15966.53549.53704.24244@montanaro.dyndns.org> Tony> [Skip] >> Please consider submitting it for consideration to the standard >> Python library. Tony> Well, they're welcome to it if they want it. How does one do Tony> this, exactly? 
Probably the best way is to submit a patch on Sourceforge with the module as an attachment, then post a brief note to python-dev@python.org letting folks there know it's available. In one place or the other you should describe that it's a complement to ConfigParser.py. If you have test cases or documentation, all the better. ;-) Skip From skip at pobox.com Thu Feb 27 21:14:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 27 22:14:28 2003 Subject: [Spambayes] spambayes question In-Reply-To: <3E5EBE14.40805@parducci.net> References: <3E5EBAC3.3030705@parducci.net> <15966.48123.90227.395894@montanaro.dyndns.org> <3E5EBE14.40805@parducci.net> Message-ID: <15966.54285.131354.912313@montanaro.dyndns.org> bill> $ rm .hammiedb bill> $ spambayes/mboxtrain.py -d /var/spool/mail/bill -s /home/bill/mail/spam bill> Traceback (most recent call last): bill> File "spambayes/mboxtrain.py", line 278, in ? bill> main() bill> File "spambayes/mboxtrain.py", line 261, in main bill> h = hammie.open(pck, usedb, "c") bill> File "./spambayes/hammie.py", line 260, in open bill> spambayes.hammiebulk.main() bill> File "./spambayes/storage.py", line 140, in __init__ bill> File "./spambayes/storage.py", line 148, in load bill> File "./spambayes/dbmstorage.py", line 54, in open bill> File "./spambayes/dbmstorage.py", line 36, in open_best bill> File "./spambayes/dbmstorage.py", line 17, in open_dbhash bill> bsddb.error: (22, 'Invalid argument') Something tells me it wasn't trying to open .hammiedb. You generally get an 'invalid argument' error when the file being opened already exists and was created with something other than the relevant bsddb call. In dbmstorage.py, try changing line 17 from return bsddb.hashopen(*args) to try: return bsddb.hashopen(*args) except bsddb.error: print >> sys.stderr, args raise It will still barf, but tell you exactly what file is being opened. See if it already exists. If so, delete it and try again. If not, let us know. 
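Since the archive flattens whitespace, here is the same idea laid out with explicit indentation. It is a sketch only: `bsddb` is not part of the current standard library, so the opener is passed in as a parameter, and the name `open_reporting_args` is hypothetical. The body of the `try` and the body of the `except` must each be indented one level deeper than the keywords themselves.

```python
import sys


def open_reporting_args(opener, *args):
    """Call opener(*args); on failure, show the arguments and re-raise.

    `opener` stands in for bsddb.hashopen in the original suggestion.
    """
    try:
        return opener(*args)
    except Exception:
        # Report exactly which file and mode were being opened, then
        # let the original exception propagate unchanged.
        print("open failed with arguments:", args, file=sys.stderr)
        raise
```

The call still fails in the same way; the only difference is that the file name and mode appear on stderr just before the traceback.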
Skip From bill at parducci.net Thu Feb 27 20:48:41 2003 From: bill at parducci.net (bill parducci) Date: Thu Feb 27 23:48:45 2003 Subject: [Spambayes] spambayes question In-Reply-To: <15966.54285.131354.912313@montanaro.dyndns.org> References: <3E5EBAC3.3030705@parducci.net> <15966.48123.90227.395894@montanaro.dyndns.org> <3E5EBE14.40805@parducci.net> <15966.54285.131354.912313@montanaro.dyndns.org> Message-ID: <3E5EEA29.3020502@parducci.net> Skip Montanaro wrote: [...] > Something tells me it wasn't trying to open .hammiedb. You generally get an > 'invalid argument' error when the file being opened already exists and was > created with something other than the relevant bsddb call. > > In dbmstorage.py, try changing line 17 from > > return bsddb.hashopen(*args) > > to > > try: > return bsddb.hashopen(*args) > except bsddb.error: > print >> sys.stderr, args > raise > > It will still barf, but tell you exactly what file is being opened. See if > it already exists. If so, delete it and try again. If not, let us know. $ spambayes/mboxtrain.py -d /var/spool/mail/bill -s /home/bill/mail/spam Traceback (most recent call last): File "spambayes/mboxtrain.py", line 38, in ? from spambayes import hammie, mboxutils File "./spambayes/hammie.py", line 5, in ? spambayes.hammiebulk.main() File "./spambayes/storage.py", line 62, in ? File "spambayes/spambayes/dbmstorage.py", line 17 try: ^ IndentationError: unindent does not match any outer indentation level syntax error? (dbmstorage.py) def open_dbhash(*args): """Open a bsddb hash. Don't use this on Windows.""" import bsddb try: return bsddb.hashopen(*args) except bsddb.error: print >> sys.stderr, args raise sorry, i am a python noob so unfamiliar with code structure. 
b From bill at parducci.net Thu Feb 27 20:52:49 2003 From: bill at parducci.net (bill parducci) Date: Thu Feb 27 23:52:53 2003 Subject: [Spambayes] spambayes question In-Reply-To: <3E5ED5C8.4060102@pfrog.com> References: <3E5EBAC3.3030705@parducci.net> <3E5ED5C8.4060102@pfrog.com> Message-ID: <3E5EEB21.3050406@parducci.net> ah, you are correct. complete blow it on may part. i misread the desstructions. it seems to be working now. sorry for the false alarm. (syntax seemed odd to me at the time, too). ...thank you and i gladly accept this week's dork of the week award for going above and beyond they call of dummy... :o) b Jeff de Vries wrote: > Shouldn't that be: > ./mboxtrain.py -d /home/bill/.hammiedb -g /var/spool/mail/bill -s > /home/bill/mail/spam > > (in other words you're telling mboxtrain that your mailbox is the > database!) From noreply at sourceforge.net Thu Feb 27 20:29:54 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 00:21:36 2003 Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py Message-ID: Bugs item #693423, was opened at 2003-02-25 23:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Tim Stone (timstone4) Summary: email message generates error in pop3proxy.py Initial Comment: Hi all, A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback: Traceback (most recent call last): File "spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 929, in onReview judgement = judgement.split(';')[0].strip() File "pop3proxy.py", line 815, in _makeMessageInfo print type(text) AttributeError: 'list' object has no attribute 'replace' He sent me the offending message, and I 
replicated the problem: msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r") message = mbox.get_message(msg) part = typed_subpart_iterator(message, 'text', 'plain').next() text = part.get_payload() >>> text [] So, instead of text, the payload is a list containing a single email message instance. Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-02-27 22:29 Message: Logged In: YES user_id=645698 I just checked in a fix for this problem. I have no ability to actually test it, though. Please try your test case again and let me know the outcome. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From tim_one at email.msn.com Fri Feb 28 00:41:58 2003 From: tim_one at email.msn.com (Tim Peters) Date: Fri Feb 28 00:42:49 2003 Subject: [Spambayes] OT: FYI: Microsoft virus email In-Reply-To: Message-ID: [David LeBlanc] > I don't think this is really from Microsoft! No, it sure wasn't. My copy rated Unsure, and it was extremely well done! > ... > With friendly greetings, > Microsoft Internet Security Section Real mail from MS never ends with friendly greetings <0.3 wink>. > ... > I didn't think anyone would like the attachment ;-) Actually, I would -- stinkin' Outlook wouldn't let me get at the .exe attachment, and I didn't care enough to crack it. I have software that captures disk writes at the physical block level and can undo them, so sometimes I run a virus just to see what it does. Maybe this one really did package security updates . 
From noreply at sourceforge.net Fri Feb 28 05:54:42 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 11:28:37 2003 Subject: [Spambayes] [ spambayes-Feature Requests-695059 ] wildcard support for mboxtrain Message-ID: Feature Requests item #695059, was opened at 2003-02-28 05:54 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=695059&group_id=61702 Category: None Group: None Status: Open Priority: 5 Submitted By: bill parducci (humantypo) Assigned to: Nobody/Anonymous (nobody) Summary: wildcard support for mboxtrain Initial Comment: i have about 40 folders that i use to keep track of numerous e-mail lists, projects, scraps of digital dementia, etc. it would be very helpful if mboxtrain would accept wildcards for mail folder identification. yes, i could have 40 command line params, but that adds a YAM (Yet Another Maintenance) task to make sure that the folders match the command line parameters. what would really be useful is if mboxtrain would keep track of folders that it has read in that session already. that way one could use the following syntax: mboxtrain -d [db] -s [dir]/spam -g [dir]/* and not have the ham process read the spam folder (since it is likely that there will be only 1 spam folder and multiple ham folders). i suppose you could just hard code the ham flag parser to ignore folders named 'spam' but that would kinda be horky... anyway, i think this would help in the move towards more 'set & forget' operation.
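The behaviour requested above (expand a wildcard for the ham folders, but skip anything already named as a spam folder or already read) could be sketched along these lines. This is not code from mboxtrain.py; the function name `expand_ham_folders` and its parameters are hypothetical, and the sketch is written for current Python 3.

```python
import glob
import os


def expand_ham_folders(ham_patterns, spam_folders):
    """Expand wildcard patterns into ham folders, skipping spam folders.

    Folders are compared by absolute path, so a spam folder is excluded
    no matter how the pattern happened to spell it. Each folder is
    returned at most once, even if several patterns match it.
    """
    excluded = {os.path.abspath(p) for p in spam_folders}
    seen = set()
    result = []
    for pattern in ham_patterns:
        for path in sorted(glob.glob(pattern)):
            full = os.path.abspath(path)
            if full in excluded or full in seen:
                continue
            seen.add(full)
            result.append(path)
    return result
```

With a mail directory holding `lists`, `work` and `spam`, calling `expand_ham_folders(["mail/*"], ["mail/spam"])` would return only the first two, which avoids hard-coding the folder name 'spam' into the ham option parser.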
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=695059&group_id=61702 From noreply at sourceforge.net Fri Feb 28 08:34:30 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 11:28:45 2003 Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py Message-ID: Bugs item #693423, was opened at 2003-02-26 00:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Tim Stone (timstone4) Summary: email message generates error in pop3proxy.py Initial Comment: Hi all, A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback: Traceback (most recent call last): File "spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 929, in onReview judgement = judgement.split(';')[0].strip() File "pop3proxy.py", line 815, in _makeMessageInfo print type(text) AttributeError: 'list' object has no attribute 'replace' He sent me the offending message, and I replicated the problem: msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r") message = mbox.get_message(msg) part = typed_subpart_iterator(message, 'text', 'plain').next() text = part.get_payload() >>> text [] So, instead of text, the payload is a list containing a single email message instance. Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- >Comment By: David Shaw (dshaw) Date: 2003-02-28 11:34 Message: Logged In: YES user_id=244639 Seems to be fixed! Thanks. 
---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-27 23:29 Message: Logged In: YES user_id=645698 I just checked in a fix for this problem. I have no ability to actually test it, though. Please try your test case again and let me know the outcome. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From noreply at sourceforge.net Fri Feb 28 08:39:56 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 11:35:54 2003 Subject: [Spambayes] [ spambayes-Bugs-693423 ] email message generates error in pop3proxy.py Message-ID: Bugs item #693423, was opened at 2003-02-25 23:02 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 Category: pop3proxy Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Tim Stone (timstone4) Summary: email message generates error in pop3proxy.py Initial Comment: Hi all, A friend of mine had a cache file in his "unknown" folder that caused the "review" web page in pop3proxy.py to generate the following traceback: Traceback (most recent call last): File "spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 929, in onReview judgement = judgement.split(';')[0].strip() File "pop3proxy.py", line 815, in _makeMessageInfo print type(text) AttributeError: 'list' object has no attribute 'replace' He sent me the offending message, and I replicated the problem: msg = open("/Users/dshaw/Desktop/crash_spam.txt", "r") message = mbox.get_message(msg) part = typed_subpart_iterator(message, 'text', 'plain').next() text = part.get_payload() >>> text [] So, instead of text, the payload is a list containing a single email message instance. 
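For readers following along: `get_payload()` returning a list rather than a string is how the standard `email` package represents multipart and `message/rfc822` parts. A minimal sketch of code that tolerates both cases follows. It is written for current Python 3, the helper name `payload_as_text` is hypothetical, and it is not the fix that was checked in for this bug.

```python
import email


def payload_as_text(part):
    """Return the text of a message part, descending into list payloads.

    For multipart and message/rfc822 parts, get_payload() returns a list
    of Message objects instead of a string, so calling string methods
    such as replace() directly on the result fails.
    """
    payload = part.get_payload()
    if isinstance(payload, list):
        return "\n".join(payload_as_text(sub) for sub in payload)
    return payload


# A message/rfc822 part wraps one complete inner message.
raw = (
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
)
outer = email.message_from_string(raw)
text = payload_as_text(outer)
```

Here `outer.get_payload()` is a one-element list holding the inner message, while `text` is the inner message's body as a plain string.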
Here are the objects' respective payloads: >>> message._payload [, , , , , , , , , , , , , ] ---------------------------------------------------------------------- Comment By: David Shaw (dshaw) Date: 2003-02-28 10:34 Message: Logged In: YES user_id=244639 Seems to be fixed! Thanks. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-02-27 22:29 Message: Logged In: YES user_id=645698 I just checked in a fix for this problem. I have no ability to actually test it, though. Please try your test case again and let me know the outcome. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=693423&group_id=61702 From noreply at sourceforge.net Fri Feb 28 08:40:27 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 11:36:03 2003 Subject: [Spambayes] [ spambayes-Bugs-695142 ] Email does not render subject in the "Review" Page Message-ID: Bugs item #695142, was opened at 2003-02-28 11:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Nobody/Anonymous (nobody) Summary: Email does not render subject in the "Review" Page Initial Comment: I received the attached email. When I go to the "review" web page of pop3proxy.py, all it shows is: Messages classified as Unsure: From: (none) (none) It acts as though the message has no "from" or "subject", even though they exist. The user is not given any way to classify this message other than to click on the first "(none)" and read the raw message to determine its contents. I will attach the message below. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 From noreply at sourceforge.net Fri Feb 28 08:42:29 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 11:36:11 2003 Subject: [Spambayes] [ spambayes-Bugs-695142 ] Email does not render subject in the "Review" Page Message-ID: Bugs item #695142, was opened at 2003-02-28 11:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Nobody/Anonymous (nobody) >Summary: Email does not render subject in the "Review" Page Initial Comment: I received the attached email. When I go to the "review" web page of pop3proxy.py, all it shows is: Messages classified as Unsure: From: (none) (none) It acts as though the message has no "from" or "subject", even though they exist. The user is not given any way to classify this message other than to click on the first "(none)" and read the raw message to determine its contents. I will attach the message below. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 From noreply at sourceforge.net Fri Feb 28 09:45:22 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 12:37:05 2003 Subject: [Spambayes] [ spambayes-Bugs-695187 ] mboxtrain.py and dbExpImp.py not Python 2.2 compatible Message-ID: Bugs item #695187, was opened at 2003-02-28 12:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py and dbExpImp.py not Python 2.2 compatible Initial Comment: Both mboxtrain.py and dbExpImp.py lack the following compatibility statements that let them work in Python 2.2. Other files seem to have them, so these two should as well. try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 From noreply at sourceforge.net Fri Feb 28 10:12:56 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 13:14:45 2003 Subject: [Spambayes] [ spambayes-Bugs-695187 ] mboxtrain.py and dbExpImp.py not Python 2.2 compatible Message-ID: Bugs item #695187, was opened at 2003-02-28 11:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) >Assigned to: Tim Stone (timstone4) Summary: mboxtrain.py and dbExpImp.py not Python 2.2 compatible Initial Comment: Both mboxtrain.py and dbExpImp.py lack the following compatibility statements that let them 
work in Python 2.2. Other files seem to have them, so these two should as well. try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 From noreply at sourceforge.net Fri Feb 28 10:17:28 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 13:14:53 2003 Subject: [Spambayes] [ spambayes-Bugs-695187 ] mboxtrain.py and dbExpImp.py not Python 2.2 compatible Message-ID: Bugs item #695187, was opened at 2003-02-28 11:45 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 Category: None Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) Assigned to: Tim Stone (timstone4) Summary: mboxtrain.py and dbExpImp.py not Python 2.2 compatible Initial Comment: Both mboxtrain.py and dbExpImp.py lack the following compatibility statements that let them work in Python 2.2. Other files seem to have them, so these two should as well. 
try: True, False except NameError: # Maintain compatibility with Python 2.2 True, False = 1, 0 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695187&group_id=61702 From noreply at sourceforge.net Fri Feb 28 10:21:00 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Feb 28 13:15:02 2003 Subject: [Spambayes] [ spambayes-Bugs-695142 ] Email does not render subject in the "Review" Page Message-ID: Bugs item #695142, was opened at 2003-02-28 10:40 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Shaw (dshaw) >Assigned to: Tim Stone (timstone4) >Summary: Email does not render subject in the "Review" Page Initial Comment: I received the attached email. When I go to the "review" web page of pop3proxy.py, all it shows is: Messages classified as Unsure: From: (none) (none) It acts as though the message has no "from" or "subject", even though they exist. The user is not given any way to classify this message other than to click on the first "(none)" and read the raw message to determine its contents. I will attach the message below. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=695142&group_id=61702 From bkc at murkworks.com Fri Feb 28 15:01:52 2003 From: bkc at murkworks.com (Brad Clements) Date: Fri Feb 28 14:49:57 2003 Subject: [Spambayes] OT: FYI: Microsoft virus email In-Reply-To: References: Message-ID: <3E5F76DA.31286.7B2BAA55@localhost> On 28 Feb 2003 at 0:41, Tim Peters wrote: > > I didn't think anyone would like the attachment ;-) > > Actually, I would -- stinkin' Outlook wouldn't let me get at the .exe > attachment, and I didn't care enough to crack it. 
I have software that > captures disk writes at the physical block level and can undo them, so > sometimes I run a virus just to see what it does. Maybe this one really > did package security updates . > "although I wear a bullet proof vest, I sometimes fire a gun into my chest just to see if it works." -- Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax http://www.wecanstopspam.org/ AOL-IM: BKClements From Todd_Cranston-Cuebas at citysearch.com Fri Feb 28 16:19:13 2003 From: Todd_Cranston-Cuebas at citysearch.com (Todd Cranston-Cuebas) Date: Fri Feb 28 19:35:01 2003 Subject: [Spambayes] New Application of SpamBayesian tech? Message-ID: <71D28C8451BFD5119B2B00508BE26E6402F473CA@pasmail3.office.tmcs> Derek, I'm the person you quoted in your posting so I'm very intrigued by SpamBayesian tech? Can you explain to me what this is in layman's terms? Todd todd cranston-cuebas -- senior technical recruiter tcc@ticketmaster.com new address and phone effective 2/10/03: ticketmaster (NASDAQ: USAi) 8800 sunset blvd . west hollywood, ca . 90069 voice 310.360.2436 . main 310.360.3300 ------------------------------------------------------- Derek Simkowiak dereks at itsite.com Tue Dec 3 18:53:09 2002 ------------------------------------------------------- Surfing Slashdot, ran across the interview at http://www.theopenenterprise.com/story/TOE20021202S0001 which is about finding jobs. I saw this part: ------------------------------------------------------- [The interviewee mentions they got 3000 resumes in a single weekend...] [Interviewer] TheOpenEnterprise: How do you handle 3000 resumes? Do you look at them all? [Interviewee] Cranston-Cuebas: In a sense, we do.
But we first scan them quickly to filter out applicants without relevant skills. We create an index of all incoming resumes and search on keywords. That's why it's important for job-seekers to repeat the major skills multiple times in their resume. Another reason is that some recruiters use applicant tracking programs that do automatic skills assessment based on keywords found in the resume, and will rank resumes based on that assessment. ------------------------------------------------------- Is anyone else seeing what I'm seeing? It seems like the SpamBayes algorithms are perfectly suited to this task... and would be far more accurate than whatever simple "keyword" tracking the current apps use. For some reason, the application of "filtering in" with SpamBayes (instead of "filtering out") never occurred to me before. Given the large number of people looking for jobs in the U.S., this seems like a good opportunity. Anyone else find this interesting? From skip at pobox.com Fri Feb 28 20:08:14 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 28 21:08:21 2003 Subject: [Spambayes] New Application of SpamBayesian tech? In-Reply-To: <71D28C8451BFD5119B2B00508BE26E6402F473CA@pasmail3.office.tmcs> References: <71D28C8451BFD5119B2B00508BE26E6402F473CA@pasmail3.office.tmcs> Message-ID: <15968.5646.731944.704216@montanaro.dyndns.org> Todd> I'm the person you quoted in your posting so I'm very intrigued by Todd> SpamBayesian tech? Can you explain to me what this is in layman's Todd> terms? Essentially, given a pile of "appropriate" and "not appropriate" documents (spam vs. non-spam email messages so far, but it could be resumes or appropriate vs. not appropriate web pages), the system is trained using them. The system tokenizes the document (not always in a completely straightforward fashion) and counts how often various tokens occur in the spam vs. non-spam documents. 
Training complete, unknown documents are fed to the system and it classifies them based upon the relative "spamminess" or "non-spamminess" of tokens in the document. There's a much better explanation on the SpamBayes website: http://spambayes.sf.net/ Skip From popiel at wolfskeep.com Fri Feb 28 21:15:52 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Sat Mar 1 00:15:59 2003 Subject: [Spambayes] Graph results Message-ID: <20030301051552.4DB592DE8C@cashew.wolfskeep.com> Well, I tried to post to the list with the first set of graphs I made, but that's waiting on moderator approval (likely because of the several .png files attached). The text portion of the post follows. - Alex -------- This is the first set of really interesting graphs I've made on the whole training regime thing. These graphs show average error rates (fp, fn, and unsure) over the prior 7 days of data for any given point on the graph. This is based on my actual mail feed for the last 6 months, with nothing left out... including our discussions of spam (with quoted examples) on this list, which are a wonderful source of false positives. I divided my mail into 24-hour groups for report buckets, and I also divided all the mail into 5 sets, and did 5 runs of each training regime each with 4 of the 5 sets. These multiple runs are all plotted on the graphs, which is why there's a multiplicity of lines in each color. (Doing multiple runs like this points out which things are flukes, much like a cross-validation, although the mechanics here are slightly different.) I have a few associated observations: 1. No matter how you train, spambayes gets very good very quickly... on the order of days to error rates < 5%. 2. Spambayes continues to improve for a couple months, but I'm starting to see an increase in errors after about 4-5 months. I don't know why this is; it might be because spam is mutating, or it might be because my definition of spam has been mutating. 3. 
If you do perfect training as soon as messages arrive, you still get the occasional false positive and a fair amount of unsures. 4. Training immediately based on the classifier output and making corrections to perfect at the end of the day is only marginally worse than immediately perfect training. 5. Training only on fp, fn, and unsures doesn't change the fp much, but is significantly worse (double or triple, or 1 to 4%) on fn and unsures. 6. Training only on fp, fn, and unsures only trained on approximately 90 ham and 1300 spam (compared to 8200 ham and 15300 spam for perfect training). Doing these graphs was fun, in a nit-picky sort of way. One could spend weeks fiddling and coming up with more data to make pretty pictures with. I will probably spend some more time building a few more training regimes (and posting this on my website), but the moral of the story is pretty obvious: spambayes is very good, and if you're willing to have slightly higher error (and unsure) rates, then the amount of training can be cut drastically. Anyway, the next thing for me to really look at is the effect of aging... - Alex From popiel at wolfskeep.com Fri Feb 28 21:09:40 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Sat Mar 1 02:01:21 2003 Subject: [Spambayes] Some initial graphs Message-ID: <20030301050940.A1F942DE8C@cashew.wolfskeep.com> This is the first set of really interesting graphs I've made on the whole training regime thing. These graphs show average error rates (fp, fn, and unsure) over the prior 7 days of data for any given point on the graph. This is based on my actual mail feed for the last 6 months, with nothing left out... including our discussions of spam (with quoted examples) on this list, which are a wonderful source of false positives. I divided my mail into 24-hour groups for report buckets, and I also divided all the mail into 5 sets, and did 5 runs of each training regime each with 4 of the 5 sets. 
These multiple runs are all plotted on the graphs, which is why there's a multiplicity of lines in each color. (Doing multiple runs like this points out which things are flukes, much like a cross-validation, although the mechanics here are slightly different.)

I have a few associated observations:

1. No matter how you train, spambayes gets very good very quickly... on the order of days to error rates < 5%.
2. Spambayes continues to improve for a couple months, but I'm starting to see an increase in errors after about 4-5 months. I don't know why this is; it might be because spam is mutating, or it might be because my definition of spam has been mutating.
3. If you do perfect training as soon as messages arrive, you still get the occasional false positive and a fair amount of unsures.
4. Training immediately based on the classifier output and making corrections to perfect at the end of the day is only marginally worse than immediately perfect training.
5. Training only on fp, fn, and unsures doesn't change the fp much, but is significantly worse (double or triple, or 1 to 4%) on fn and unsures.
6. Training only on fp, fn, and unsures only trained on approximately 90 ham and 1300 spam (compared to 8200 ham and 15300 spam for perfect training).

Doing these graphs was fun, in a nit-picky sort of way. One could spend weeks fiddling and coming up with more data to make pretty pictures with. I will probably spend some more time building a few more training regimes (and posting this on my website), but the moral of the story is pretty obvious: spambayes is very good, and if you're willing to have slightly higher error (and unsure) rates, then the amount of training can be cut drastically.

Anyway, the next thing for me to really look at is the effect of aging...

- Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: corrected*.mtv.png
Type: image/png
Size: 13759 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/corrected.mtv-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fpfnunsure*.mtv.png
Type: image/png
Size: 15948 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/fpfnunsure.mtv-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perfect*.mtv.png
Type: image/png
Size: 13528 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/perfect.mtv-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mixed_fn.mtv.png
Type: image/png
Size: 13506 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/mixed_fn.mtv-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mixed_fp.mtv.png
Type: image/png
Size: 7422 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/mixed_fp.mtv-0001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mixed_unsure.mtv.png
Type: image/png
Size: 16956 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030228/e5ec8efb/mixed_unsure.mtv-0001.png

From clopes at yahoo.com Fri Feb 28 22:51:59 2003
From: clopes at yahoo.com (Chris Lopes)
Date: Sat Mar 1 02:01:28 2003
Subject: [Spambayes] "delete as spam" gives error in Outlook XP
Message-ID: <20030301065159.47330.qmail@web41305.mail.yahoo.com>

Hello,

I am running Outlook 2002 SP-2 on Windows XP Pro SP1. I have spambayes 1.0a2 installed, along with python.org's python 2.2.2 with win32all-150 installed. In order to install the add-in for outlook, I just ran addin.py from spambayes' outlook2000 directory.
The plugin installed fine, and I was able to train spambayes on a set of both spam and non-spam emails just fine. However, "Delete As Spam" does not work. It gives the following error visible from PythonWin's Trace Collector Debugging Tool when I click "Delete As Spam":

pythoncom error: Python error invoking COM method.

Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\spambayes-1.0a2\Outlook2000\addin.py", line 305, in OnClick
    spam_folder = msgstore.GetFolder(spam_folder_id)
  File "D:\spambayes-1.0a2\Outlook2000\msgstore.py", line 223, in GetFolder
    folder_id = self.NormalizeID(folder_id)
  File "D:\spambayes-1.0a2\Outlook2000\msgstore.py", line 185, in NormalizeID
    assert type(item_id) in [type(''), type(u'')], "What kind of ID is '%r'?" % (item_id,)
exceptions.AssertionError: What kind of ID is 'None'?

Please help
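
For readers following the traceback above: the assertion fires because the ID handed to NormalizeID is None rather than a string. One plausible reading, which the message does not confirm, is that no spam folder had been chosen in the add-in's configuration when "Delete As Spam" was clicked. The sketch below only illustrates that failure mode and a possible guard. Every name in it (normalize_id, get_spam_folder_id, delete_as_spam, and the spam_folder_id key) is a hypothetical stand-in written for this note, not the actual SpamBayes Outlook2000 code.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# SpamBayes Outlook2000 functions, just a sketch of the failure and a guard.


def normalize_id(item_id):
    """Mimic the check seen in the traceback: only string IDs are accepted."""
    if not isinstance(item_id, str):
        raise AssertionError("What kind of ID is '%r'?" % (item_id,))
    return item_id


def get_spam_folder_id(config):
    """Return the configured spam folder ID, or None if it was never set.

    `config` is a plain dict here; the 'spam_folder_id' key is an assumed name.
    """
    return config.get("spam_folder_id")


def delete_as_spam(config):
    """Check for a missing folder ID before trying to normalize it."""
    folder_id = get_spam_folder_id(config)
    if folder_id is None:
        # Report a readable problem instead of tripping the assertion.
        return "No spam folder is configured; choose one first."
    return "Moving message to folder %s" % normalize_id(folder_id)
```

Under that assumption, the practical thing to try would be selecting a spam folder in the add-in's configuration before clicking "Delete As Spam" again; whether that resolves this particular report is not established by the message.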